DNP

Bringing Self-Hosted Large Language Models to the Douglas Mental Health University Institute

Gabriel Devenyi

Gabriel Devenyi, PhD

6 min read

Large language models have quickly become part of everyday clinical and research work, from drafting documentation to summarizing literature and exploring data. But the most capable commercial models are hosted by third parties, and using them means sending your text — including, potentially, sensitive patient information — to servers outside your institution’s control. For a mental health hospital, that is a problem worth solving rather than accepting.

The Douglas Neuroinformatics Platform is addressing it directly: we are deploying open-weight large language models on our own hardware, served locally to clinicians and researchers at the Douglas Mental Health University Institute. Nothing leaves the building.

The models

We selected a set of complementary open-weight models to cover the range of clinical and research needs:

Offering several models matters. No single model is best at everything, and being able to route a clinical summarization task to MedGemma while sending a multilingual or coding task to Qwen or Gemma gives users the right tool without compromise.

Serving with vLLM

All of these models are served through vLLM, an inference engine built for high-throughput, low-latency serving. vLLM’s continuous batching and paged attention let a single GPU server handle many concurrent users efficiently, and its OpenAI-compatible API means existing tools and scripts work against our local endpoint with minimal changes — point them at our internal URL instead of a vendor’s.

That compatibility is a deliberate choice. Researchers can keep using the libraries and editor integrations they already know, and clinicians get familiar chat interfaces, while every request is quietly handled inside the institute.

Why local matters: patient data safety

The central reason for this work is data safety. Mental health records are among the most sensitive data a hospital holds, and they carry both legal protection and a strong ethical obligation. Sending that data to an external API creates real risks:

Self-hosting removes these risks at the source. When the model runs on hardware the platform administers, on the institute’s own network:

The result is that a clinician can use an LLM to help draft a note or summarize a chart, and a researcher can analyze sensitive study data with model assistance, without either of them having to weigh productivity against the risk of exposing patient information. The safe option becomes the default option.

Helping groups remove PII from their data

Local serving solves the transmission problem, but many research and clinical groups also need to clean identifying information out of their own datasets before they share, analyze, or archive them. This is hard to do well by hand, and “just delete the names” misses dates, addresses, record numbers, and the many other ways a person can be identified in free text. The platform helps groups with this directly.

We support PII removal using two open token-classification models built for the task:

These models do token classification rather than generation, so they pinpoint where the sensitive spans are without rewriting the surrounding content. That lets a group apply the policy it needs — redact, pseudonymize, or flag — and review the results before anything is published or moved. The bilingual model handles our French/English mix; the base model gives a well-tested option for English-heavy datasets.

Like the language models, these run locally, so the data being cleaned never leaves the institute during the process. The platform works with each group to fit the tools to their data and workflow, turning de-identification from a manual chore into a supported, repeatable step.

What’s next

This deployment is the foundation for a broader set of tools — retrieval over internal documents, structured extraction from clinical text, and research-focused assistants — all built on infrastructure that keeps data where it belongs. As open-weight models continue to improve, a locally-served stack lets the Douglas adopt new capabilities on its own terms, without renegotiating data safety every time.

Local LLMs are not a compromise on capability. They are how a mental health institute gets the benefits of modern AI while keeping its first obligation — protecting the people in its care — intact.

Why a team like the DNP is essential

None of this happens by default. Selecting models, standing up GPU servers, configuring vLLM, securing the endpoints, integrating authentication, supporting bilingual PII removal, and keeping it all running is specialized work that sits well outside the training of clinicians and researchers — and outside what most of them have the time or mandate to take on. Left to individuals, the practical choice collapses to two bad options: paste sensitive data into whatever commercial tool is easiest, or go without modern AI entirely.

A dedicated neuroinformatics team removes that dilemma. The Douglas Neuroinformatics Platform exists precisely to carry this technical and operational burden on behalf of the institute:

This is the core argument for embedding a platform like the DNP inside a research hospital. The capabilities described here — capable local models, served safely, with privacy support for the people who use them — are not realistically available to a clinician or researcher acting alone. They become available when an institution invests in a team whose job is to build, secure, and sustain them. That team is what turns “modern AI, but only if you accept the privacy risk” into “modern AI, on infrastructure you can trust.”