AI Clinical Letter Generation in Sleep Clinics: What's Working, What's Not


The thing nobody tells you in medical training is how much of your week disappears into writing letters. Referral acknowledgements, GP updates after a sleep study, post-CPAP-titration summaries, specialist-to-specialist letters when a patient has overlapping conditions. By my rough count, the average sleep physician in an Australian outpatient clinic writes between 25 and 40 substantive clinical letters a week. That’s a working day, gone.

Over the last 18 months our practice has been progressively rolling out AI-assisted letter drafting. I want to write down honestly what’s worked, what hasn’t, and where the failure modes hide. This isn’t a vendor pitch — it’s a clinical observation.

What we’re actually doing

The setup is unglamorous. After a consult we record clinical notes the way we always have, structured around history, examination, investigations reviewed, impression and plan. A locally hosted model takes the structured note, the patient’s prior letters, and the relevant sleep study or titration data, and produces a draft letter in our house style. The clinician reviews it, edits, signs, sends.

Three things mattered when setting this up:

  1. The model runs on infrastructure we control. No patient data leaves our environment. The privacy and security calculus changes if you’re sending identified clinical notes to a public API, and we wouldn’t do that.
  2. We trained the model’s style on roughly two years of our own anonymised letters. House style matters. Patients and referring GPs notice when a letter doesn’t sound like the doctor they know.
  3. Every draft is reviewed by the treating clinician before it’s sent. Always. Without exception.

Where it’s clearly winning

For the routine cases (uncomplicated OSA, straightforward CPAP titrations, predictable follow-ups) the drafts are now good enough that editing time is around 90 seconds per letter. We were spending 8 to 12 minutes per letter before. That difference, multiplied by 30 letters a week, gives back roughly three to five hours, the better part of a day.
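The arithmetic behind that estimate, as a quick sketch. It uses only the figures quoted above and assumes every one of the 30 letters is a routine case, which will overstate the saving for a mixed caseload. The function name is made up for illustration.

```python
def weekly_minutes_saved(letters_per_week, before_min, after_min):
    """Minutes saved per week when per-letter time drops from before_min to after_min."""
    return letters_per_week * (before_min - after_min)

LETTERS_PER_WEEK = 30
AFTER_MIN = 1.5  # roughly 90 seconds of editing per draft

low = weekly_minutes_saved(LETTERS_PER_WEEK, 8, AFTER_MIN)    # 195.0 minutes
high = weekly_minutes_saved(LETTERS_PER_WEEK, 12, AFTER_MIN)  # 315.0 minutes

print(f"Saved per week: {low / 60:.2f} to {high / 60:.2f} hours")
# Saved per week: 3.25 to 5.25 hours
```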

The other unexpected win is consistency. The model doesn’t have a bad afternoon. It doesn’t forget to mention the patient’s CPAP pressure history, or skip the section on sleep hygiene advice, or omit the warning about driving and untreated OSA. Letters from our practice are more uniform now than they were when they all came from different humans on different days.

Patient feedback has been quietly positive. They get clear letters sooner. Their GPs get letters within 48 hours of the consult instead of two to four weeks. That alone has improved coordination of care.

Where the failure modes hide

Now the part the brochures don’t cover.

Subtle hallucinations. Early in the rollout we caught the model attributing investigations the patient hadn’t had. It said one patient had completed an MSLT when they hadn’t. It said another had been on positional therapy when they hadn’t. These weren’t dramatic errors — the rest of the letter was internally consistent — but they were wrong. The lesson: a draft you skim because it looks reasonable is more dangerous than a blank page. Edit discipline is everything.
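One cheap backstop for this kind of error is a pre-review check that flags any investigation or treatment named in the draft but absent from the source note. The sketch below is illustrative only: the watch-list and the function name are hypothetical, and plain string matching will miss paraphrases and cannot tell "completed an MSLT" from "has not had an MSLT". It is a prompt for the reviewing clinician, not a substitute for reading the letter.

```python
import re

# Hypothetical watch-list; a real clinic would maintain its own.
INVESTIGATION_TERMS = [
    "MSLT",
    "polysomnography",
    "positional therapy",
    "CPAP titration",
    "oximetry",
]

def unsupported_terms(draft, source_note, terms=INVESTIGATION_TERMS):
    """Return watch-list terms that appear in the draft but not in the source note."""
    def mentions(text, term):
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, flags=re.IGNORECASE) is not None

    return [t for t in terms if mentions(draft, t) and not mentions(source_note, t)]

note = "History of snoring. Diagnostic polysomnography reviewed. Plan: CPAP titration."
draft = "Following polysomnography and a completed MSLT, we have arranged CPAP titration."

print(unsupported_terms(draft, note))  # ['MSLT']
```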

Tone drift on sensitive content. When a patient is distressed about their diagnosis, has comorbid depression, or is going through a difficult life event noted at consult, the draft letters can read flat or oddly upbeat. The clinician has to rewrite those sections every time. We’ve stopped trying to train this away and now flag these patients for a full write rather than draft-and-edit.

Overconfidence in interpretation. The model will write things like “the patient’s symptoms are most consistent with X” when the consult note actually expressed uncertainty. It smooths over genuine clinical ambiguity in a way that’s bad medicine. We’ve adjusted prompts to preserve hedging language, but it’s an ongoing battle.

Letter-to-letter contamination. When the same patient has had several recent letters, the model can pull in details from a prior consult that are now out of date. A patient who has changed CPAP machines, lost weight, or had a comorbidity resolve can still get letters reflecting their prior state. Solving this required tighter scoping of what the model sees.
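What "tighter scoping" looks like will vary by system. One possible approach, sketched here purely as an assumption, is to pass the model only the most recent one or two prior letters inside a fixed time window. The class, function, and default values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class PriorLetter:
    sent_on: date
    text: str

def scope_context(letters, consult_date, max_age_days=180, max_letters=2):
    """Keep only the most recent letters sent within a window before the consult."""
    cutoff = consult_date - timedelta(days=max_age_days)
    recent = [letter for letter in letters if cutoff <= letter.sent_on < consult_date]
    recent.sort(key=lambda letter: letter.sent_on, reverse=True)
    return recent[:max_letters]

history = [
    PriorLetter(date(2023, 1, 10), "Initial assessment."),
    PriorLetter(date(2023, 9, 1), "Post-titration summary."),
    PriorLetter(date(2023, 11, 15), "CPAP review."),
]

for letter in scope_context(history, consult_date=date(2024, 1, 20)):
    print(letter.sent_on, letter.text)
# 2023-11-15 CPAP review.
# 2023-09-01 Post-titration summary.
```

A date window alone will not catch every stale detail, since a letter from last month can still describe the old machine. The current consult note has to be treated as the authoritative source wherever the two disagree.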

What it took to make this safe

The most important investment wasn’t the model. It was the workflow and audit framework around it.

Every letter is logged with the source consult notes and the draft. We sample 5% of all sent letters every fortnight and audit for accuracy, completeness, and tone. We track the edit rate — what proportion of the draft was changed before sending — as a quality metric. If a clinician’s edit rate is suspiciously low, that’s a flag for a coaching conversation about review discipline.
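Both measures are simple to compute once drafts and sent letters are logged side by side. The sketch below is one way to do it, not a description of any particular audit system. It assumes the edit rate is measured at word level with Python's difflib, which gives a similarity score rather than a literal count of changed words, and that the 5% sample is drawn at random and rounded up. The function names are hypothetical.

```python
import difflib
import math
import random

def edit_rate(draft, sent):
    """Word-level change between draft and sent letter: 0.0 unchanged, up to 1.0."""
    matcher = difflib.SequenceMatcher(None, draft.split(), sent.split(), autojunk=False)
    return 1.0 - matcher.ratio()

def audit_sample(letter_ids, fraction=0.05, seed=None):
    """Pick a random sample of sent letters for the fortnightly audit."""
    if not letter_ids:
        return []
    k = max(1, math.ceil(len(letter_ids) * fraction))
    return random.Random(seed).sample(letter_ids, k)

print(edit_rate("Pressure is 10 cmH2O", "Pressure is 12 cmH2O"))  # 0.25
print(len(audit_sample(list(range(100)))))                        # 5
```

An edit rate of exactly 0.0 across many letters is the pattern worth a conversation. A low rate on routine cases is expected and is not, on its own, evidence of poor review.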

We worked through the Australian privacy and clinical governance implications carefully. The Therapeutic Goods Administration guidance on AI in medical devices is evolving, but the underlying principle, that the clinician remains responsible for the clinical communication, is clear. We treat draft letters the way we treated junior registrar drafts: useful starting points, never sent without review.

If you’re a small practice thinking about going down this path and you don’t have an in-house data person, you’ll need outside help. We worked with a consultancy that understood both the technical side and the clinical workflow — the team at Team400 handled some of the integration work — and that combination was essential. The technical implementation alone wouldn’t have got us to a clinically safe deployment. The clinical safety thinking alone wouldn’t have got us to a working system.

What we’d tell another clinic considering this

Three things, blunt:

If you can’t articulate what your edit-rate target is and how you’ll audit drafts, don’t start. You’re not ready.

If you’re using a public AI API and sending identified clinical notes through it, stop. There are locally hosted options now and they’re affordable. The Royal Australian College of General Practitioners has published useful guidance on AI in clinical practice that captures the basics.

If you’ve never measured how much time you actually spend on letters, do that first. Our own estimate was about half the real figure. The investment case is much stronger once you know the real number.

The technology will keep improving. The clinical and operational discipline you build around it is what makes it safe and what makes it sustainable. Get those right and the time you free up goes back to patients, where it should be.