You are reading the AI note. It says, “Patient denies chest pain.” But you never asked about chest pain. The patient never mentioned it. The AI has invented a negative.
This is the most important lesson in the entire course. Not because the technology is complicated — it is not — but because the errors AI scribes produce are uniquely dangerous. They are not the kind of mistakes that jump off the page. They are quiet. They sound right. They look like something a competent clinician would write. And that is exactly what makes them so hard to catch.
In the previous lesson, we covered how to review AI-generated notes systematically. Now we need to talk about what happens when those notes contain errors — and why the most dangerous errors are not the ones you might expect.
The obvious mistakes — a completely wrong diagnosis, a medication the patient has never taken, a consultation summary that bears no resemblance to what happened — those are easy. You spot them immediately. You correct them. No harm done. The errors that cause real problems are the ones that fit seamlessly into the narrative. They sound plausible. They look like something you might have said or done. And unless you are reading carefully, they slip through.
The hallucinated negative
This is the single most dangerous category of AI scribe error in general practice. The AI writes something like “Patient denies chest pain” or “No history of self-harm” or “No safeguarding concerns identified.” It reads like a thorough clinical note. The problem is that you never asked those questions, and the patient never offered that information. The AI has invented a negative finding.
Why is this so dangerous? Because a negative in the medical record is not just an absence of information. It is a positive assertion that something was explored and ruled out. When a future clinician reads “patient denies chest pain,” they will reasonably assume that chest pain was considered as a possibility, the patient was asked about it, and they said no. That future clinician may then deprioritise cardiac causes in their differential. The hallucinated negative has actively redirected clinical thinking.
A hallucinated negative is not a missing detail. It is a fabricated clinical finding. “Patient denies chest pain” when chest pain was never discussed creates a false record that could influence every future clinical decision about that patient.
Consider this scenario. A 58-year-old man presents with fatigue and mild breathlessness on exertion. You explore respiratory and haematological causes. The AI note includes “No chest pain. No palpitations. No ankle swelling.” You did not ask about any of those things — you were focused on the fatigue. But now the record suggests you performed a cardiovascular screen and found nothing. Six months later, the same patient presents to a colleague with new chest tightness. Your colleague reads your note and thinks: “Cardiovascular symptoms were explored last time and were negative. This is probably musculoskeletal.”
That is how a hallucinated negative becomes a near-miss. Not through dramatic failure, but through a quiet, plausible addition that nobody questioned at the time.
Invented findings and fabricated details
Beyond hallucinated negatives, AI scribes sometimes add clinical findings that were never observed. “Mild bilateral ankle oedema noted” when you did not examine the ankles. “Soft, non-tender abdomen” when the consultation was a telephone call. “Throat clear, no exudate” when the patient was seen for a skin complaint and you never looked in their throat.
These fabrications often follow a pattern. The AI has learned from thousands of clinical notes what a “typical” consultation for a given presenting complaint looks like. If a patient presents with shortness of breath, the AI expects the note to include findings from a respiratory and cardiovascular examination. If those findings were not mentioned in the conversation, the AI sometimes fills in the gaps with what it considers “normal” defaults.
Fabricated medication doses are another common problem. You might say during a consultation, “Let’s start you on amlodipine,” and the AI writes “Amlodipine 5mg once daily.” Perhaps 5mg is the standard starting dose and happens to be what you intended — but you did not say it aloud. Next time, the dose might be wrong. The AI is guessing based on probability, not transcribing what you actually said.
If a finding, dose, or detail was not explicitly discussed during the consultation, it should not appear in the record. Any AI-generated content that goes beyond what was actually said or done is fabrication, regardless of how clinically plausible it sounds.
Here is a particularly subtle example. A mother brings in her three-year-old with a cough. You take a history and examine the child. The AI note records the child’s weight as 14.2kg. You did not weigh the child. The AI has pulled a plausible weight for a three-year-old — but what if this child is on the 2nd centile? What if they are being monitored for faltering growth? A fabricated weight in the record could mask a genuine concern.
Near-misses that could happen to anyone
Let us walk through some realistic near-miss scenarios. These are not extreme edge cases. They are the kind of errors that could appear in any AI-generated note, in any GP surgery, on any given day.
Scenario 1: The altered referral indication. You refer a patient urgently under the two-week-wait pathway for a suspicious skin lesion. During the consultation, you say the lesion has changed in size and colour. The AI note records “Referral for skin lesion — patient reports recent change in size.” It has dropped the colour change. The referral letter generated from this note may now under-represent the clinical concern, potentially affecting triage at the receiving end.
Scenario 2: The missed allergy context. A patient mentions they had a “bad reaction” to co-amoxiclav years ago. You explore this — it was a rash, not anaphylaxis. The AI note records “Allergy: co-amoxiclav.” It has captured the allergy but lost the nature and severity of the reaction. The difference between “rash” and “anaphylaxis” matters enormously for future prescribing decisions.
Scenario 3: The safeguarding gap. During a routine child health check, the health visitor mentions in passing that the family has recently been referred to social services. You make a mental note to follow up. The AI note records the clinical findings from the health check but includes “No safeguarding concerns” because the conversation about social services was brief, indirect, and did not use explicit safeguarding language. The record now actively contradicts the reality of the situation.
Each of these scenarios shares a common feature: the error is not absurd. It does not stand out. A busy clinician signing off notes at the end of a long surgery could easily miss any of them. That is the problem.
What to do when you catch an error
Finding an error in an AI-generated note is not a failure. It is the system working as intended — because the system depends on you as the final check. What matters is what you do next.
First, correct the error immediately. Do not leave it for later. Do not assume you will remember. Open the record, make the correction, and save it. If your clinical system allows you to annotate the reason for the edit, do so. A note like “Corrected: AI scribe erroneously recorded ankle oedema — ankles were not examined” creates a useful audit trail.
Second, consider whether this is a pattern. One hallucinated negative might be a random error. Three hallucinated negatives in the same type of consultation suggest the AI model has a systematic tendency. If you are noticing the same category of error repeatedly, that information is valuable — for you, for your colleagues, and potentially for the vendor.
Third, report recurring errors. Your practice should have a process for reporting AI scribe issues, whether that is through your clinical system’s incident reporting, a shared log, or direct feedback to the vendor. If no process exists, raise it with your practice manager or clinical lead. These patterns only become visible when people report them.
Fourth, if the error could have caused patient harm — or did cause patient harm — treat it as a patient safety incident. File a significant event report through your practice’s usual process. This is not about blaming the AI or the clinician. It is about making the system safer. If the error is serious enough, consider reporting it to the MHRA through the Yellow Card scheme, particularly if the AI tool is classified as a medical device.
Keep a mental checklist of the error types you have encountered personally. Over time, you will develop an instinct for where your particular AI scribe tool tends to fabricate, omit, or distort. That pattern recognition is a clinical skill worth developing.
Building your error-detection skills
Spotting AI errors is a skill, and like all clinical skills, it improves with practice and deliberate attention. Here are some practical strategies that work well in a busy GP surgery.
Read the note against your memory of the consultation, not as a standalone document. If you read it cold, your brain will process it as a plausible clinical narrative and accept it. If you read it while actively recalling what happened, discrepancies become much more obvious. This is why reviewing notes immediately after the consultation — while your memory is fresh — is so important.
Pay special attention to negatives. Every time the note says “no,” “denies,” “not present,” or “nil,” ask yourself: did I actually explore that? Was it discussed? If you are not certain, delete it. An absent finding is safer than a fabricated one.
Watch for specificity that you did not provide. If the note contains a specific dose, a specific measurement, a specific timeframe, or a specific examination finding that you do not remember stating or observing, treat it with suspicion. AI scribes often add specificity because specific notes “look” more thorough. But false specificity is worse than honest vagueness.
Finally, be especially vigilant during complex or emotionally charged consultations. These are the consultations where your cognitive load is highest, your memory of exact wording is least reliable, and the AI is most likely to misinterpret nuance. A consultation about possible cancer, a difficult mental health disclosure, a safeguarding concern — these are exactly the situations where errors matter most and are hardest to catch.
This is the single most important practical skill you will take from this course. Not understanding how AI works. Not knowing the regulations. Those matter, but the ability to read an AI-generated note critically, catch what does not belong, and correct it before it becomes part of the permanent record is what protects your patients every single day.
Key Takeaway
The most dangerous AI errors are not obvious nonsense. They are plausible additions — negatives that sound reassuring, findings that seem reasonable, details that look right but are not. Your job is to spot them before they become part of the permanent record.