Medicine Department: UDLCO CRH: How not to fall into the over-testing and overtreatment trap while engaging in human and AI agentic participatory medical cognition

The following summary is based on the provided conversational learning transcripts concerning a complex clinical case (75F with T2DM, Hypertension, and Multifocal Stroke Syndrome).

IMRAD Summary

Introduction

The discussion focuses on the management of a 75-year-old female patient experiencing an escalation of neurological and respiratory symptoms, specifically cyanosis, respiratory distress, and trismus (jaw stiffness). The primary challenge addressed is the tension between comprehensive diagnostic inquiry and the risk of "over-testing/overtreatment" when integrating AI-driven suggestions with human clinical judgment.

Methods

A multidisciplinary team (hu1–hu4) utilized a Patient Journey Record (PaJR) to analyze the patient's history. The team engaged in a "participatory medical cognition" exercise, debating the validity of LLM-generated recommendations, the necessity of specific diagnostic tests (Thyroid, HbA1c, micronutrients), and the methodology for verifying evidence using PICO (Population, Intervention, Comparison, Outcome) frameworks.

Results

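Clinical Synthesis: The patient’s symptoms suggest a progression from central (stroke) to peripheral (polyneuropathy) involvement, possibly extending to the cranial nerves or autonomic control of breathing. "Neuropathic hyperexcitability" with autonomic surges was proposed as the most likely explanation for her episodic whole-body cramps.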
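Testing Rationale: While AI/hu1 suggested a wide battery of tests (TSH, B12, vitamin D, iron, HbA1c), hu2 questioned whether yearly TSH and antibody testing, tighter TSH targets, and routine micronutrient checks were supported by trial evidence. The team also warned against "citation rituals," in which attaching a single RCT to every statement lends an appearance of authority while obscuring uncertainty, and misrepresents recommendations that rest on pharmacology or guidelines rather than trials.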
Causality vs. Association: The team corrected an AI-generated statement that implied direct causal links between high TSH, higher HbA1c, and neuropathy/nephropathy, clarifying that these are more likely associations arising from shared metabolic dysregulation.

Discussion

The team concluded that to avoid the over-testing trap, clinicians must differentiate between mechanistic truths (e.g., levothyroxine absorption), guideline-based actions, and evidence-dependent interventions. The final responsibility rests with the human clinician to "step down the evidence ladder" (from RCTs to clinical judgment) based on the patient’s specific multimorbidity and context.

Keywords

Participatory Medical Cognition
Overtreatment Trap
Neuropathic Hyperexcitability
Evidence-Based Medicine (EBM)
Patient Journey Record (PaJR)
Multimorbidity
Clinical Judgment vs. AI

Thematic Analysis

1. The Hierarchy of Evidence vs. Clinical Reality

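A central theme is the critique of "RCT-centrism." While Dr. hu2 proposes asking LLMs for the single best RCT behind every diagnostic and therapeutic statement to curb over-testing, Dr. hu1 counters that many vital clinical actions (such as taking levothyroxine on an empty stomach) rest on pharmacology or experience, not trials. The two converge on a PICO-framed "best available RCT" as the default starting point for intervention claims, rather than a mandatory anchor for every statement.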

Insight: Forcing AI to find an RCT for every claim can lead to "artificial authority" and a loss of nuance.

2. Avoiding the "Over-testing Trap"

The dialogue highlights how AI can rapidly generate long checklists of tests (Thyroid, Iron, B12, Vitamin D, etc.). The team identifies two safeguards:

Explicit Labeling: AI should state if a recommendation is mechanistic, guideline-based, or evidence-dependent.
The PICO Filter: Using the PICO format as a "starting heuristic" to see if an intervention has actually been tested before ordering it.
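The two safeguards can be sketched as a small data structure. This is a minimal, hypothetical Python sketch, not a tool the team used: the names `StatementType`, `Recommendation`, `EVIDENCE_LADDER`, `best_available_evidence`, and `needs_caution` are illustrative assumptions, and the `evidence_found` inputs are invented placeholders rather than an appraisal of the actual literature. It labels each recommendation by statement type, steps down the evidence ladder described in the discussion (RCT, observational data, physiological rationale, clinical judgement) until it finds support, and flags evidence-dependent claims without trial backing as lower-certainty territory.

```python
from dataclasses import dataclass, field
from enum import Enum


class StatementType(Enum):
    """The three labels proposed for every recommendation an LLM makes."""
    MECHANISTIC = "mechanistic"
    GUIDELINE_BASED = "guideline-based"
    EVIDENCE_DEPENDENT = "evidence-dependent"


# Evidence ladder as described in the discussion, strongest rung first.
EVIDENCE_LADDER = ["RCT", "observational", "physiological rationale", "clinical judgement"]


@dataclass
class Recommendation:
    text: str
    statement_type: StatementType
    # PICO framing (Population, Intervention, Comparison, Outcome) for intervention claims.
    pico: dict = field(default_factory=dict)
    # Evidence rungs a reviewer actually found (hypothetical input, filled in by hand).
    evidence_found: set = field(default_factory=set)


def best_available_evidence(rec: Recommendation) -> str:
    """Step down the ladder and return the highest rung with supporting evidence."""
    for rung in EVIDENCE_LADDER:
        if rung in rec.evidence_found:
            return rung
    # Nothing found: only clinical judgement remains.
    return "clinical judgement"


def needs_caution(rec: Recommendation) -> bool:
    """Flag evidence-dependent claims without RCT support as lower-certainty territory."""
    return (
        rec.statement_type is StatementType.EVIDENCE_DEPENDENT
        and best_available_evidence(rec) != "RCT"
    )


# Illustrative inputs only; not an appraisal of the actual literature.
levothyroxine = Recommendation(
    "Take levothyroxine on an empty stomach",
    StatementType.MECHANISTIC,
    evidence_found={"physiological rationale"},
)
micronutrients = Recommendation(
    "Check iron, B12 and vitamin D in all diabetic women",
    StatementType.EVIDENCE_DEPENDENT,
    pico={
        "P": "diabetic women with thyroid dysfunction",
        "I": "routine micronutrient panel",
        "C": "testing only when clinically indicated",
        "O": "patient-important outcomes",
    },
    evidence_found={"observational"},
)

for rec in (levothyroxine, micronutrients):
    print(rec.statement_type.value, "|", best_available_evidence(rec), "|", needs_caution(rec))
# mechanistic | physiological rationale | False
# evidence-dependent | observational | True
```

In this sketch a mechanistic claim is not flagged even without trial support, mirroring the point that some statements are not "RCT-shaped", while an evidence-dependent testing recommendation lacking RCT backing is flagged for a more conservative approach.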

3. Human-AI Collaborative Cognition

The transcripts show that AI is good at surfacing generalized knowledge and associations (e.g., the link between TSH and HbA1c), but humans are required to weigh them in the context of the individual patient.

The "Silent Operator": The clinician must translate an "association" (found by AI) into a "clinical priority" (determined by the patient's immediate stability).

4. Diagnostic Precision in Complex Cases

The theme of "seeing through the prism" suggests that in complex elderly patients, symptoms like trismus or cyanosis should prompt a re-evaluation of the whole system (e.g., checking for silent UTIs or electrolyte imbalances) rather than reflexive additional tests or a chase after every subclinical lab value.

Conclusion on the "Trap": To engage in agentic medical cognition without overtreating, the team suggests: "Preserve rigour without forcing every statement into a form it doesn’t naturally fit." Use AI to surface uncertainty, not just to provide a list of actions.

Please provide an IMRAD-format summary, keywords, and thematic analysis of the content below, focusing on how not to fall into the over-testing and overtreatment trap while engaging in human and AI agentic participatory medical cognition.

Conversational learning transcripts about the patient whose PaJR case report is published in full text here: https://publications.pajrhealth.org/parvat-tulsi-97901/

[02/05, 14:00]hu1: The most coherent "big picture" that emerges from this history is of a 75-year-old female with *long-standing Type 2 Diabetes and Hypertension* who has progressed to *Multifocal Stroke Syndrome*, resulting in *Neurogenic Bladder* and *Polyneuropathy*.

The most urgent concerns in the recent 2026 notes are the *episodes of cyanosis (turning blue)*, *respiratory distress*, and *trismus*, which require immediate investigation into her *airway stability and neurological status*.

[02/05, 14:02]hu1: The most coherent takeaway is that her neurological system is failing in stages—moving from the brain (strokes) to the peripheral nerves (polyneuropathy) and now potentially affecting the cranial nerves or autonomic control of breathing.

[02/05, 14:09]hu1: The immediate mission is to stop the cycle of "stroke-like" episodes by managing her blood pressure, sugar, and bladder function, while empirically supporting her bones with basic supplements to prevent her next annual injury.

Has anyone seen any signs of dental pain or swelling recently that might explain the jaw stiffness?

[02/05, 14:40]hu2: The overall picture appears to be one of episodic whole-body muscle cramps, which currently dominate her symptom landscape.

[02/05, 14:48]hu1: Seen through this prism, the picture is consistent, and her episodic whole-body cramps are most likely explained by:

1. *Neuropathic Hyperexcitability*: Damaged nerves "short-circuiting" and causing systemic spasms.

2. *Autonomic Surges*: The cramps are exaggerated by sweating and accompanied by "impending doom" and respiratory distress (becoming pale/blue).

3. *Recent Escalation*: As of March 2026, she has experienced an inability to open her mouth (trismus), which requires urgent monitoring. Start by looking at her gums.

In my view, the priority actions in hospital are to:

• *Rule out Triggers*: Checking for silent UTIs (common with neurogenic bladders) or electrolyte imbalances (Calcium/Magnesium) that provoke cramps.

• *Ensure Airway Safety*: Monitoring the jaw stiffness and breathing episodes closely.

• *Provide Supportive Care*: Empirical Vitamin D/Bone support and managing her high sensitivity to BP medications.

[02/05, 14:57]hu2: Good idea

@hu3 @hu4 let's add a dental opinion to her already planned OPD consultations, such as Ophthalmology and ENT.

Also, let's get her current urine CUE and urine culture and sensitivity.

[02/05, 15:19]hu1: Do her Thyroid Function Tests too

[02/05, 15:21]hu1: A high TSH level (indicating hypothyroid status) is positively correlated with HbA1c levels, meaning that uncontrolled blood sugar is strongly associated with thyroid dysfunction. Diabetic patients with hypothyroidism also show a significantly higher incidence of complications, such as diabetic neuropathy (89.3%) and nephropathy.

[02/05, 15:22]hu1: 📊 Thyroid & Diabetes in India – Key Facts

✅ Thyroid issues in diabetics: 20–32% (vs general population)

✅ Most common: Subclinical hypothyroidism – 16–33%

✅ Overt hypothyroidism: ~11–15%

✅ All thyroid problems combined: up to 46%

🚺 Women affected much more

· 44% females vs 20% males

· 73% of diabetics with thyroid issues are female

· Elderly women: ~28% have subclinical hypothyroidism

⚠️ Why it matters

High TSH → higher HbA1c (poorer sugar control)

Also linked to higher neuropathy (89%) & nephropathy

👉 Recommendation: Regular thyroid checks for all diabetics, especially women.

[02/05, 15:27]hu2: @PaJR Health Bot Feb 26 @Patient Advocate 75F WB please share her last thyroid function test and last HbA1c report.

[02/05, 15:27] PaJR Health Bot Feb 26: I cannot locate her last thyroid function test or HbA1c report in the records. The information you provided about thyroid issues in diabetics is very informative.

[02/05, 15:34]hu2: 👆@hu4 @hu4 please get her thyroid function tests and HbA1c as well.

[02/05, 15:35]hu3: Ok

[02/05, 15:36]hu4: Ok

[02/05, 15:27]hu1: *Check iron, vitamin B12, and vitamin D*

· Iron deficiency reduces levothyroxine absorption.

· B12 & D deficiency are common in both diabetes and hypothyroidism – worsen fatigue and neuropathy.

[02/05, 15:29]hu1: Stress and sleep management – chronic stress worsens both TSH and blood sugar.

[02/05, 15:29]hu1: As an aside: ✅ For diabetic women in India with thyroid issues:

1. Yearly TSH + antibody testing

2. Treat to TSH 0.5–2.5, not just normal

3. Recheck HbA1c 2–3 months after starting thyroid meds

4. Screen for neuropathy & nephropathy every 6–12 months

5. Check iron, B12, vitamin D

6. Take levothyroxine correctly – empty stomach, no food/coffee/calcium for 1 hour

7. Don’t stop thyroid meds even if sugars improve

📌 Most important: elderly diabetic women are highest risk – don’t miss them.

[02/05, 15:33]hu2: I guess it may be a good idea to ask our LLMs to share the single best RCT evidence for every statement they make about diagnostic and therapeutic interventions?

[02/05, 15:48]hu1: May not be practical to do so and has its drawbacks and limitations

[02/05, 16:04]hu1: Dr. Hu2, I understand the intent behind your suggestion—and I share the instinct to make LLM outputs more accountable to evidence. However, I’ve come to feel that asking for a single “best RCT” for every statement may create an impression of rigour without actually improving clinical clarity.

When I reflect on the checklist I posted, not every point is the sort of claim that even has a single RCT behind it. For instance, advising patients to take levothyroxine on an empty stomach is grounded in pharmacology and absorption physiology rather than outcome trials. Likewise, not discontinuing thyroid medication reflects well-established disease behaviour and clinical experience, not something derived from one definitive study.

Even where recommendations are clearly evidence-based, such as screening for complications or reassessing HbA1c, the reality is that they arise from a body of evidence, not a single trial. That is precisely why frameworks such as GRADE, developed by the GRADE Working Group, exist: to synthesise multiple studies, account for inconsistency, and judge applicability. Forcing an LLM to select one RCT risks being arbitrary and, at times, misleading.

There is also the matter of context. A recommendation such as targeting a tighter TSH range is not a universal RCT-derived rule—it depends on age, comorbidities, and clinical priorities. A single cited trial cannot capture that nuance, yet it can make the recommendation appear more definitive than it truly is.

My concern is that attaching one RCT to every statement may become more of a citation ritual than meaningful evidence communication. It risks giving an appearance of authority while obscuring uncertainty, variability, and limitations.

What I find more useful—and what I am trying to move towards—is encouraging LLMs to be explicit about:

* what type of statement they are making (mechanistic, guideline-based, or evidence-dependent),

* how strong the supporting evidence is, and

* where that evidence may or may not apply.

That approach preserves both honesty and usability. To me, it aligns more closely with real-world evidence-based practice—where decisions are informed by a synthesis of evidence, context, and clinical judgement, rather than a single “best” trial attached to each line.

[02/05, 16:07] GJ: Dr. Biswas, I would go a step further: even as we improve evidence transparency in LLM outputs, AI cannot—and will not—make clinical judgement obsolete.

Clinical decisions are not simply the retrieval of the “best” piece of evidence. They involve:

* weighing competing risks in a specific patient,

* interpreting incomplete or evolving information,

* adapting to multimorbidity and context, and

* aligning choices with patient preferences and real-world constraints.

No matter how well an LLM cites studies—even if it were to attach an RCT to every statement—it is still operating on generalised knowledge. It does not experience the patient in front of us, nor can it fully account for the dynamic, adaptive nature of living systems.

The GRADE Working Group itself acknowledges that evidence is only one component of decision-making. Judgement (clinical, contextual, and often tacit) remains essential.

So to me, the goal is not to make AI replace judgement, but to make it support better judgement:

* by making evidence more transparent,

* by clarifying uncertainty rather than masking it, and

* by helping clinicians think more clearly, not less.

If anything, the more capable these systems become, the more important it is that we remain explicit about their limits. The final responsibility for interpretation, prioritisation, and action will continue to rest with the clinician.

[02/05, 16:20]hu2: To quote from your text, this is what I meant 👇

encouraging LLMs to be explicit about:

* what type of statement they are making (mechanistic, guideline-based, or evidence-dependent),

* how strong the supporting evidence is, and

* where that evidence may or may not apply.

Generally, when we ask LLMs to cite the single best RCT in PICO format, that is only the starting point. In its absence, one can automatically slide down to the next best evidence, right down to clinical judgement based on pathophysiological rationale when no other evidence exists.

[02/05, 16:22]hu2: 👆For example here

Points 1, 2, and 5 need RCT evidence, primarily to prevent the current pandemic of over-testing and overtreatment.

[02/05, 16:26]hu2: Also here 👇

High TSH → higher HbA1c (poorer sugar control)

Also linked to higher neuropathy (89%) & nephropathy

Are the above causal links?

Logically, they are all surely related to the underlying metabolic syndrome, but the above statement makes them appear causally linked to each other.

[02/05, 16:30]hu1: @⁨hu2⁩ that helps clarify your position—and I think we’re actually closer in intent than it first appeared.

I agree that asking for a PICO-framed “best available RCT” as a starting point is a useful discipline. It forces the model (and the user) to ask: has this actually been tested in a controlled way? And if not, we consciously step down the evidence ladder—from trials to observational data to physiology and, ultimately, clinical judgement. That hierarchy is sound.

Where I remain cautious is in making that step mandatory for every statement, because some statements are simply not “RCT-shaped” questions. Mechanistic truths, pharmacokinetics, and long-standing clinical practices often sit outside that structure. For those, forcing an RCT anchor can become artificial rather than informative. But as a starting heuristic for intervention claims, your approach is entirely reasonable.

On your second point—the phrasing around TSH, HbA1c, neuropathy, and nephropathy—you’re absolutely right to call that out. As written, it does risk implying causality where only association exists.

A more accurate way to express it would be along the lines of:

* Higher TSH is associated with poorer glycaemic control (higher HbA1c)

* Thyroid dysfunction is associated with increased prevalence of microvascular complications

The underlying driver is likely shared metabolic and hormonal dysregulation (insulin resistance, inflammation, lipid abnormalities), rather than a direct causal chain:

high TSH → neuropathy

That distinction matters, especially when LLM outputs can unintentionally “upgrade” associations into causal statements.

On your final point about 1, 2, and 5 (screening frequency, tighter TSH targets, and micronutrient testing), I think this is where your concern about over-testing and overtreatment is particularly important.

* Some elements (like complication screening) are guideline-driven and reasonably well supported.

* Others (like tighter TSH targets or routine micronutrient panels) are more context-sensitive and not uniformly backed by strong RCT evidence.

So here, your principle holds well:

If we cannot identify solid trial evidence, we should explicitly acknowledge that we are operating in lower-certainty territory—and be more conservative.

Perhaps the synthesis between our views is this:

* Use PICO + “best available RCT” as the default entry point for intervention claims

* Explicitly step down the evidence ladder when RCTs are absent

* Clearly distinguish:

  * causation vs association

  * evidence vs rationale

* And always surface where uncertainty should temper action, especially to avoid overdiagnosis and overtreatment

That way, we preserve rigour without forcing every statement into a form it doesn’t naturally fit—and we reduce exactly the kind of unintended consequences you’re highlighting.

Medicine Department

Saturday, May 2, 2026