Monday, May 25, 2026

UDLCO CRH: Pharma driven vs academia driven global clinical trial integrity adjustment disorder

Summary of the Discussion after sharing the minutes of the journal club initially archived here: https://medicinedepartment.blogspot.com/2026/05/udlco-crh-sunday-journal-club-on.html?m=1





The transcript captures a debate between two clinicians (hu2 and hu7) regarding a recent journal club review of a metabolic dysfunction-associated steatohepatitis (MASH/NASH) commercial drug trial (specifically referencing Resmetirom data).


  • hu2 initiates the discussion by highlighting vulnerabilities in the trial design that could artificially inflate the drug’s apparent efficacy. They point to high placebo response rates, the breaking of double-blinding due to distinct drug side effects (diarrhea), and potential hidden manipulations within "grey literature" and protocols (such as post-hoc endpoint shifting and selective subgrouping) driven by massive commercial stakes.

  • hu7 counters these points by explaining the clinical and regulatory realities of modern trials. They argue that high placebo rates in MASH trials stem from the inherent subjectivity of liver biopsy histology (hepatocyte ballooning) rather than protocol manipulation. Furthermore, hu7 strongly defends the integrity of modern regulatory environments, noting that Statistical Analysis Plans (SAPs) must be locked before database unblinding, rendering intentional post-hoc inflation for primary approval impossible. They conclude that while bias exists, it is heavily regulated in pharma, whereas academic/NIH studies often suffer from worse post-hoc ("freezer study") biases.


## Keywords


  • MASH/NASH Resolution

  • Effect Size Inflation

  • Placebo Response

  • Double-Blinding Breakdown

  • Statistical Analysis Plan (SAP)

  • Post-Hoc Analysis

  • Grey Literature

  • Surrogate Endpoints

  • Regulatory Oversight (FDA/EMA)

  • Histologic Inter-observer Variability


## Thematic Analysis: Protocol Adjustments and Effect Size Inflation

The core of the debate centers on whether the trial's reported treatment effect was artificially amplified by structural protocol adjustments or design flaws. The arguments for both sides are categorized below:

1. Arguments FOR Protocol Adjustments & Factors Inflating Effect Size

This perspective, heavily championed by hu2, focuses on how commercial trial design can be subtly optimized or exploited to maximize apparent drug benefits.

  • Breakdown of True Double-Blinding: Resmetirom’s physical signature—specifically a significantly higher rate of early-onset diarrhea (~34% vs. 16% in placebo)—acts as an unblinding mechanism. If patients and site coordinators guess the assignment, it introduces behavioral and reporting biases that inflate the drug's perceived efficacy.

  • Asymmetric Site-Level Reinforcement: While hu7 notes this is a site-level bias rather than a "protocol change," unblinded coordinators might more aggressively encourage diet and lifestyle compliance in the active arm, falsely magnifying the marginal drug effect.

  • Subjectivity in Central Pathology Reading: Relying on central readers for qualitative microscopic endpoints (like hepatocyte ballooning, which must drop to 0 for "NASH resolution") creates a single point of failure. Borderline slides may be subtly upgraded toward "resolution" to satisfy commercial objectives.

  • Theoretical Grey Literature Manipulations: In a broader sense, industry sponsors can inflate efficacy by shifting primary endpoints post-hoc, utilizing suboptimal or low-dose comparators rather than head-to-head active controls, or shortening washout periods to handicap competing drugs.

  • Surrogate Endpoint Reliance: Achieving "NASH Resolution" at 52 weeks is an abstract, tissue-level metric that allows for accelerated regulatory approval but may overstate the actual, long-term clinical utility before 54-month hard outcomes (cirrhosis/survival) are proven.
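
To get a rough sense of how much a side effect like diarrhea could reveal about treatment assignment, here is a minimal sketch using the approximate rates quoted in the discussion (~34% on 100 mg vs. 16% on placebo). The equal allocation between the two arms and the function name `posterior_active` are assumptions made for illustration only; the discussion does not state the trial's allocation ratio.

```python
def posterior_active(p_event_active, p_event_placebo, prior_active=0.5):
    """Bayes' rule: P(active arm | participant shows the event)."""
    num = p_event_active * prior_active
    den = num + p_event_placebo * (1.0 - prior_active)
    return num / den

# Approximate diarrhea rates quoted in the discussion.
P_DIARRHEA_ACTIVE = 0.34
P_DIARRHEA_PLACEBO = 0.16

# prior_active=0.5 is an assumption (equal allocation between the two arms).
with_diarrhea = posterior_active(P_DIARRHEA_ACTIVE, P_DIARRHEA_PLACEBO)
without_diarrhea = posterior_active(1 - P_DIARRHEA_ACTIVE, 1 - P_DIARRHEA_PLACEBO)

print(f"P(active | diarrhea)    = {with_diarrhea:.2f}")
print(f"P(active | no diarrhea) = {without_diarrhea:.2f}")
```

Under these assumptions, a participant with diarrhea would be on the active drug about 68% of the time, versus about 44% without it. That is informative but far from a certain guess, which fits hu7's observation that placebo participants report GI symptoms too.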

2. Arguments AGAINST Protocol Adjustments & Effect Size Inflation

This perspective, presented by hu7, argues that structural guardrails and the nature of the disease pathology account for these data anomalies, rather than protocol manipulation.

  • Histological Imprecision, Not Manipulation: The high and variable placebo arm response is an artifact of the inherent difficulty in assessing liver biopsies. Because "ballooning" scores are highly subjective, minor variations among the panel of central readers naturally cause shifts in data, which is a known limitation of MASH histology rather than a deceptive protocol tweak.

  • Strict Regulatory SAPs (Statistical Analysis Plans): Modern regulatory frameworks (FDA/EMA) strictly forbid post-hoc endpoint shifting for drug registration. The primary endpoints and statistical handling methods must be finalized and locked before the trial database is unblinded.

  • Nominal Value of Post-Hoc Data: Any post-hoc adjustments or subgroup analyses cannot legally be used to gain a primary drug indication. They are heavily policed, must be explicitly labeled as "post-hoc," and their p-values are relegated to "nominal" status, rendering them purely hypothesis-generating.

  • Academic Multi-Author Accountability: Registrational trials feature numerous non-industry academic authors (often as first and last authors) who must personally attest to the validity of the data, minimizing the likelihood of corporate data fabrication.

  • The "Academic Bias" Counter-Argument: hu7 contends that the risk of unscientific post-hoc manipulation and "freezer study" bias is higher in less tightly regulated academic and NIH-sponsored research than in highly audited, multi-billion-dollar pharmaceutical registrational trials.
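
The panel rule hu7 describes in the transcript (a unanimous score, else the majority score, else the median or a consensus read) can be sketched as follows. This is a minimal illustration, not any trial's actual reading procedure: the function name `panel_score`, the fallback to the median rather than a consensus read, and the example scores are all assumptions.

```python
from collections import Counter
from statistics import median

def panel_score(reads):
    """Combine three central-reader scores for one histologic component.

    Returns (score, rule), where rule is 'unanimous', 'majority' or 'median'.
    """
    if len(reads) != 3:
        raise ValueError("expected exactly three reader scores")
    score, votes = Counter(reads).most_common(1)[0]
    if votes == 3:
        return score, "unanimous"
    if votes == 2:
        return score, "majority"
    # All three readers disagree: fall back to the median here
    # (assumption; a real panel might hold a consensus read instead).
    return median(reads), "median"

# Hypothetical ballooning reads for four biopsies (not trial data).
for reads in ([1, 1, 1], [0, 1, 1], [0, 0, 1], [0, 1, 2]):
    score, rule = panel_score(reads)
    print(reads, "->", score, f"({rule})")
```

In this sketch, a single reader moving a borderline slide from 1 to 0 flips the panel result from 1 to 0 (compare `[0, 1, 1]` with `[0, 0, 1]`). Because ballooning must be 0 for resolution, that is one way read-to-read imprecision could surface as apparent "resolution" in either arm.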


Conclusion of the Analysis: The dialogue highlights a critical friction point in Evidence-Based Medicine. While trial designs suffer from practical limitations that can inflate effect sizes (such as blinding breakdown via side effects and subjective surrogate endpoints), modern regulatory mandates largely prevent the outright, clandestine protocol manipulation of primary endpoints in phase 3 commercial trials.
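
One reason post-hoc p-values are treated as "nominal" is multiplicity: the more outcomes or subgroups are examined, the more likely it is that at least one looks significant by chance. The sketch below is a simplified illustration that assumes independent tests, no true effects, and a 0.05 significance level. The Bonferroni threshold is shown only as one simple correction, not as the method any particular SAP uses, and both function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) across n independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

for k in (1, 5, 10, 20):
    fwer = familywise_error_rate(k)
    per_test = bonferroni_threshold(k)
    print(f"{k:>2} tests: chance of >=1 false positive = {fwer:.2f}, "
          f"Bonferroni per-test threshold = {per_test:.4f}")
```

Under these assumptions, 20 unplanned looks at the data give roughly a 64% chance of at least one spurious "significant" finding, which is why a pre-specified primary endpoint carries more weight than a subgroup result found afterwards.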


Provide a summary, keywords and thematic analysis of the discussion content below around the web linked journal club focusing on the points for and against how protocol adjustments may or may not have led to effect size inflation.

Conversational transcripts:

[24/05, 16:28]hu2: Questions after the Sunday journal club today:

1) The fact that 31 placebo patients resolved their NASH and 46 improved their fibrosis score without receiving an active molecule proves that the natural course of MASH is highly variable. In commercial trials with high financial stakes, the "placebo arm" isn't a passive vacuum; it is highly dynamic?

2) True double-blinding assumes that neither the patient nor the doctor can guess their treatment allocation. However, Resmetirom caused significantly higher rates of gastrointestinal side effects (diarrhea in ~34% of the 100-mg group vs. 16% in the placebo group).

Critical Caveat: When a drug has a distinct physical signature (like early-onset diarrhea), the "blind" often breaks down in practice?


3) When funding originates entirely from a commercial entity with billions of dollars on the line, subtle, unshared protocols within the "grey literature" can be adjusted to artificially maximize the drug's apparent benefit?

4) The trial highlights a classic Evidence-Based Medicine (EBM) paradox?

"NASH Resolution" is a microscopic, pathological endpoint. It is an abstract, non-patient-related metric accepted by regulatory bodies for accelerated approval.

While observational data suggests these tissue-level changes eventually track with survival, the 52-week data does not guarantee that a patient feels better or will avoid cirrhosis. 

The true trial runs for 54 months to observe hard clinical outcomes. However, multinational and generic corporations (like Torrent) are already leveraging the initial 52-week surrogate hype to establish market dominance and secure physician loyalty years before

A picture worth the thousand words from this journal club can be seen here:


[24/05, 19:32]hu7: Some thoughts, but first, a disclaimer. I’m not a gastroenterologist. That said, my thoughts on your points:

1) the high rate of placebo response may be due to the imprecision in histology reads. MASH resolution (w/o worsening of fibrosis) has the highest placebo response because there is a huge amount of variability in the assessment of ballooning - of the three histologic features of MASH, steatosis, ballooning, and lobular inflammation, ballooning has to be 0 for resolution. Most MASH studies use a panel of 3 central readers, and use scores for each component that are unanimous (all 3 give the same score), majority (2 give the same score), median or consensus if all 3 scores vary. This reduces the placebo response, but still the inherent imprecision becomes apparent when you observe the huge differences in placebo response between studies. And yes, pre-cirrhotic MASH can regress. The idea of a placebo subtracted difference (treatment effect) helps define the drug effect over and above a chance effect. 

2) you are correct about true double-blinding, but it’s most likely to happen in studies with drugs that cause massive weight loss. GI symptoms can be reported by participants on placebo too - we did a study for BAYER eons ago called EDIP with acarbose, and almost all participants complained of abdominal symptoms. 

3) protocols for completed studies are required to be shared on the clinicaltrials.gov website, and as supplements to manuscripts published in the NEJM and Lancet and possibly other journals too. 

Question for you: can you provide an example of how a protocol can be adjusted to augment a drug’s benefit, artificially or otherwise? 

4) your point is at the crux of approvals for MASH therapies: showing histologic efficacy will only get “accelerated approval” or provisional approval from the FDA and EMA. Full approval requires clinical outcomes - a drug that has accelerated/provisional approval will be withdrawn from the market if the clinical outcomes study doesn’t show benefit.  There is sufficient data to show that fibrosis improvement (w/o worsening of steatohepatitis) tracks with clinical outcomes, mainly progression from pre-cirrhotic to cirrhotic MASH - this is not the case for MASH resolution. 
Finally, while histology is not a patient-related metric, there are data to support the idea that histologic changes correlate with changes in patient-related outcomes such as NASH CHECK. But I think if a study had to be powered to show differences in PROs, it would be impossibly large.


[24/05, 20:21]hu2: Excellent points 👏

Your point 3 question has been answered in the article with two examples, one pertaining to the standard of care life style interventions and the other that you have actually addressed in point 1. I quote,

"Tweakable Background Variables in Low-Integrity Environments


When funding originates entirely from a commercial entity with billions of dollars on the line, subtle, unshared protocols within the "grey literature" can be adjusted to artificially maximize the drug's apparent benefit:

Asymmetric Lifestyle Reinforcement: If trial site coordinators informally remind active-arm patients more aggressively about diet compliance (or if those unblinded by side effects self-select into healthier habits), the marginal drug effect is falsely magnified.


Central Pathologist Subjectivity: Relying on a single central reader for qualitative biopsy features (like hepatocyte "ballooning") introduces a single point of failure where borderline slides could be subtly pushed toward "resolution" to meet stringent corporate endpoints."

Unquote


[24/05, 20:27]hu7: Asymmetric lifestyle reinforcement by a coordinator is _*not_* a protocol adjustment. It’s a bias that’s introduced at the site level. It’s like the bias in reporting an AE, calling it drug-related or not, etc.



[24/05, 21:05]hu2: Also sharing a few more examples in general and unrelated to the current study 👇

When pharmaceutical sponsors hold massive financial stakes, unpublished "grey literature" (like internal protocols and regulatory dossiers) can be manipulated to create an illusion of greater drug efficacy. This introduces significant bias into overall study outcomes. Common examples of these subtle protocol adjustments include:


Shifting the Primary Endpoint Post-Hoc: Researchers may track dozens of health outcomes during a trial. If the drug fails to show significance on its original, pre-registered primary endpoint, sponsors might "adjust" internal protocols to promote a secondary, positive outcome—making the drug look like a success in marketing while burying the initial failure.


Selective Subgroup Analysis: A drug might perform poorly across the general trial population. To artificially boost its profile, internal protocols can be adjusted to hyper-focus on a small, specific subgroup (e.g., "patients over 60 without prior liver issues") where the drug happened to show a weak benefit.


Using Less-Than-Optimal Comparators: Instead of running "head-to-head" trials against the best existing medications, sponsors' protocols may mandate comparing the new drug to an inactive placebo or a suboptimal, low dose of an older drug, ensuring the new drug's results look significantly better.

Manipulating Washout Periods: In trials testing a drug against an old one, researchers can alter the protocol by shortening the "washout period" (the time allowed for patients to clear previous medications from their systems). This can skew baseline health conditions or increase adverse effects for the competitor drug, artificially making the new drug appear safer and more effective.

Redefining Normal Ranges: Protocols can adjust the definitions of what constitutes a "successful" response or a "mild" adverse event. Broadening the tolerance for what qualifies as a normal lab result can deliberately filter out negative side effects or inflate the percentage of patients categorized as "responsive" to the treatment.

These behind-the-scenes manipulations often result in major Publication Bias, as unfavorable data is filtered out from public or peer-reviewed journals.

For a comprehensive look at how these practices impact medical literature and prescribing guidelines, review the Qualitative Systematic Review on Financing of Drug Trials.

  • Pharmaceutical industry sponsorship and research ... - PMC (National Institutes of Health (.gov)): "Introduction. Clinical research sponsored by the pharmaceutical industry affects how doctors practise medicine. An increasing number of clinical trials at all..."

  • The Impact of Industry Funding on Randomized Controlled Trials of ... (National Institutes of Health (.gov), 28 Feb 2022): "1. Introduction. Rituximab, an anti-cancer monoclonal antibody, was first approved by the Food and Drug Administration (FDA) for use in the United States in 1..."

  • (PDF) How pharmaceutical industry funding affects trial outcomes (ResearchGate): "Abstract and Figures. Three recent systematic reviews have shown that pharmaceutical industry funding of clinical trials is strongly associated with pro-industr..."

  • The Financing of Drug Trials by Pharmaceutical Companies and ... (National Institutes of Health (.gov)): "Execution of the study according to plan and objective depiction of the results can also be influenced, e.g., by contractual stipulations that grant the pharmac..."

  • Searching the “Grey Literature” for Clinical Trial Results - PMC - NIH (National Institutes of Health (.gov)): "For clinical trials for which results have not been published, as well as for more complete information for clinical trials that have been published, the grey l..."

  • Big Pharma & Clinical Trials - Funding, Influence & Corruption (Drugwatch.com): "As far back as 2000, Schulz manipulated data to make Seroquel seem better than other medications on the market. INFUSE Bone Graft and DePuy ASR Hip Implants. Me..."

  • Clinical Trial Basics: Intervention Models in Clinical Trials | Power (withpower.com, 31 Mar 2023): "To reduce any carryover effect from the previous treatments, studies conducted under this model usually include a washout period so the previous treatment can b..."


[24/05, 22:36]hu7: What is the source of this information? I suspect there’s some hallucination. 

Statistical analysis plans for registrational studies have to be finalized and approved well before the study is complete (last participant last visit) and the database is locked - usually, the choice of an endpoint (and the primary statistical analysis method) that is registrable is discussed with regulators and finalized before the start of phase 3.

_*Post-hoc_* analyses are considered purely hypothesis-generating, and typically not used for a registered endpoint with an indication, and when these post hoc analyses are published, they are required to be marked as post hoc, and the p-values have to be identified as “nominal”.

There are instances of an Agency requiring a different type of analysis for the endpoint - even then the endpoint doesn’t change, just the analysis method. I’ve seen this happen with handling of missing data. 

Post hoc analyses can’t be used for an indication. Post hoc analyses used for promotion must be disclosed as post hoc.

Today’s regulatory environment is very different from what was prevalent say 20-25 years ago.

And another point, all publications of primary and key secondary data from clinical trials conducted by pharma have numerous authors from academia (typically they are first and last authors). They do attest that the results are correct, and they agree with the presentation. 

OTOH

I can point to many “adjustments” to studies sponsored by the NIH. Many papers from academia, especially ones called “freezer studies”, are so incredibly post-hoc, but rarely called out as such; but no one says the conclusions should not apply to clinical care, or describe disease pathology, or define biomarkers. 

And if you look at retracted publications, publications with fabricated references, spurious publications, etc., an overwhelming majority of them are from academia. 

So, at the end, there’s enough blame to go around, making it very hard for practicing clinicians: they need to learn to look at the data very dispassionately, understand the difference between pre-specified and post-hoc analyses, understand statistical analysis methods, handling of missing data, multiplicity control, etc., before making conclusions that affect patient care.


[25/05, 06:47]hu2: Agree!

Plenty of learning points here. 🙂🙏
