To review a medical research article, check the question, design, methods, results, bias, and real-world fit, then rate how certain the findings are.
Readers open clinical papers to make a call: “Can I trust this, and does it help my patients or project?” This guide shows a clean, repeatable way to get that answer without jargon. You’ll move from the study question through design and methods, then read the results with effect sizes, not just p-values. You’ll finish with a short verdict you can share with a colleague or add to your notes.
How To Critique A Clinical Paper Step-By-Step
Think of a review as six fast lanes: question, design fit, method quality, results, bias checks, and applicability. The table below gives you a quick map; the sections that follow add depth and plain wording you can reuse.
| Step | What To Check | Quick Prompt |
|---|---|---|
| 1) Question | PICO/PICo clarity; outcomes that matter to patients | “Who, what, compared with what, and what outcome?” |
| 2) Design Fit | Trial, cohort, case-control, cross-sectional, diagnostic, review | “Does the design match the question?” |
| 3) Methods | Eligibility, setting, randomization, blinding, sample size, protocol | “Could this be repeated and yield the same estimate?” |
| 4) Results | Effect size, confidence interval, absolute vs relative change | “How big is the change and how precise is it?” |
| 5) Bias | Selection, measurement, confounding, missing data, selective reporting | “What can skew the estimate, and did they guard against it?” |
| 6) Applicability | Patient match, harms, feasibility, cost, equity | “Will this help the people in front of me?” |
Start With The Question
Strong papers state a tight question. A simple PICO prompt works: Patient/Problem, Intervention, Comparator, Outcome. Outcomes should be patient-centered (pain, function, survival) and time-bound. Surrogate outcomes (lab numbers, scores) can be handy, but they sit a tier lower when you need real-world change.
Scan the abstract and the last paragraph of the introduction for the one-line aim. If the aim is broad or vague, you’ll often see muddled outcomes and a loose conclusion later on.
Check That The Design Fits
Trials For Causation
When the goal is to test whether an action causes a change, randomized trials shine. You want clear allocation, concealment before assignment, and blinding where it’s doable. When reading a randomized study, look for a flow diagram and a methods section that follows a checklist. The CONSORT 2010 checklist gives a model for transparent reporting of trials, and many journals ask authors to follow it.
Observational Designs For Real-World Links
Cohort designs track exposure then outcomes; case-control starts with outcome then looks back; cross-sectional takes a snapshot. These are handy when trials are impractical or not ethical. You still need a tight definition of exposure, outcome, and confounders, plus a plan to deal with missing data. A clear checklist such as STROBE is often used by authors and editors for these designs.
Systematic Reviews For The Big Picture
When a paper pools many studies, check the protocol, search methods, inclusion rules, and bias checks for each included study. The PRISMA 2020 checklist maps the gold-standard items for a transparent review, including a flow diagram and an itemized method section.
Method Quality: The Nuts And Bolts
Population And Setting
Eligibility should be specific and credible. If a trial excludes older adults, those with comorbidities, or pregnant patients, note it now; it affects your end verdict on general use. The setting (tertiary center vs primary care, country, time frame) shapes baseline risk and event rates.
Randomization, Concealment, And Blinding
In trials, random sequence and concealment stop selection bias. Allocation known ahead of time invites cherry-picking. Blinding lowers measurement bias, especially for subjective outcomes like pain scores. If blinding is not feasible, look for objective outcomes or third-party assessors.
Sample Size And Power
A pre-study sample size calculation tells you whether the study had enough events to detect the planned effect. Underpowered work often yields wide confidence intervals and invites spin. The methods should state the alpha level, the target effect size, and the power.
Outcomes And Follow-Up
Primary outcomes should be declared ahead of time with timing and measurement tools. Changing the primary outcome after peeking at results raises bias flags. Loss to follow-up above about 10% can tilt estimates; you want reasons and a plan for imputation if used.
Read The Results The Right Way
Lead With Effect Size, Not Just P-Values
Ask for absolute change first. Relative risk can look large while the absolute shift is tiny. Use these fast translations:
- Absolute Risk Reduction (ARR) = control risk − treatment risk.
- Number Needed To Treat (NNT) = 1 / ARR (use proportion, not percent).
- Relative Risk (RR) = treatment risk / control risk.
- Odds Ratio (OR) and Hazard Ratio (HR) compare odds or hazard rates; check that the outcome is rare before reading an OR as a relative risk, since the OR overstates the RR when events are common.
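These translations are simple enough to script. A minimal sketch (the function names and the example rates below are our own illustration, not taken from any particular paper):

```python
import math

def arr(control_risk: float, treatment_risk: float) -> float:
    """Absolute Risk Reduction: control risk minus treatment risk (proportions, not percents)."""
    return control_risk - treatment_risk

def nnt(control_risk: float, treatment_risk: float) -> int:
    """Number Needed to Treat: 1 / ARR, rounded up to a whole patient."""
    return math.ceil(1 / arr(control_risk, treatment_risk))

def rr(control_risk: float, treatment_risk: float) -> float:
    """Relative Risk: treatment risk divided by control risk."""
    return treatment_risk / control_risk

def odds_ratio(control_risk: float, treatment_risk: float) -> float:
    """Odds Ratio: ratio of the two odds; only approximates RR when the outcome is rare."""
    return (treatment_risk / (1 - treatment_risk)) / (control_risk / (1 - control_risk))

# Hypothetical arms: 20% events on control, 15% on treatment
print(round(arr(0.20, 0.15), 2))  # 0.05
print(nnt(0.20, 0.15))            # 20
print(round(rr(0.20, 0.15), 2))   # 0.75
```

Keeping the inputs as proportions (0.20, not 20) avoids the most common slip with the NNT formula.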
Confidence Intervals Tell You Precision
A 95% confidence interval wraps the estimate with a range. Narrow means precise. When the interval crosses the null value (RR = 1; ARR = 0), the data do not rule out no effect at the chosen alpha.
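To see where a reported interval comes from, here is a rough sketch of the standard log-scale (Wald) approximation for a relative risk's 95% CI, built from raw event counts (the counts below are invented for illustration):

```python
import math

def rr_with_ci95(events_t: int, n_t: int, events_c: int, n_c: int):
    """Relative risk with an approximate 95% CI via the log-scale (Wald) method."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of ln(RR) from the event counts and group sizes
    se_log_rr = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper

# Invented counts: 29/325 events on treatment vs 39/325 on control
estimate, lower, upper = rr_with_ci95(29, 325, 39, 325)
# The point estimate is below 1, but the interval spans RR = 1,
# so these data would not rule out "no effect" at alpha = 0.05.
```

Note how the 1/events terms dominate the standard error when events are few; that is why underpowered studies show broad intervals.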
Subgroups And Multiplicity
Subgroup claims can mislead when they are not pre-specified or when there are many slices. Look for a clear plan and an interaction test. If dozens of outcomes appear, ask whether the authors adjusted for multiple looks.
Bias Checks You Should Always Run
Bias hides in process steps. A quick pass across core domains keeps your read honest:
- Selection: The way participants are chosen can skew baseline risk. Random sequence and concealment help in trials; sound sampling frames help in cohorts.
- Measurement: Blinded outcome assessors and validated tools lower error. Self-reported outcomes need extra care.
- Confounding: In observational work, look for directed acyclic graphs or a prespecified list with a clear adjustment plan.
- Missing Data: Reported reasons and a method like multiple imputation beat silent deletion.
- Selective Reporting: Match outcomes to the protocol or registry when available. Outcome switching is a common problem.
Method groups such as Cochrane publish structured bias tools for trials and non-randomized studies. They break bias into domains with signaling questions and a traffic-light style summary, which mirrors the checks above.
Applicability: Will This Help Your Patients?
Now, translate the estimate into a clinic day. Ask whether the baseline risk in the paper looks like the people you see. Check whether the dose, setting, and follow-up match your options. Harms belong on the same scale as benefits; absolute rates help here too. If the paper lists resource needs or monitoring you can’t replicate, flag that in your notes.
Numbers You’ll Reuse Often
When a paper gives event counts, you can turn them into a clinic-friendly line in seconds. Say control vs treatment event rates of 12% vs 9%:
- ARR = 0.12 − 0.09 = 0.03 (3 percentage points).
- NNT = 1 / 0.03 ≈ 33.3; round up to 34.
- RR = 0.09 / 0.12 = 0.75 (treatment risk over control risk); now check the confidence interval for precision.
For continuous outcomes (blood pressure, scores), look for the mean difference or a standardized mean difference when scales vary. Again, pair the estimate with its interval.
Red Flags That Need Extra Caution
- Primary outcome not declared or changed late.
- Large relative effect with tiny absolute change.
- High loss to follow-up without a plan to handle it.
- Subgroups reported without a prespecified plan.
- P-values just under 0.05 with broad intervals.
- Composite outcomes mixing hard events with soft ones.
Effect Size Decoder (Printable Table)
| Term | What It Tells You | Handy Tip |
|---|---|---|
| ARR / ARI | Absolute drop or rise in risk | Use to get NNT/NNH fast |
| RR / HR | Proportional change in risk or hazard | Always pair with baseline risk |
| OR | Odds ratio; overstates the relative risk when events are common | Treat with care if event rate >10% |
| Mean Difference | Change on the original scale | Translate to a patient-relevant unit |
| Standardized Difference | Change in SD units across varied scales | Good for meta-analysis across tools |
| 95% CI | Precision range for the estimate | Narrow is better; check if it crosses null |
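The OR caveat in the table can be made concrete with the widely used Zhang-Yu approximation for converting an OR to an RR at a given baseline risk (our own illustration; treat it as a rough check, not a substitute for the paper's reported RR):

```python
def or_to_rr(odds_ratio: float, baseline_risk: float) -> float:
    """Approximate RR implied by an OR at a given control-group risk (Zhang-Yu formula)."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

# Rare outcome (1% baseline risk): the OR tracks the RR closely
print(round(or_to_rr(2.0, 0.01), 2))  # 1.98
# Common outcome (30% baseline risk): the same OR implies a much smaller RR
print(round(or_to_rr(2.0, 0.30), 2))  # 1.54
```

The same OR of 2.0 reads very differently at the two baseline risks, which is exactly why the table says to pair ratios with baseline risk.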
Conflicts, Transparency, And Checklists
Every review should note funding and relationships. Look for a standard disclosure form and a trial registration or protocol when present. Journals that follow uniform disclosure rules make this easy to find. When a trial or review aligns with a formal checklist, your read is smoother: you can see the question, methods, and results in a standard order with fewer gaps. Two anchors many teams use are the CONSORT 2010 checklist for trials and the PRISMA 2020 checklist for systematic reviews.
Build A Fair Verdict In Four Lines
Line 1: One-Sentence Takeaway
State the action and the outcome with an absolute number where possible. Example: “In adults with X, treatment Y lowered events by 3 percentage points over 12 months.”
Line 2: How Trustworthy?
Pick a plain label: high, moderate, low, or very low certainty. Use your bias checks and precision call to set that label.
Line 3: Who Benefits, Who Doesn’t
Name the subgroups that truly fit the data (pre-planned, interaction tested) and the ones not covered by the sample.
Line 4: Action Notes
Add safety items, monitoring, and any resource needs. If a paper points to a registry, protocol, or checklist, add those links in your internal notes so you can revisit them.
Worked Mini-Review Template You Can Copy
Question
Adults with condition A in primary care: does drug B vs placebo cut outcome C over 1 year?
Design Fit
Parallel-group randomized trial, two arms, concealed allocation, blinded outcome assessors.
Methods Snapshot
N = 650; inclusion: ages 40–80 with diagnosed A; exclusion: recent event D; primary outcome: event C at 12 months; loss to follow-up 6% total; prespecified subgroups by baseline risk.
Results Snapshot
Event C: 12% vs 9%; ARR 3%; NNT 34; RR 0.75; 95% CI 0.58–0.98; harms: mild E in 4% vs 2% (ARI 2%; NNH 50).
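The snapshot's derived numbers can be double-checked in a few lines (a sketch plugging in the template's rates):

```python
import math

# Benefit: event C at 12% (placebo) vs 9% (drug B)
arr = 0.12 - 0.09                # 0.03 -> "3 percentage points"
nnt = math.ceil(1 / arr)         # 33.3 rounds up to 34 whole patients
rr = 0.09 / 0.12                 # 0.75

# Harm: mild E at 4% (drug B) vs 2% (placebo)
ari = 0.04 - 0.02                # 0.02
nnh = round(1 / ari)             # 50

print(round(arr, 2), nnt, round(rr, 2), round(ari, 2), nnh)
```

Running the same arithmetic on benefit and harm puts NNT 34 next to NNH 50 on one scale, which is the comparison that matters in shared decision-making.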
Bias And Precision
Random sequence and concealment stated; blinding of assessors; outcome registry matches publication; imputation plan used; interval does not cross null for the primary outcome.
Applicability
Primary care setting matches mine; monitoring needs are feasible; adults over 80 were excluded, so extrapolation is needed for that group.
Verdict
Moderate certainty benefit on event C; watch for E; talk dose and monitoring during shared decision-making.
Tips That Save Time When You’re Busy
- Read title, abstract, and the last paragraph of the discussion first; then jump to methods.
- Underline the primary outcome and time point; skip side outcomes until you’ve checked the main one.
- Write ARR and NNT in the margin; it anchors any later debate.
- Mark any unplanned subgroup claims for a second pass only if they change care in a major way.
- Keep a one-page note template; paste your four-line verdict at the top.
Common Questions You Can Answer Fast
“Can I trust a big relative effect?”
Only when the absolute change matches the promise and the interval is tight. Big relative shifts with low base rates often mean tiny real-world gains.
“What if the p-value is just below 0.05?”
Look at the interval and the event counts. A tiny margin with a wide interval rarely moves practice on its own.
“What if outcomes are patient-reported?”
Check for validated tools, blinding of assessors, and minimal missing data. These lower measurement error.
Your One-Minute Exit Check
- Question matches design.
- Methods are clear enough to repeat.
- Effect size is stated in absolute terms with an interval.
- Bias domains checked and noted.
- Real-world fit written down with any caveats.
- Four-line verdict saved.
