Review a medical article by checking design, bias, results, and real-world fit in that order.
Readers come to a paper to make one call: can I trust this claim enough to change care, policy, or teaching? The path below keeps the work tight. You’ll map the question, confirm the design, scan for bias, read the numbers that matter, and decide whether the finding travels to your setting.
Quick Triage: What Am I Looking At?
Start with the abstract and methods, not the conclusion. Write down three facts: the clinical question, the stated design, and the primary outcome. Then check if the design label matches the conduct. A trial without true allocation concealment behaves more like a quasi-experiment. A cohort with fuzzy entry time risks time-related errors. Label drift hides bias, so confirm early.
| Design | What To Confirm First | Common Red Flags |
|---|---|---|
| Randomized Trial | Random sequence, concealment, blinding plan | Imbalanced baseline, high cross-over |
| Cohort Study | Clear exposure timing and follow-up window | Immortal time, time-varying mix-ups |
| Case-Control | Case definition and control selection | Recall bias, inappropriate matching |
| Cross-Sectional | Sampling frame and response rate | Non-response skew, post-hoc subgroups |
| Diagnostic Accuracy | Reference standard and blinding | Spectrum bias, partial verification |
| Systematic Review | Protocol, search, duplicate screening | Pooling unlike studies, weak bias checks |
How To Review A Medical Article: Practical Method
This method is built for speed and depth. Move in order. If a step fails hard, tag the paper “low trust” and move on.
Step 1: Frame The Clinical Question
Translate the title into PICO: patient or population, intervention or exposure, comparator, and outcome. If the outcome in the abstract doesn’t match what patients value, write the patient-centered outcome beside it. You’ll come back to that when judging applicability.
Step 2: Confirm The Design Fits The Question
Therapy or prevention usually calls for trials. Prognosis points to cohorts. Diagnosis needs accuracy studies with a clear reference standard. Evidence summaries lean on systematic reviews with a protocol. Match the claim to the design; mismatches create blind spots you can’t fix later.
Step 3: Check Registration, Ethics, And Reporting Rules
Trials should be prospectively registered with outcomes and a data-sharing plan. The ICMJE trial registration policy sets this bar. Reporting should follow a checklist that fits the design; the EQUATOR reporting guidelines hub links to CONSORT, STROBE, PRISMA, and many extensions. A study that follows these norms is easier to audit and less likely to hide selective reporting.
Step 4: Judge Risk Of Bias
Bias is a push that bends the estimate away from the truth. For trials, read for five areas: randomization, allocation concealment, deviations from the plan, missing data, and outcome measurement. The Cochrane layout for these checks is clear; see the Handbook chapter on risk of bias. For observational designs, anchor on confounding control, selection, and exposure/outcome measurement. If the study handles these well, your signal is already stronger.
Step 5: Read The Numbers, Then The Words
Lead with effect size and precision, not p-value fireworks. Pull the primary estimate with its 95% CI. For binary outcomes, compute absolute risk reduction (ARR) and number needed to treat (NNT) when possible. ARR = control risk − treatment risk. NNT = 1 ÷ ARR (using absolute proportions). For harms, compute number needed to harm (NNH). For time-to-event data, lean on hazard ratios with matching Kaplan–Meier curves and the count at risk. For diagnostic studies, grab sensitivity, specificity, and likelihood ratios; they travel better between settings than raw accuracy.
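If you want that arithmetic at hand, here is a minimal Python sketch with hypothetical event counts; the helper names are my own, not from any library.

```python
# Minimal helpers for absolute effect measures from a two-arm study.
# Inputs are event counts and group sizes; all numbers below are hypothetical.

def absolute_risk_reduction(events_control, n_control, events_treated, n_treated):
    """ARR = control risk - treatment risk (positive when treatment helps)."""
    return events_control / n_control - events_treated / n_treated

def number_needed_to_treat(arr):
    """NNT = 1 / ARR; undefined when ARR is zero. Round up in practice."""
    return 1.0 / arr if arr != 0 else float("inf")

# Example: 120/1000 events in control vs 90/1000 with treatment.
arr = absolute_risk_reduction(120, 1000, 90, 1000)   # 0.03 -> 3 percentage points
nnt = number_needed_to_treat(arr)                    # ~33 patients per event avoided

# For harms, the same arithmetic with the sign flipped gives NNH:
ari = absolute_risk_reduction(50, 1000, 80, 1000)    # negative ARR = risk increase
nnh = abs(number_needed_to_treat(ari))               # ~33 patients per extra harm

print(f"ARR={arr:.3f}, NNT={nnt:.0f}, NNH={nnh:.0f}")
```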
Step 6: Watch For Multiplicity
Multiple outcomes, many subgroup slices, or repeated interim looks inflate false positives. Check the registry or protocol for a prespecified plan. Subgroups should be declared in advance with an interaction test. If the paper spotlights a secondary outcome while the primary endpoint fizzled, you have a fragile claim.
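To see why those extra looks matter, here is a back-of-envelope calculation (assuming independent tests, which is a simplification) showing how the chance of at least one false positive grows with the number of comparisons, plus the matching Bonferroni threshold.

```python
# Multiplicity in numbers: with k independent tests at alpha = 0.05,
# the chance of at least one false positive grows quickly.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k          # family-wise error rate, independence assumed
    bonferroni = alpha / k               # per-test threshold that restores ~5% overall
    print(f"k={k:>2}: P(>=1 false positive)={fwer:.2f}, Bonferroni alpha={bonferroni:.4f}")
```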
Step 7: Handle Missing Data, Cross-Over, And Adherence
Loss to follow-up dilutes trust, even with large samples. Trials should present intention-to-treat as the anchor and show a per-protocol view when adherence wobbles. If the analysis uses multiple imputation, the paper should list the variables in the model and the number of imputations. For cohorts, check censoring rules and whether competing risks were handled sensibly.
Step 8: Judge Indirectness And Applicability
Ask a plain question: would I do this, with this dose and schedule, for patients like mine, in a clinic like mine? If the outcome is a surrogate, look for a link to patient-centered results. If the comparator is soft (placebo when head-to-head exists), the effect might shrink in real care.
Step 9: Grade The Certainty
When you synthesize across studies or brief a team, translate the body of evidence into a certainty label with GRADE. The approach downgrades for risk of bias, inconsistency, indirectness, imprecision, and publication bias, and can upgrade when non-randomized data show a strong, coherent signal. Public guides from ACIP and the GRADE group walk through the steps in plain terms.
Applied Walkthrough: From Abstract To Answer
Here’s a fast routine you can run on any clinical paper. Grab the PDF and a single page of notes. Set a timer for 17 minutes: two for triage, five for bias, five for numbers, three for applicability, and two for the write-up. You’ll finish with a one-paragraph verdict you can share with your team.
Triage In Two Minutes
Write the PICO. Write the stated design. Circle the primary outcome. Skim the flow diagram for counts: screened, randomized or enrolled, completed, and analyzed. A mismatch between enrolled and analyzed without a good reason hints at selective reporting or drop-out trouble.
Bias Scan In Five Minutes
Trials: random sequence method, allocation concealment, blinding plan, deviations from protocol, and missing data. Observational: cohort entry time, exposure measurement, outcome ascertainment, and confounder control with balance checks. Systematic reviews: a registered protocol, a full search in at least two databases, duplicate screening, and a plan to rate bias in included studies. If any of these are absent, drop your trust a notch.
Numbers In Five Minutes
Copy the primary estimate and CI. Compute ARR, NNT, or NNH when absolute risks are given. Check whether the CI crosses a threshold that would change care. Scan subgroup plots: symmetric, mild shifts suggest noise; wild swings with tiny groups are salesmanship. If a model uses many covariates with a small event count, overfitting may dominate.
Applicability In Three Minutes
Compare baseline tables to your panel of patients. Look at dose, delivery, and follow-up. Ask whether the comparator matches local care. If the paper uses a composite endpoint, read the components and see which one drives the effect. If the driver is a soft event, temper your call.
Write-Up In Two Minutes
Use one tight paragraph: question, design, a short bias tag, the main effect with CI, and an applicability note. End with a clear call: “use,” “use in select cases,” or “do not change care.” Add one line on what data would change that call.
Reading Tables, Figures, And Flow Diagrams
CONSORT and PRISMA diagrams save time. Start with counts, then baseline balance. If a small trial shows big baseline gaps and weak concealment, the estimate is fragile. Forest plots in meta-analyses should cluster; wide scatter hints at inconsistency. Funnel plots can hint at publication bias, but never hang your call on them alone.
Common Biases Mapped To Fixes
Use this table when a claim feels too strong or too shaky. It links the signal you spot to a fast check and a practical adjustment in your judgment.
| Signal | Fast Check | What To Do |
|---|---|---|
| Imbalanced Baseline | Table 1; key prognostic factors | Prefer adjusted or stratified effect |
| Selective Outcome Reporting | Compare paper with registry/protocol | Down-weight trust; search appendices |
| Loss To Follow-Up | Flow diagram; denominator shifts | Lean on ITT; check sensitivity runs |
| Measurement Error | Blinded assessors; validated tools | Favor objective endpoints |
| Time-Related Bias | Exposure timing and risk windows | Require proper time-to-event methods |
| Small-Study Effects | Wide CIs; erratic point estimates | Ask for replication or larger trials |
Effect Sizes You Can Trust
Pick measures that map to decisions. Absolute risk differences help at the bedside. Relative risk can sound large when baseline risk is tiny, so pair it with absolute numbers. When outcomes are common, odds ratios overstate the risk ratio; risk ratios or risk differences tell a cleaner story. For skewed data, medians with IQRs beat means. For continuous scales, check whether the minimal clinically important difference fits inside the CI.
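A small sketch with a hypothetical 2×2 table makes that odds-ratio inflation concrete: at a common outcome the OR outruns the RR, and at a rare one they nearly agree.

```python
# Why odds ratios overstate risk ratios when the outcome is common:
# both computed from the same hypothetical 2x2 table.

def risk_ratio(a, b, c, d):
    """a,b = events/non-events exposed; c,d = events/non-events unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Common outcome: 40% vs 20% risk.
print(risk_ratio(40, 60, 20, 80))   # RR = 2.0
print(odds_ratio(40, 60, 20, 80))   # OR ~= 2.67, larger than the RR

# Rare outcome: 2% vs 1% risk -- OR and RR nearly agree.
print(risk_ratio(2, 98, 1, 99))     # RR = 2.0
print(odds_ratio(2, 98, 1, 99))     # OR ~= 2.02
```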
Special Notes By Study Type
Trials
Look for central randomization, sealed opaque envelopes, or equivalent concealment. Co-interventions and cross-over dilute effects. If outcomes are subjective, blinding matters more. If the team swaps the primary endpoint midstream, read the registry history and anchor your review on the original endpoint.
Observational Studies
Confounding is the main threat. Good work shows a directed acyclic graph or at least a transparent plan for covariate selection. Propensity methods help when balance checks pass. Sensitivity runs that vary the model, the covariate set, and exposure definitions reduce the chance that a single modeling choice drove the claim. When treatment timing is flexible, look for designs that guard against immortal time.
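One common balance check is the standardized mean difference. This sketch computes it from hypothetical age data, with the usual (and debated) reading that values under about 0.1 suggest acceptable balance.

```python
# A common balance check after propensity matching or weighting: the
# standardized mean difference (SMD) per covariate. Data are hypothetical.
import math

def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / pooled SD of the two groups."""
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treated) / (len(treated) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    pooled_sd = math.sqrt((vt + vc) / 2)
    return (mt - mc) / pooled_sd

ages_treated = [64, 70, 58, 66, 72, 61, 68]
ages_control = [63, 69, 59, 65, 71, 62, 67]
# Values under ~0.1 are usually read as acceptable balance.
print(f"SMD for age: {standardized_mean_difference(ages_treated, ages_control):.3f}")
```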
Diagnostic Accuracy
The reference standard should be independent and applied to everyone or to a random sample. Case mix should span early, unclear, and classic presentations. If the sample is enriched with clear positives, sensitivity inflates and likelihood ratios fall apart in clinic. Decision-level tools like likelihood ratios and post-test probabilities travel better than raw accuracy.
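The odds-scale arithmetic behind those post-test probabilities is short. This sketch uses hypothetical sensitivity, specificity, and pre-test probability values.

```python
# Converting sensitivity/specificity into likelihood ratios and a
# post-test probability via odds. All numbers are hypothetical.

def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    """Bayes on the odds scale: post odds = pre odds * LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)      # LR+ = 4.5, LR- = 0.125
print(post_test_probability(0.30, lr_pos))          # ~0.66 after a positive test
print(post_test_probability(0.30, lr_neg))          # ~0.05 after a negative test
```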
Systematic Reviews And Meta-Analyses
Quality here rests on the protocol, the search, duplicate screening, and bias checks across included studies. If most included trials are high risk, the pooled answer inherits that risk. Heterogeneity should be explained with a small set of sensible, prespecified moderators. Many groups now grade certainty using GRADE; that label helps teams set policy and patient-level choices.
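To anchor what heterogeneity means numerically, here is a sketch of fixed-effect inverse-variance pooling with Cochran’s Q and I²; the study estimates are hypothetical log risk ratios, not from any real review.

```python
# A quick read on heterogeneity: fixed-effect inverse-variance pooling,
# Cochran's Q, and I^2 = max(0, (Q - df) / Q). Effects and standard
# errors below are hypothetical log risk ratios.

effects = [0.20, 0.35, -0.05, 0.50]   # study estimates (log scale)
ses     = [0.10, 0.15, 0.12, 0.20]    # their standard errors

weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

print(f"pooled={pooled:.3f}, Q={q:.2f}, I^2={100 * i_squared:.0f}%")
```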
Stats Sanity Checks Worth Doing
Scan sample size and power just long enough to see if the trial was built to detect a patient-level effect, not only a lab change. Look for prespecified stopping rules in trials with interim looks. Check whether the model includes too many covariates for the event count. When the study reports only per-protocol results, ask for intention-to-treat. When only relative measures appear, compute absolute numbers before making a call.
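One quick screen for the covariates-versus-event-count problem is events per variable (EPV). This sketch uses the classic, much-debated rule of thumb of roughly 10 events per covariate; the counts are hypothetical.

```python
# A crude overfitting screen: events per candidate covariate (EPV).
# The old rule of thumb of ~10 events per variable is debated, but an
# EPV far below it is still a red flag. Numbers are hypothetical.

def events_per_variable(n_events, n_nonevents, n_covariates):
    """For logistic models, use the smaller of events and non-events."""
    limiting = min(n_events, n_nonevents)
    return limiting / n_covariates

epv = events_per_variable(n_events=40, n_nonevents=460, n_covariates=12)
print(f"EPV = {epv:.1f}")   # ~3.3 -- well under 10, so treat the model warily
```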
Harms Reporting That Matters
Good papers define adverse events up front, report denominators, and show both common mild harms and rare serious ones. Tables that clump many symptoms into a composite can hide single events that patients care about. If efficacy looks modest and harms look under-measured, your threshold to adopt should rise.
Ethics, Funding, And Conflicts
IRB approval and consent should be clear. Funding sources and author ties shape behavior. Sponsor control over data, analysis, or drafts raises the bar for independent confirmation. A neutral funder does not guarantee clean conduct; it just removes one push.
From Paper To Practice
End with a plain decision: adopt, use in select cases, or do not change care. If you adopt, list what to track in practice—adherence, early harms, lab monitoring, or a stopping rule for safety. If you hold, write the missing piece that would change your mind, such as a larger, better-concealed trial or longer follow-up with patient-centered outcomes.
Printable One-Page Checklist
Use This Each Time You Read
1) Write PICO.
2) Confirm design fits the claim.
3) Registration and reporting match.
4) Bias domains checked.
5) Pull effect size and CI.
6) Compute ARR, NNT, or NNH when possible.
7) Scan multiplicity and subgroup claims.
8) Check missing data and adherence.
9) Judge real-world fit.
10) Label certainty and write a one-paragraph verdict.
