Review a medical article by checking design, bias, results, and real-world fit in that order.
Readers come to a paper to make one call: can I trust this claim enough to change care, policy, or teaching? The path below keeps the work tight. You’ll map the question, confirm the design, scan for bias, read the numbers that matter, and decide whether the finding travels to your setting.
Quick Triage: What Am I Looking At?
Start with the abstract and methods, not the conclusion. Write down three facts: the clinical question, the stated design, and the primary outcome. Then check if the design label matches the conduct. A trial without true allocation concealment behaves more like a quasi-experiment. A cohort with fuzzy entry time risks time-related errors. Label drift hides bias, so confirm early.
| Design | What To Confirm First | Common Red Flags |
|---|---|---|
| Randomized Trial | Random sequence, concealment, blinding plan | Imbalanced baseline, high cross-over |
| Cohort Study | Clear exposure timing and follow-up window | Immortal time, time-varying mix-ups |
| Case-Control | Case definition and control selection | Recall bias, inappropriate matching |
| Cross-Sectional | Sampling frame and response rate | Non-response skew, post-hoc subgroups |
| Diagnostic Accuracy | Reference standard and blinding | Spectrum bias, partial verification |
| Systematic Review | Protocol, search, duplicate screening | Pooling unlike studies, weak bias checks |
How To Review A Medical Article: Practical Method
This method is built for speed and depth. Move in order. If a step fails hard, tag the paper “low trust” and move on.
Step 1: Frame The Clinical Question
Translate the title into PICO: patient or population, intervention or exposure, comparator, and outcome. If the outcome in the abstract doesn’t match what patients value, write the patient-centered outcome beside it. You’ll come back to that when judging applicability.
Step 2: Confirm The Design Fits The Question
Therapy or prevention usually calls for trials. Prognosis points to cohorts. Diagnosis needs accuracy studies with a clear reference standard. Evidence summaries lean on systematic reviews with a protocol. Match the claim to the design; mismatches create blind spots you can’t fix later.
Step 3: Check Registration, Ethics, And Reporting Rules
Trials should be prospectively registered with outcomes and a data-sharing plan. The ICMJE trial registration policy sets this bar. Reporting should follow a checklist that fits the design; the EQUATOR reporting guidelines hub links to CONSORT, STROBE, PRISMA, and many extensions. A study that follows these norms is easier to audit and less likely to hide selective reporting.
Step 4: Judge Risk Of Bias
Bias is a push that bends the estimate away from the truth. For trials, read for five areas: randomization, allocation concealment, deviations from the plan, missing data, and outcome measurement. The Cochrane layout for these checks is clear; see the Handbook chapter on risk of bias. For observational designs, anchor on confounding control, selection, and exposure/outcome measurement. If the study handles these well, your signal is already stronger.
Step 5: Read The Numbers, Then The Words
Lead with effect size and precision, not p-value fireworks. Pull the primary estimate with its 95% CI. For binary outcomes, compute absolute risk reduction (ARR) and number needed to treat (NNT) when possible. ARR = control risk − treatment risk. NNT = 1 ÷ ARR (using absolute proportions). For harms, compute number needed to harm (NNH). For time-to-event data, lean on hazard ratios with matching Kaplan–Meier curves and the count at risk. For diagnostic studies, grab sensitivity, specificity, and likelihood ratios; they travel better between settings than raw accuracy.
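If you want that arithmetic at hand, here is a minimal Python sketch with hypothetical event counts; the helper names are my own, not from any library.

```python
# Minimal helpers for absolute effect measures from a two-arm study.
# Inputs are event counts and group sizes; all numbers below are hypothetical.

def absolute_risk_reduction(events_control, n_control, events_treated, n_treated):
    """ARR = control risk - treatment risk (positive when treatment helps)."""
    return events_control / n_control - events_treated / n_treated

def number_needed_to_treat(arr):
    """NNT = 1 / ARR; undefined when ARR is zero. Round up in practice."""
    return 1.0 / arr if arr != 0 else float("inf")

# Example: 120/1000 events in control vs 90/1000 with treatment.
arr = absolute_risk_reduction(120, 1000, 90, 1000)   # 0.03 -> 3 percentage points
nnt = number_needed_to_treat(arr)                    # ~33 patients per event avoided

# For harms, the same arithmetic with the sign flipped gives NNH:
ari = absolute_risk_reduction(50, 1000, 80, 1000)    # negative ARR = risk increase
nnh = abs(number_needed_to_treat(ari))               # ~33 patients per extra harm

print(f"ARR={arr:.3f}, NNT={nnt:.0f}, NNH={nnh:.0f}")
```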
Step 6: Watch For Multiplicity
Multiple outcomes, many subgroup slices, or repeated interim looks inflate false positives. Check the registry or protocol for a prespecified plan. Subgroups should be declared in advance with an interaction test. If the paper spotlights a secondary outcome while the primary endpoint fizzled, you have a fragile claim.
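To see why those extra looks matter, here is a back-of-envelope calculation (assuming independent tests, which is a simplification) showing how the chance of at least one false positive grows with the number of comparisons, plus the matching Bonferroni threshold.

```python
# Multiplicity in numbers: with k independent tests at alpha = 0.05,
# the chance of at least one false positive grows quickly.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k          # family-wise error rate, independence assumed
    bonferroni = alpha / k               # per-test threshold that restores ~5% overall
    print(f"k={k:>2}: P(>=1 false positive)={fwer:.2f}, Bonferroni alpha={bonferroni:.4f}")
```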
Step 7: Handle Missing Data, Cross-Over, And Adherence
Loss to follow-up dilutes trust, even with large samples. Trials should present intention-to-treat as the anchor and show a per-protocol view when adherence wobbles. If the analysis uses multiple imputation, the paper should list the variables in the model and the number of imputations. For cohorts, check censoring rules and whether competing risks were handled sensibly.
Step 8: Judge Indirectness And Applicability
Ask a plain question: would I do this, with this dose and schedule, for patients like mine, in a clinic like mine? If the outcome is a surrogate, look for a link to patient-centered results. If the comparator is soft (placebo when head-to-head exists), the effect might shrink in real care.
Step 9: Grade The Certainty
When you synthesize across studies or brief a team, translate the body of evidence into a certainty label with GRADE. The approach downgrades for risk of bias, inconsistency, indirectness, imprecision, and publication bias, and can upgrade when non-randomized data show a strong, coherent signal. Public guides from ACIP and the GRADE group walk through the steps in plain terms.
Applied Walkthrough: From Abstract To Answer
Here’s a fast routine you can run on any clinical paper. Grab the PDF and a single page of notes. Set a timer for 17 minutes: two for triage, five for bias, five for numbers, three for applicability, and two for the write-up. You’ll finish with a one-paragraph verdict you can share with your team.
Triage In Two Minutes
Write the PICO. Write the stated design. Circle the primary outcome. Skim the flow diagram for counts: screened, randomized or enrolled, completed, and analyzed. A mismatch between enrolled and analyzed without a good reason hints at selective reporting or drop-out trouble.
Bias Scan In Five Minutes
Trials: random sequence method, allocation concealment, blinding plan, deviations from protocol, and missing data. Observational: cohort entry time, exposure measurement, outcome ascertainment, and confounder control with balance checks. Systematic reviews: a registered protocol, a full search in at least two databases, duplicate screening, and a plan to rate bias in included studies. If any of these are absent, drop your trust a notch.
Numbers In Five Minutes
Copy the primary estimate and CI. Compute ARR, NNT, or NNH when absolute risks are given. Check whether the CI crosses a threshold that would change care. Scan subgroup plots: symmetric, mild shifts suggest noise; wild swings with tiny groups are salesmanship. If a model uses many covariates with a small event count, overfitting may dominate.
Applicability In Three Minutes
Compare baseline tables to your panel of patients. Look at dose, delivery, and follow-up. Ask whether the comparator matches local care. If the paper uses a composite endpoint, read the components and see which one drives the effect. If the driver is a soft event, temper your call.
Write-Up In Two Minutes
Use one tight paragraph: question, design, a short bias tag, the main effect with CI, and an applicability note. End with a clear call: “use,” “use in select cases,” or “do not change care.” Add one line on what data would change that call.
Reading Tables, Figures, And Flow Diagrams
CONSORT and PRISMA diagrams save time. Start with counts, then baseline balance. If a small trial shows big baseline gaps and weak concealment, the estimate is fragile. Forest plots in meta-analyses should cluster; wide scatter hints at inconsistency. Funnel plots can hint at publication bias, but never hang your call on them alone.
Common Biases Mapped To Fixes
Use this table when a claim feels too strong or too shaky. It links the signal you spot to a fast check and a practical adjustment in your judgment.
| Signal | Fast Check | What To Do |
|---|---|---|
| Imbalanced Baseline | Table 1; key prognostic factors | Prefer adjusted or stratified effect |
| Selective Outcome Reporting | Compare paper with registry/protocol | Down-weight trust; search appendices |
| Loss To Follow-Up | Flow diagram; denominator shifts | Lean on ITT; check sensitivity runs |
| Measurement Error | Blinded assessors; validated tools | Favor objective endpoints |
| Time-Related Bias | Exposure timing and risk windows | Require proper time-to-event methods |
| Small-Study Effects | Wide CIs; erratic point estimates | Ask for replication or larger trials |
Effect Sizes You Can Trust
Pick measures that map to decisions. Absolute risk differences help at the bedside. Relative risk can sound large when baseline risk is tiny, so pair it with absolute numbers. When outcomes are common, odds ratios overstate the risk ratio; risk ratios or risk differences tell a cleaner story. For skewed data, medians with IQRs beat means. For continuous scales, check whether the minimal clinically important difference fits inside the CI.
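A small sketch with a hypothetical 2×2 table makes that odds-ratio inflation concrete: at a common outcome the OR outruns the RR, and at a rare one they nearly agree.

```python
# Why odds ratios overstate risk ratios when the outcome is common:
# both computed from the same hypothetical 2x2 table.

def risk_ratio(a, b, c, d):
    """a,b = events/non-events exposed; c,d = events/non-events unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Common outcome: 40% vs 20% risk.
print(risk_ratio(40, 60, 20, 80))   # RR = 2.0
print(odds_ratio(40, 60, 20, 80))   # OR ~= 2.67, larger than the RR

# Rare outcome: 2% vs 1% risk -- OR and RR nearly agree.
print(risk_ratio(2, 98, 1, 99))     # RR = 2.0
print(odds_ratio(2, 98, 1, 99))     # OR ~= 2.02
```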
Special Notes By Study Type
Trials
Look for central randomization, sealed opaque envelopes, or equivalent concealment. Co-interventions and cross-over dilute effects. If outcomes are subjective, blinding matters more. If the team swaps the primary endpoint midstream, read the registry history and anchor your review on the original endpoint.
Observational Studies
Confounding is the main threat. Good work shows a directed acyclic graph or at least a transparent plan for covariate selection. Propensity methods help when balance checks pass. Sensitivity runs that vary the model, the covariate set, and exposure definitions reduce the chance that a single modeling choice drove the claim. When treatment timing is flexible, look for designs that guard against immortal time.
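One common balance check is the standardized mean difference. This sketch computes it from hypothetical age data, with the usual (and debated) reading that values under about 0.1 suggest acceptable balance.

```python
# A common balance check after propensity matching or weighting: the
# standardized mean difference (SMD) per covariate. Data are hypothetical.
import math

def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / pooled SD of the two groups."""
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treated) / (len(treated) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    pooled_sd = math.sqrt((vt + vc) / 2)
    return (mt - mc) / pooled_sd

ages_treated = [64, 70, 58, 66, 72, 61, 68]
ages_control = [63, 69, 59, 65, 71, 62, 67]
# Values under ~0.1 are usually read as acceptable balance.
print(f"SMD for age: {standardized_mean_difference(ages_treated, ages_control):.3f}")
```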
Diagnostic Accuracy
The reference standard should be independent and applied to everyone or to a random sample. Case mix should span early, unclear, and classic presentations. If the sample is enriched with clear positives, sensitivity inflates and likelihood ratios fall apart in clinic. Decision-level tools like likelihood ratios and post-test probabilities travel better than raw accuracy.
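The odds-scale arithmetic behind those post-test probabilities is short. This sketch uses hypothetical sensitivity, specificity, and pre-test probability values.

```python
# Converting sensitivity/specificity into likelihood ratios and a
# post-test probability via odds. All numbers are hypothetical.

def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    """Bayes on the odds scale: post odds = pre odds * LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)      # LR+ = 4.5, LR- = 0.125
print(post_test_probability(0.30, lr_pos))          # ~0.66 after a positive test
print(post_test_probability(0.30, lr_neg))          # ~0.05 after a negative test
```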
Systematic Reviews And Meta-Analyses
Quality here rests on the protocol, the search, duplicate screening, and bias checks across included studies. If most included trials are high risk, the pooled answer inherits that risk. Heterogeneity should be explained with a small set of sensible, prespecified moderators. Many groups now grade certainty using GRADE; that label helps teams set policy and patient-level choices.
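To anchor what heterogeneity means numerically, here is a sketch of fixed-effect inverse-variance pooling with Cochran’s Q and I²; the study estimates are hypothetical log risk ratios, not from any real review.

```python
# A quick read on heterogeneity: fixed-effect inverse-variance pooling,
# Cochran's Q, and I^2 = max(0, (Q - df) / Q). Effects and standard
# errors below are hypothetical log risk ratios.

effects = [0.20, 0.35, -0.05, 0.50]   # study estimates (log scale)
ses     = [0.10, 0.15, 0.12, 0.20]    # their standard errors

weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

print(f"pooled={pooled:.3f}, Q={q:.2f}, I^2={100 * i_squared:.0f}%")
```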
Stats Sanity Checks Worth Doing
Scan sample size and power just long enough to see if the trial was built to detect a patient-level effect, not only a lab change. Look for prespecified stopping rules in trials with interim looks. Check whether the model includes too many covariates for the event count. When the study reports only per-protocol results, ask for intention-to-treat. When only relative measures appear, compute absolute numbers before making a call.
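One quick screen for the covariates-versus-event-count problem is events per variable (EPV). This sketch uses the classic, much-debated rule of thumb of roughly 10 events per covariate; the counts are hypothetical.

```python
# A crude overfitting screen: events per candidate covariate (EPV).
# The old rule of thumb of ~10 events per variable is debated, but an
# EPV far below it is still a red flag. Numbers are hypothetical.

def events_per_variable(n_events, n_nonevents, n_covariates):
    """For logistic models, use the smaller of events and non-events."""
    limiting = min(n_events, n_nonevents)
    return limiting / n_covariates

epv = events_per_variable(n_events=40, n_nonevents=460, n_covariates=12)
print(f"EPV = {epv:.1f}")   # ~3.3 -- well under 10, so treat the model warily
```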
Harms Reporting That Matters
Good papers define adverse events up front, report denominators, and show both common mild harms and rare serious ones. Tables that clump many symptoms into a composite can hide single events that patients care about. If efficacy looks modest and harms look under-measured, your threshold to adopt should rise.
Ethics, Funding, And Conflicts
IRB approval and consent should be clear. Funding sources and author ties shape behavior. Sponsor control over data, analysis, or drafts raises the bar for independent confirmation. A neutral funder does not guarantee clean conduct; it just removes one push.
From Paper To Practice
End with a plain decision: adopt, use in select cases, or do not change care. If you adopt, list what to track in practice—adherence, early harms, lab monitoring, or a stopping rule for safety. If you hold, write the missing piece that would change your mind, such as a larger, better-concealed trial or longer follow-up with patient-centered outcomes.
Printable One-Page Checklist
Use This Each Time You Read
1) Write PICO.
2) Confirm design fits the claim.
3) Registration and reporting match.
4) Bias domains checked.
5) Pull effect size and CI.
6) Compute ARR, NNT, or NNH when possible.
7) Scan multiplicity and subgroup claims.
8) Check missing data and adherence.
9) Judge real-world fit.
10) Label certainty and write a one-paragraph verdict.
