How To Do Data Extraction For A Systematic Review | Step By Step

Create a piloted form, train two extractors, record study and outcome data, double-check entries, resolve conflicts, and document every rule.

Data extraction turns search results into analyzable evidence. Done well, it produces a tidy dataset that mirrors the studies without losing nuance. This guide gives you a clear method you can repeat across teams.

What Data Extraction Means And What You Need

Scope And Core Fields

Data extraction means capturing predefined fields in a standard form. Record design, participants, interventions, comparators, outcomes, timing, and effect data. Add risk-of-bias inputs and funding. Align the form with the protocol and planned synthesis.

Use the table below as a starter schema. Adapt it to your topic and keep a separate codebook that defines each field.

Data Item | What To Record | Why It Matters
Citation / ID | Full citation, registry ID, linked reports | Trace sources and split duplicates
Study Design | Parallel RCT, crossover, cluster, cohort, case-control, etc. | Pick effect methods suited to design
Setting And Dates | Country, site type, recruitment and follow-up windows | Judge external relevance and era
Population | Eligibility, N, age, sex, baseline risks or means | Describe who was studied
Arms / Groups | Labels, N randomized and analyzed, attrition per arm | Enable effect calculations
Intervention | Name, dose, schedule, delivery, co-interventions | Know what was tested
Comparator | Placebo, active control, usual care, none | Anchor relative effects
Outcomes | Exact name, definition, instrument, direction | Keep measures consistent
Time Points | Nominal times and actual windows used in analyses | Align across studies
Effect Data (Dichotomous) | Events and totals; or effect and CI/SE | Compute RR, OR, or RD
Effect Data (Continuous) | Mean, SD, N; or change scores and SE/CI | Compute MD or SMD
Analysis Set | Intention-to-treat, modified ITT, per-protocol | Understand potential bias
Adjustments | Covariates used for adjusted models | Match like with like across studies
Unit Issues | Scales, units, direction, thresholds | Enable valid pooling
Multi-Arm Details | Arms included in the contrast, any splits | Avoid double-counting
Cluster Info | ICC, cluster size, analysis method | Correct unit-of-analysis errors
Risk-Of-Bias Inputs | Item-level judgments with quotes | Link judgments to data
Funding And COI | Source, role of funder, declared ties | Flag conflicts of interest
Notes | Anything needed to reproduce choices | Keep decisions visible
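
If the codebook mentioned above is kept in a machine-readable form, the form, the analysis scripts, and the extractors all share one definition of each field. A minimal Python sketch, with illustrative field names rather than any fixed standard:

```python
# Minimal machine-readable codebook sketch (field names are illustrative).
# Each entry defines one extraction field: what to record in it.
CODEBOOK = [
    {"field": "study_id",   "record": "Unique ID linking all reports of one study"},
    {"field": "design",     "record": "Parallel RCT, crossover, cluster, cohort, case-control"},
    {"field": "events_int", "record": "Events in the intervention arm (dichotomous outcomes)"},
    {"field": "total_int",  "record": "Participants analysed in the intervention arm"},
    {"field": "mean_int",   "record": "Mean outcome in the intervention arm (continuous outcomes)"},
    {"field": "sd_int",     "record": "Standard deviation in the intervention arm"},
    {"field": "time_point", "record": "Nominal time point and actual window used"},
]

def required_fields():
    """Return the field names the extraction form must contain."""
    return [entry["field"] for entry in CODEBOOK]

print(required_fields())
```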

Doing Data Extraction For Systematic Reviews: Prep And Workflow

Protocol And Checklist Alignment

Start with a protocol. State the data items, preferred effect measures, time points, subgroups, and planned conversions. Align these items with PRISMA 2020 so the data you extract supports the reporting you will need later.

Form Design And Help Text

Build a form that matches your questions. Add validation and short help text. Include fields for quotes and page numbers to trace entries.

Piloting And Calibration

Pilot on three to five studies. Calibrate wording, add missing fields, and drop unused ones. Debrief, then freeze version 1.0 before full extraction. Add a second pilot if rules change.

Dual Extraction And Reconciliation

Use two people for each study. Each extractor works independently. A third person reconciles conflicts. Start with a practice set, and track agreement on a small sample to see whether the form or the training needs adjustment.
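
Percent agreement and Cohen's kappa per field are enough to spot problem areas in the practice set. A sketch, assuming each extractor's entries for one field are stored as lists in the same study order:

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of entries where the two extractors recorded the same value."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance, for categorical fields."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Example: two extractors coding "analysis set" for six studies.
ext1 = ["ITT", "ITT", "PP", "mITT", "ITT", "PP"]
ext2 = ["ITT", "PP",  "PP", "mITT", "ITT", "ITT"]
print(round(percent_agreement(ext1, ext2), 2))
print(round(cohens_kappa(ext1, ext2), 2))
```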

Audit Trails And Versioning

Keep an audit trail. Record who entered each value, the date, the source page, and any changes with reasons. Store the form template, the codebook, and the reconciliation log beside the dataset.

Set Clear Rules Before You Start

Primary Choices

Write rules for common choices. Pick a primary time point for each outcome. Pick a preferred metric and a fallback. Decide what to do when outcomes appear in multiple places, such as text, tables, and appendices.

Linked Reports And Duplicates

Create a hierarchy for duplicates: most complete report, then registered results, then earliest peer-reviewed report. Link companion papers to one study ID so participants are not double-counted.

Multi-Arm And Crossover Rules

Define how you will treat multi-arm trials. Decide whether to combine similar arms or to split the shared comparator. Write a short rule for crossover trials, e.g., use period one if carryover is likely.

Units, Direction, And Thresholds

State unit rules. Make all scales point in the same direction. Convert units to a common scale. Note any thresholds used to define responders or events. Keep conversion formulas near the form.
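
Keeping these rules as small, named helpers makes them hard to misapply. A sketch, assuming glucose values reported in mg/dL and an instrument where lower scores mean benefit; the factor 18.0 and the sign flip are examples, not rules for your review:

```python
MGDL_PER_MMOLL_GLUCOSE = 18.0  # approximate conversion factor for glucose

def glucose_to_mmol_per_l(value_mgdl):
    """Convert a glucose value reported in mg/dL to mmol/L."""
    return value_mgdl / MGDL_PER_MMOLL_GLUCOSE

def align_direction(mean_diff, higher_is_better):
    """Flip the sign so that positive effects always favour the intervention."""
    return mean_diff if higher_is_better else -mean_diff

print(round(glucose_to_mmol_per_l(126), 1))            # 7.0 mmol/L
print(align_direction(-2.5, higher_is_better=False))   # 2.5, benefit now positive
```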

Pick Time Points And Scales

Time Windows

Outcomes often appear at many times. Choose the time point that best matches the question, then set a window around it (e.g., 8–12 weeks for a short-term outcome). Record both the nominal and the actual timing.

Instruments And Direction

Name each instrument exactly. Capture the version, range, and the direction of benefit. If studies report both final values and change scores, decide which you will prefer.

Define Effect Measures And Calculations

Cochrane Handbook guidance covers choices and formulas that keep extraction consistent across studies.

Dichotomous Data

For dichotomous data, capture events and totals per arm. Compute risk ratio, odds ratio, or risk difference as needed. If a paper reports an adjusted odds ratio, keep raw counts plus the adjusted effect and its CI.
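
The arithmetic from per-arm counts is easy to script and worth double-checking against the paper. A plain sketch (no continuity correction, so arms with zero events or zero non-events will fail):

```python
import math

def dichotomous_effects(events_int, total_int, events_ctl, total_ctl):
    """Risk ratio, odds ratio, and risk difference from 2x2 counts."""
    risk_int = events_int / total_int
    risk_ctl = events_ctl / total_ctl
    rr = risk_int / risk_ctl
    odds_int = events_int / (total_int - events_int)
    odds_ctl = events_ctl / (total_ctl - events_ctl)
    or_ = odds_int / odds_ctl
    rd = risk_int - risk_ctl
    # Standard error of log(RR), used later for weighting in meta-analysis.
    se_log_rr = math.sqrt(1/events_int - 1/total_int + 1/events_ctl - 1/total_ctl)
    return {"RR": rr, "OR": or_, "RD": rd, "SE_logRR": se_log_rr}

print(dichotomous_effects(12, 100, 24, 100))
# RR = 0.5, OR ≈ 0.43, RD = -0.12
```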

Continuous Data

For continuous data, record mean, SD, and N per arm. If only SE is given, convert using SD = SE × √n. If only a 95% CI is given, derive SD from the CI width and √n.
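
Both conversions are one-liners but easy to slip on by hand. A sketch of the arithmetic, assuming a 95% CI around a single-arm mean and a reasonably large sample (3.92 ≈ 2 × 1.96):

```python
import math

def sd_from_se(se, n):
    """SD from a standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n):
    """SD from a 95% CI around a single-arm mean: SD = sqrt(n) * (upper - lower) / 3.92."""
    return math.sqrt(n) * (upper - lower) / 3.92

print(round(sd_from_se(1.2, 50), 2))        # 8.49
print(round(sd_from_ci(4.5, 9.1, 50), 2))   # 8.30
```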

Standardized Effects

When scales differ, use a standardized mean difference. Record the exact instrument and interpret the sign consistently across studies. Capture whether the values are change scores or final values.

Conversion Notes

List the conversion equations next to the form so every extractor applies the same formulas.
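
For example, a short list you might pin beside the form (standard large-sample formulas; the 3.92 assumes a 95% CI):

```latex
\begin{align*}
SD &= SE \times \sqrt{n} \\
SD &= \frac{\sqrt{n}\,(\text{CI}_{\text{upper}} - \text{CI}_{\text{lower}})}{3.92} \\
SD &\approx \frac{IQR}{1.35} \\
SMD &= \frac{\bar{x}_1 - \bar{x}_2}{SD_{\text{pooled}}}, \quad
SD_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}
\end{align*}
```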

Data Extraction Process For A Systematic Review: From Forms To Files

Platforms And Exports

Pick a platform that fits your team and budget. A spreadsheet with locked validation can work. Specialist tools add audit trails and forms.

File Hygiene

Keep one master file. Use version numbers and read-only exports for analysis. Save raw downloads, PDFs, and correspondence in folders named by the study ID.

Calm Conflict Resolution

Reconcile conflicts in a calm, structured way. Compare entries side by side, read the source line, and agree on the rule that applies. Update the codebook when new edge cases turn up, and add a note explaining each decision.

Handle Special Study Designs Without Losing Accuracy

Cluster Trials

Cluster trials need an effective sample size adjustment if the analysis did not account for clustering. You will need an ICC and a mean cluster size. Record both and keep the source.
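
The usual correction divides each arm's sample size by the design effect, DE = 1 + (m − 1) × ICC, where m is the mean cluster size. A sketch of that adjustment:

```python
def effective_sample_size(n, mean_cluster_size, icc):
    """Shrink n by the design effect when clustering was ignored in the analysis."""
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return n / design_effect

# Example: 400 participants, average cluster of 20, ICC of 0.05.
print(round(effective_sample_size(400, 20, 0.05)))  # 205
```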

Multi-Arm Trials

Multi-arm trials can bias a meta-analysis if a shared comparator is counted twice. Combine similar arms or split the comparator so the participants are not double-counted.
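
If you split the comparator, one pragmatic bookkeeping approach divides the control arm's totals and events roughly evenly across the contrasts; the even split is an assumption here, so follow whatever rule your protocol states. A sketch:

```python
def split_comparator(events_ctl, total_ctl, n_contrasts):
    """Split shared control events and totals across contrasts so no one is counted twice."""
    events_per = events_ctl / n_contrasts
    total_per = total_ctl / n_contrasts
    return events_per, total_per

# Three intervention arms sharing one control arm of 150 participants, 30 events.
print(split_comparator(30, 150, 3))  # (10.0, 50.0)
```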

Crossover Trials

Crossover trials raise carryover concerns. If washout is short or effects linger, prefer first-period data. Record the chosen approach in your log.

Nonrandomized Studies

Nonrandomized studies report adjusted effects using different models. Record the model, covariates, and whether the effect is marginal or conditional. Keep raw counts or means as well when they exist.

Deal With Missing Or Inconsistent Data

Conversions And Approximations

When SD is missing, convert from SE, CI, t, or p-values where possible. Record the formula and the page. If authors report medians with IQR, approximate SD with IQR ÷ 1.35 when a normal shape seems plausible.
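
When only a two-sided p-value is reported for a difference in means, the usual route is p → t → SE → SD. A sketch, assuming equal variances across arms (scipy is used only for the t quantile); the IQR shortcut above is included for completeness:

```python
import math
from scipy import stats

def sd_from_p_value(p, mean_diff, n1, n2):
    """Back-calculate a pooled SD from a two-sided p-value for a difference in means."""
    df = n1 + n2 - 2
    t = stats.t.ppf(1 - p / 2, df)            # t statistic implied by the p-value
    se = abs(mean_diff) / t                   # standard error of the mean difference
    return se / math.sqrt(1 / n1 + 1 / n2)    # pooled SD

def sd_from_iqr(iqr):
    """Rough SD from an interquartile range, assuming an approximately normal outcome."""
    return iqr / 1.35

print(round(sd_from_p_value(0.03, mean_diff=4.0, n1=30, n2=30), 2))
print(round(sd_from_iqr(10.8), 1))  # 8.0
```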

Units And Scales

If studies report different scales for the same construct, convert units or extract standardized effects. When results are only shown in graphs, read values with a plot digitizer and mark them as estimated.

Contacting Authors

Contact authors with a short, specific request and a deadline. Store replies next to the study files. If no response arrives, flag the entry as estimated and proceed with the planned sensitivity checks.

Here are frequent snags and pragmatic fixes you can apply while keeping your dataset clean.

Problem | What To Do | Tip
Outcome Reported Many Ways | Pick a priority order: adjusted effect with CI; then raw arm data; then unadjusted effect with CI | State the order in the codebook
Multiple Time Points | Pre-select a window per outcome; list fallbacks | Record the actual time used
Graph-Only Results | Use a plot digitizer; mark as estimated | Run a sensitivity analysis
Mismatched Scales | Convert units or use SMD | Document the rule and any constants
Missing SD | Convert from SE, CI, t, or p | Store the formula used
Shared Comparator | Combine arms or split the comparator | Prevent double-counting
Cluster Design Ignored | Adjust using ICC and cluster size | Note the source of ICC
Adjusted And Raw Both Given | Extract both; choose one for the main model | Match choices across studies

Record Risk-Of-Bias And Context Alongside Outcomes

Item-Level Entries

Enter item-level judgments for each domain. Quote or paraphrase the line that supports each answer. Store the risk-of-bias file next to the effect data so the review can trace decisions.

Pick The Right Tool

Choose the tool that matches the design. Use RoB 2 for randomized trials and ROBINS-I for nonrandomized studies. Enter the signaling questions, the answers, and the domain judgments.

Quality Control That Saves Your Meta-Analysis

Second Pass Checks

Run a second pass on a random sample. Recompute a few effects from raw numbers and compare. Scan for outliers and impossible values with simple rules and charts.
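
A handful of plausibility rules catches most entry slips before they reach the meta-analysis. A sketch, assuming rows are dictionaries keyed by field names like those in the codebook sketch earlier:

```python
def check_row(row):
    """Return a list of plausibility problems for one extracted outcome row."""
    problems = []
    if row["events_int"] > row["total_int"]:
        problems.append("events exceed total in intervention arm")
    if row.get("sd_int") is not None and row["sd_int"] <= 0:
        problems.append("non-positive SD in intervention arm")
    if not (0 < row["total_int"] < 100000):
        problems.append("implausible arm size")
    return problems

row = {"events_int": 112, "total_int": 100, "sd_int": None}
print(check_row(row))  # ['events exceed total in intervention arm']
```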

Discrepancy Logs

Keep a discrepancy log. Write what changed, why, and who approved. When an error affects an analysis, regenerate the export and record the new file name in your log.

Tools That Speed Up Careful Work

Purpose-Built Tools

SRDR+ offers free, public extraction and sharing. Cochrane Review Manager handles effect calculations and exports. DistillerSR and Covidence offer managed workflows with forms, auditing, and project dashboards.

Spreadsheets Done Right

A well-built spreadsheet still works for small teams. Lock cells, restrict entries with lists, and add warnings for missing fields. Back up to a shared drive with permissions set per role.
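
If the team works in Excel, list validation can be scripted so every copy of the form ships with the same restrictions. A sketch using openpyxl; the sheet layout and allowed codes are illustrative:

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws["A1"] = "study_id"
ws["B1"] = "analysis_set"

# Restrict the analysis-set column to a fixed list of codes.
dv = DataValidation(
    type="list",
    formula1='"ITT,mITT,PP,unclear"',
    showErrorMessage=True,
)
ws.add_data_validation(dv)
dv.add("B2:B500")

wb.save("extraction_form_v1.xlsx")
```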

File Naming, Storage, And Reproducible Sharing

Folder Structure

Create a folder per study ID with subfolders for PDFs, notes, data, and email. Name files with study ID, arm, outcome, and time point so sorting groups related items.
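
A small helper keeps names consistent with that convention. A sketch; the separators and ordering are team choices, not a standard:

```python
import re

def data_file_name(study_id, arm, outcome, time_point, ext="csv"):
    """Build 'studyid_arm_outcome_timepoint.ext' with spaces collapsed to dashes."""
    parts = [study_id, arm, outcome, time_point]
    clean = [re.sub(r"\s+", "-", p.strip().lower()) for p in parts]
    return "_".join(clean) + f".{ext}"

print(data_file_name("SMITH2021", "intervention", "pain score", "12 weeks"))
# smith2021_intervention_pain-score_12-weeks.csv
```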

Shareable Materials

Publish your form, codebook, and a de-identified dataset when the review is done. Use a repository with a DOI so others can reuse the materials or check them against the paper.

Outcome-Specific Tips That Prevent Confusion

Time-To-Event Outcomes

For time-to-event outcomes, extract log hazard ratio and standard error when available. If only a Kaplan–Meier curve appears, digitize values and note the time horizon. State whether all randomized participants were included.
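
When a paper reports a hazard ratio with a 95% CI rather than the log HR and its SE, the conversion is mechanical. A sketch, assuming the CI is symmetric on the log scale:

```python
import math

def log_hr_and_se(hr, ci_lower, ci_upper):
    """Log hazard ratio and its SE from a reported HR with a 95% CI."""
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)
    return log_hr, se

log_hr, se = log_hr_and_se(0.75, 0.60, 0.94)
print(round(log_hr, 3), round(se, 3))  # -0.288 0.115
```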

Harms And Adverse Events

For harms, copy the exact definition and the grading system. Note whether counts are per participant or per event. Record the observation window, any adjudication process, and whether serious events were reported separately.

Composite Outcomes

For composite outcomes, list each component and the rule for counting events. State whether a hierarchy applied and whether any component dominated the effect. Keep component-level counts when they are available.

Clinically Meaningful Thresholds

Capture thresholds that reflect clinical relevance. If an instrument has a minimal clinically important difference (MCID), note it in the codebook. This helps readers interpret standardized effects later.

Reconciliation Playbook

Stepwise Resolution

Both extractors submit entries with page numbers. The reviewer checks the source line, applies the rule, and records the decision. If no rule exists, write one, test on two studies, and add it to the codebook.

Agreement And Training

Track agreement by field in week one. If a field lags, refine help text and run a short refresher. Agreement usually rises once examples are clear. Keep training notes.

Document Deviations From The Protocol

Write The Reason

Reports sometimes block your preferred choice. When that happens, write a brief note at the field level explaining the reason. Keep one deviations log so the manuscript can explain any switches without guesswork.

You Have Extracted The Right Data—Now What

Ready For Synthesis

Check that each outcome has at least two comparable studies. Decide whether a meta-analysis is sensible for each contrast. If synthesis is not sensible, prepare a structured narrative with clear tables.

Keep Files Live

Keep your extraction files open until the manuscript is accepted. Peer review often prompts small clarifications that require a quick look back at the source files.