Create a piloted form, train two extractors, record study and outcome data, double-check entries, resolve conflicts, and document every rule.
What Data Extraction Means And What You Need
Scope And Core Fields
Data extraction means capturing predefined fields in a standard form. Record design, participants, interventions, comparators, outcomes, timing, and effect data. Add risk-of-bias inputs and funding. Align the form with the protocol and planned synthesis.
Use the table below as a starter schema. Adapt it to your topic and keep a separate codebook that defines each field.
| Data Item | What To Record | Why It Matters |
|---|---|---|
| Citation / ID | Full citation, registry ID, linked reports | Trace sources and split duplicates |
| Study Design | Parallel RCT, crossover, cluster, cohort, case-control, etc. | Pick effect methods suited to design |
| Setting And Dates | Country, site type, recruitment and follow-up windows | Judge external relevance and era |
| Population | Eligibility, N, age, sex, baseline risks or means | Describe who was studied |
| Arms / Groups | Labels, N randomized and analyzed, attrition per arm | Enable effect calculations |
| Intervention | Name, dose, schedule, delivery, co-interventions | Know what was tested |
| Comparator | Placebo, active control, usual care, none | Anchor relative effects |
| Outcomes | Exact name, definition, instrument, direction | Keep measures consistent |
| Time Points | Nominal times and actual windows used in analyses | Align across studies |
| Effect Data (Dichotomous) | Events and totals; or effect and CI/SE | Compute RR, OR, or RD |
| Effect Data (Continuous) | Mean, SD, N; or change scores and SE/CI | Compute MD or SMD |
| Analysis Set | Intention-to-treat, modified ITT, per-protocol | Understand potential bias |
| Adjustments | Covariates used for adjusted models | Match like with like across studies |
| Unit Issues | Scales, units, direction, thresholds | Enable valid pooling |
| Multi-Arm Details | Arms included in the contrast, any splits | Avoid double-counting |
| Cluster Info | ICC, cluster size, analysis method | Correct unit-of-analysis |
| Risk-Of-Bias Inputs | Item-level judgments with quotes | Link judgments to data |
| Funding And COI | Source, role of funder, declared ties | Flag conflicts of interest |
| Notes | Anything needed to reproduce choices | Keep decisions visible |
Doing Data Extraction For Systematic Reviews: Prep And Workflow
Protocol And Checklist Alignment
Start with a protocol. State the data items, preferred effect measures, time points, subgroups, and planned conversions. Align the items with the PRISMA 2020 checklist so reporting is straightforward later.
Form Design And Help Text
Build a form that matches your questions. Add validation and short help text. Include fields for quotes and page numbers to trace entries.
Piloting And Calibration
Pilot on three to five studies. Calibrate wording, add missing fields, and drop unused ones. Debrief, then freeze version 1.0 before full extraction. Add a second pilot if rules change.
Dual Extraction And Reconciliation
Use two people for each study, each extracting independently, with a third person reconciling conflicts. Start with a practice set, and track agreement on a small sample to see whether the form or the training needs adjustment.
Audit Trails And Versioning
Keep an audit trail. Record who entered each value, the date, the source page, and any changes with reasons. Store the form template, the codebook, and the reconciliation log beside the dataset.
Set Clear Rules Before You Start
Primary Choices
Write rules for common choices. Pick a primary time point for each outcome. Pick a preferred metric and a fallback. Decide what to do when outcomes appear in multiple places, such as text, tables, and appendices.
Linked Reports And Duplicates
Create a hierarchy for duplicates: most complete report, then registered results, then earliest peer-reviewed report. Link companion papers to one study ID so participants are not double-counted.
Multi-Arm And Crossover Rules
Define how you will treat multi-arm trials. Decide whether to combine similar arms or to split the shared comparator. Write a short rule for crossover trials, e.g., use period one if carryover is likely.
Units, Direction, And Thresholds
State unit rules. Make all scales point in the same direction. Convert units to a common scale. Note any thresholds used to define responders or events. Keep conversion formulas near the form.
Pick Time Points And Scales
Time Windows
Outcomes often appear at many times. Choose the time point that best matches the question, then set a window around it (e.g., 8–12 weeks for a short-term outcome). Record both the nominal and the actual timing.
Instruments And Direction
Name each instrument exactly. Capture the version, range, and the direction of benefit. If studies report both final values and change scores, decide which you will prefer.
Define Effect Measures And Calculations
Cochrane Handbook guidance covers choices and formulas that keep extraction consistent across studies.
Dichotomous Data
For dichotomous data, capture events and totals per arm. Compute risk ratio, odds ratio, or risk difference as needed. If a paper reports an adjusted odds ratio, keep raw counts plus the adjusted effect and its CI.
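As a worked sketch, assuming per-arm events and totals are already extracted, the helper below (names are illustrative) computes the three measures plus the standard error of the log risk ratio, which pooling on the log scale needs:

```python
import math

# Illustrative helper: compute RR, OR, and RD from per-arm events and totals.
def dichotomous_effects(events_t, n_t, events_c, n_c):
    p_t, p_c = events_t / n_t, events_c / n_c
    rr = p_t / p_c                        # risk ratio
    odds_t = events_t / (n_t - events_t)
    odds_c = events_c / (n_c - events_c)
    or_ = odds_t / odds_c                 # odds ratio
    rd = p_t - p_c                        # risk difference
    # SE of log(RR), used for CIs and pooling on the log scale
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, or_, rd, se_log_rr

rr, or_, rd, se = dichotomous_effects(12, 100, 20, 100)
print(f"RR={rr:.2f}  OR={or_:.2f}  RD={rd:.2f}  SE(logRR)={se:.3f}")
```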
Continuous Data
For continuous data, record mean, SD, and N per arm. If only SE is given, convert using SD = SE × √n. If only a 95% CI is given, derive SD from the CI width and √n.
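A minimal sketch of both conversions, assuming a two-sided 95% CI around a single group mean; for small samples, a t-based multiplier is safer than 1.96:

```python
import math

def sd_from_se(se, n):
    # SD = SE * sqrt(n)
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z=1.96):
    # For a 95% CI around one group mean: SD = sqrt(n) * (upper - lower) / (2 * z)
    # Small samples: replace z with the t quantile for n - 1 degrees of freedom.
    return math.sqrt(n) * (upper - lower) / (2 * z)
```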
Standardized Effects
When scales differ, use a standardized mean difference. Record the exact instrument and interpret the sign consistently across studies. Capture whether the values are change scores or final values.
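A short sketch of the standardized mean difference with Hedges' small-sample correction, assuming final values (not change scores) per arm:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD with Hedges' small-sample correction; inputs are per-arm summaries."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)      # correction factor for small samples
    return d * j
```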
Conversion Notes
List the conversion equations next to the form so every extractor applies the same versions; the sketches above are one starting point.
Data Extraction Process For A Systematic Review: From Forms To Files
Platforms And Exports
Pick a platform that fits your team and budget. A spreadsheet with locked validation can work. Specialist tools add audit trails and forms.
File Hygiene
Keep one master file. Use version numbers and read-only exports for analysis. Save raw downloads, PDFs, and correspondence in folders named by the study ID.
Calm Conflict Resolution
Reconcile conflicts in a calm, structured way. Compare entries side by side, read the source line, and agree on the rule that applies. Update the codebook when new edge cases turn up, and add a note to each affected entry.
Handle Special Study Designs Without Losing Accuracy
Cluster Trials
Cluster trials need an effective sample size adjustment if the analysis did not account for clustering. You will need an ICC and a mean cluster size. Record both and keep the source.
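A one-line sketch of the standard design-effect adjustment, assuming you extracted the ICC and a mean cluster size; events can be divided by the same factor:

```python
def effective_sample_size(n, mean_cluster_size, icc):
    # Design effect: DE = 1 + (m - 1) * ICC; divide N (and events) by DE.
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return n / design_effect

print(effective_sample_size(300, mean_cluster_size=30, icc=0.05))  # ≈ 122.4
```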
Multi-Arm Trials
Multi-arm trials can bias a meta-analysis if a shared comparator is counted twice. Combine similar arms or split the comparator so the participants are not double-counted.
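A rough sketch of the splitting approach, assuming an even split of the shared arm across contrasts; multivariate models are the rigorous alternative, and your software may require rounding:

```python
def split_comparator(events_c, n_c, n_contrasts):
    # Even split of the shared control arm across contrasts to avoid
    # double-counting its participants; round as your software requires.
    return events_c / n_contrasts, n_c / n_contrasts

print(split_comparator(events_c=30, n_c=200, n_contrasts=2))  # (15.0, 100.0)
```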
Crossover Trials
Crossover trials raise carryover concerns. If washout is short or effects linger, prefer first-period data. Record the chosen approach in your log.
Nonrandomized Studies
Nonrandomized studies report adjusted effects from a variety of models. Record the model, covariates, and whether the effect is marginal or conditional. Keep raw counts or means as well when they exist.
Deal With Missing Or Inconsistent Data
Conversions And Approximations
When SD is missing, convert from SE, CI, t, or p-values where possible. Record the formula and the page. If authors report medians with IQR, approximate SD with IQR ÷ 1.35 when a normal shape seems plausible.
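Two hedged sketches of these back-calculations; the p-value route assumes a two-sample t test on a difference in means, and scipy supplies the t quantile:

```python
import math
from scipy import stats

def sd_from_iqr(iqr):
    # Normal approximation: the IQR spans about 1.35 SDs
    return iqr / 1.35

def sd_from_p(p_two_sided, mean_diff, n1, n2):
    """Back-calculate a pooled SD from a two-sided p-value for a mean difference."""
    df = n1 + n2 - 2
    t = abs(stats.t.ppf(p_two_sided / 2, df))   # t statistic implied by p
    se = abs(mean_diff) / t                     # SE of the difference
    return se / math.sqrt(1 / n1 + 1 / n2)      # pooled SD
```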
Units And Scales
If arms report different scales for the same construct, convert units, or extract standardized effects. When results appear only in graphs, read values with a plot digitizer and mark them as estimated.
Contacting Authors
Contact authors with a short, specific request and a deadline. Store replies next to the study files. If no response arrives, flag the entry as estimated and proceed with the planned sensitivity checks.
Here are frequent snags and pragmatic fixes you can apply while keeping your dataset clean.
| Problem | What To Do | Tip |
|---|---|---|
| Outcome Reported Many Ways | Pick a priority order: adjusted effect with CI; then raw arm data; then unadjusted effect with CI | State the order in the codebook |
| Multiple Time Points | Pre-select a window per outcome; list fallbacks | Record the actual time used |
| Graph-Only Results | Use a plot digitizer; mark as estimated | Run a sensitivity analysis |
| Mismatched Scales | Convert units or use SMD | Document the rule and any constants |
| Missing SD | Convert from SE, CI, t, or p | Store the formula used |
| Shared Comparator | Combine arms or split the comparator | Prevent double-counting |
| Cluster Design Ignored | Adjust using ICC and cluster size | Note the source of ICC |
| Adjusted And Raw Both Given | Extract both; choose one for the main model | Match choices across studies |
Record Risk-Of-Bias And Context Alongside Outcomes
Item-Level Entries
Enter item-level judgments for each domain. Quote or paraphrase the line that supports each answer. Store the risk-of-bias file next to the effect data so the review can trace decisions.
Pick The Right Tool
Choose the tool that matches the design. Use RoB 2 for randomized trials and ROBINS-I for nonrandomized studies. Enter the signaling questions, the answers, and the domain judgments.
Quality Control That Saves Your Meta-Analysis
Second Pass Checks
Run a second pass on a random sample. Recompute a few effects from raw numbers and compare. Scan for outliers and impossible values with simple rules and charts.
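A small sketch of rule-based scanning, assuming a CSV export with hypothetical column names (study_id, events, n_analyzed, sd); adapt the rules to your own form:

```python
import pandas as pd

df = pd.read_csv("extraction_export.csv")  # column names are illustrative

checks = {
    "events exceed arm total": df["events"] > df["n_analyzed"],
    "non-positive sample size": df["n_analyzed"] <= 0,
    "SD missing or non-positive": ~(df["sd"] > 0),
}
for label, mask in checks.items():
    if mask.any():
        print(label, "->", df.loc[mask, "study_id"].tolist())
```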
Discrepancy Logs
Keep a discrepancy log. Write what changed, why, and who approved. When an error affects an analysis, regenerate the export and record the new file name in your log.
Tools That Speed Up Careful Work
Purpose-Built Tools
SRDR+ offers free, public extraction and sharing. Cochrane's Review Manager (RevMan) handles effect calculations and exports. DistillerSR and Covidence offer managed workflows with forms, auditing, and project dashboards.
Spreadsheets Done Right
A well-built spreadsheet still works for small teams. Lock cells, restrict entries with lists, and add warnings for missing fields. Back up to a shared drive with permissions set per role.
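As one way to restrict entries programmatically, here is a sketch using openpyxl's list validation; the column and allowed codes are illustrative:

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws.append(["study_id", "analysis_set"])

# Restrict the analysis_set column to a fixed list of codes
dv = DataValidation(type="list", formula1='"ITT,mITT,PP"', allow_blank=False)
ws.add_data_validation(dv)
dv.add("B2:B500")

wb.save("extraction_form.xlsx")
```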
File Naming, Storage, And Reproducible Sharing
Folder Structure
Create a folder per study ID with subfolders for PDFs, notes, data, and email. Name files with study ID, arm, outcome, and time point so sorting groups related items.
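A tiny sketch of a naming helper; the pattern and its parts are illustrative, not a standard:

```python
def extraction_filename(study_id, arm, outcome, timepoint, ext="csv"):
    # Study ID first so sorted listings group related files together
    parts = [study_id, arm, outcome, timepoint]
    return "_".join(p.lower().replace(" ", "-") for p in parts) + f".{ext}"

print(extraction_filename("smith2021", "arm1", "pain VAS", "12wk"))
# -> smith2021_arm1_pain-vas_12wk.csv
```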
Shareable Materials
Publish your form, codebook, and a de-identified dataset when the review is done. Use a repository with a DOI so others can reuse the materials or check them against the paper.
Outcome-Specific Tips That Prevent Confusion
Time-To-Event Outcomes
For time-to-event outcomes, extract the log hazard ratio and its standard error when available. If only a Kaplan–Meier curve appears, digitize values and note the time horizon. State whether all randomized participants were included.
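When only the HR and a 95% CI are reported, a minimal sketch of recovering the log hazard ratio and its SE from the CI width:

```python
import math

def log_hr_and_se(hr, ci_lower, ci_upper, z=1.96):
    # SE on the log scale from the width of a symmetric 95% CI
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_hr, se

print(log_hr_and_se(0.75, 0.60, 0.94))  # ≈ (-0.288, 0.114)
```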
Harms And Adverse Events
For harms, copy the exact definition and the grading system. Note whether counts are per participant or per event. Record the observation window, any adjudication process, and whether serious events were reported separately.
Composite Outcomes
For composite outcomes, list each component and the rule for counting events. State whether a hierarchy applied and whether any component dominated the effect. Keep component-level counts when they are available.
Clinically Meaningful Thresholds
Capture thresholds that reflect clinical relevance. If an instrument has a minimal clinically important difference (MCID), note it in the codebook. This helps readers interpret standardized effects later.
Reconciliation Playbook
Stepwise Resolution
Both extractors submit entries with page numbers. The reviewer checks the source line, applies the rule, and records the decision. If no rule exists, write one, test on two studies, and add it to the codebook.
Agreement And Training
Track agreement by field in week one. If a field lags, refine help text and run a short refresher. Agreement usually rises once examples are clear. Keep training notes.
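A simple sketch of field-level percent agreement; Cohen's kappa adds a chance correction, and the values here are illustrative:

```python
def percent_agreement(entries_a, entries_b):
    # Share of fields where both extractors entered the same value
    matches = sum(a == b for a, b in zip(entries_a, entries_b))
    return matches / len(entries_a)

# Example: two extractors coding 'analysis_set' for ten studies
a = ["ITT", "ITT", "PP", "ITT", "mITT", "ITT", "PP", "ITT", "ITT", "ITT"]
b = ["ITT", "PP",  "PP", "ITT", "mITT", "ITT", "ITT", "ITT", "ITT", "ITT"]
print(f"{percent_agreement(a, b):.0%}")  # 80%
```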
Document Deviations From The Protocol
Write The Reason
Reports sometimes block your preferred choice. When that happens, write a brief note at the field level and the reason. Keep one deviations log so the manuscript can explain any switches without guesswork.
You Have Extracted The Right Data—Now What
Ready For Synthesis
Check that each outcome has at least two comparable studies. Decide whether a meta-analysis is sensible for each contrast. If synthesis is not sensible, prepare a structured narrative with clear tables.
Keep Files Live
Keep your extraction files open until the manuscript is accepted. Peer review often prompts small clarifications that require a quick look back at the source files.
