The GP Core Revision Guide: Ultimate Edition
This guide is free · The complete AKT Pack has 2,000+ AKT-style questions, with new ones added weekly like these  See the AKT Pack →

The GP Core Revision Guide to Medical Statistics

The Ultimate Revision Guide

“From P-Values to Practice”

Welcome to the foundation. In the AKT, you don’t need to be a mathematician; you need to be a BS detector. This guide combines high-yield explanations with interactive simulations to help you “feel” the numbers.

How to Use This Guide

This guide is designed to make statistics intuitive, visual, and clinically relevant. Work through each chapter in order, using the interactive widgets to build real understanding rather than memorising formulas. The Common AKT Traps boxes highlight the exam’s favourite pitfalls, and the AKT‑style questions help you test your knowledge as you go.

When you reach the end, challenge yourself with the Final AKT Stats Mini‑Quiz and review the What to Memorise for the AKT summary. This guide is best used actively — adjust sliders, interpret graphs, and pause to predict answers before revealing them.


Chapter 1: P‑Values

The P-Value: The “Fluke” Factor

The P-value is a probability. It tells you how likely you would be to see a result at least as extreme as this one if there were truly no effect, i.e. if chance alone were at work.

Interactive: The Scales of Truth

The Pub Explanation

Imagine playing heads-or-tails. If your mate wins 10 times in a row, the probability of that happening by luck alone is tiny: about 1 in 1,000 (P ≈ 0.001, well below 0.05).
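The pub arithmetic can be checked in a few lines. This is a minimal sketch assuming a fair coin and independent tosses, so each win has a 50% chance.

```python
# Probability that a fair coin goes your mate's way 10 times in a row,
# assuming each toss is independent with a 50% chance.
p_single_win = 0.5
n_wins_in_a_row = 10

p_by_luck = p_single_win ** n_wins_in_a_row
print(f"P(10 wins in a row by luck) = {p_by_luck:.4f}")  # roughly 1 in 1,000
print("Below the conventional 0.05 threshold?", p_by_luck < 0.05)
```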

Use the slider! Notice how when P is high (e.g. 0.40), the scales are balanced: chance is a perfectly plausible explanation. When P is low (e.g. 0.01), “Evidence” slams down: the result is unlikely to be a fluke.

The Courtroom Analogy: Type 1 vs Type 2 Errors

Type 1 Error (False Positive)

“Convicting an Innocent Man”

You say there IS a difference (Significance), but actually there is NONE.
Clinical example: telling a healthy patient they have cancer.

Type 2 Error (False Negative)

“Letting a Guilty Man Go Free”

You say there is NO difference, but actually there IS one (you missed it).
Usually caused by a small sample size (the study was underpowered).
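To see why small studies miss real effects, here is a sketch assuming a hypothetical coin that truly lands your mate's way 70% of the time (a real effect) and a one-sided exact binomial test at the 5% level. The function names are illustrative only. Power is the chance of detecting the bias; 1 minus power is the Type 2 error rate.

```python
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

def critical_value(n, alpha=0.05):
    """Smallest number of wins that counts as 'significant' if the coin were fair."""
    for k in range(n + 1):
        if upper_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no achievable significant result

def power(n, true_p, alpha=0.05):
    """Chance the study detects the real bias (1 - Type 2 error rate)."""
    return upper_tail(n, critical_value(n, alpha), true_p)

# Same real effect (70% win rate), two hypothetical study sizes
for n in (10, 50):
    pw = power(n, 0.7)
    print(f"n={n}: power={pw:.2f}, Type 2 error rate={1 - pw:.2f}")
```

With only 10 tosses the real bias is missed most of the time; with 50 tosses it is usually detected.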


Chapter 2: Confidence Intervals

Confidence Intervals & Significance

P-values give a simple “Yes/No” on significance. CIs also show you the size of the effect and how precisely it has been estimated. Crucially for the exam, you must be able to spot the Line of No Effect.

Interactive: The Bridge of Significance

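Drag the slider to move the result and watch the colour change. Here the line of no effect is 0: if the whole CI sits clear of the line, the result is SIGNIFICANT; if the CI touches or crosses the line, it is not.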

Difference (e.g. BP)

Line of No Effect = 0

“Zero for Hero”

Ratio (e.g. Relative Risk)

Line of No Effect = 1

“One is Done”
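The “Zero for Hero / One is Done” rule can be written as a tiny check. This is a sketch with a hypothetical helper name; the example intervals are illustrative, and a CI that merely touches the line is treated as not significant.

```python
def crosses_line_of_no_effect(lower, upper, measure):
    """True if the 95% CI touches or crosses the line of no effect.

    measure: "difference" (line at 0, e.g. BP change) or
             "ratio" (line at 1, e.g. RR, OR, HR).
    """
    line = {"difference": 0.0, "ratio": 1.0}[measure]
    return lower <= line <= upper

examples = [
    ("BP difference -5 mmHg (-8 to -2)", -8, -2, "difference"),
    ("Relative risk 0.85 (0.78 to 0.92)", 0.78, 0.92, "ratio"),
    ("Odds ratio 1.40 (0.90 to 2.10)", 0.90, 2.10, "ratio"),
]
for label, lower, upper, measure in examples:
    crosses = crosses_line_of_no_effect(lower, upper, measure)
    print(f"{label}: {'not significant' if crosses else 'significant'}")
```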

AKT‑Style Question

A clinical trial compares Drug A with placebo for reducing migraine frequency. The study reports P = 0.03.

Which statement best describes this result?

  • A. There is a 3% chance the drug works
  • B. There is a 97% chance the drug works
  • C. The result is unlikely to be due to chance
  • D. The drug is clinically effective

Correct answer: C

A P‑value of 0.03 means that if the drug had no real effect, the probability of seeing a result this extreme (or more extreme) by chance alone is 3%. It does not tell you the probability the drug works — only how surprising the result is under the assumption of no effect.

Common AKT Traps

  • Thinking P‑value = probability the hypothesis is true. It doesn’t. It’s the probability of data at least this extreme, assuming no effect.
  • Assuming “statistically significant” = “clinically important”. A tiny difference can be statistically significant in a large study.
  • Believing P < 0.05 proves the treatment works. It only suggests the result is unlikely to be due to chance, not that the effect is meaningful.
  • Ignoring sample size. Small studies often fail to reach significance even when a real effect exists (Type 2 error).
  • Misinterpreting P > 0.05. It does not mean “no effect”; it means “not enough evidence to show one”.

You should now be able to:

  • Explain what a P‑value actually represents
  • Recognise common misinterpretations (e.g., “probability the hypothesis is true”)
  • Interpret P < 0.05 and P > 0.05 correctly
  • Understand why statistical significance ≠ clinical significance
  • Interpret a 95% CI for ratios and differences
  • Identify when a CI indicates statistical significance
  • Explain why narrower CIs = more precision
  • Spot when a CI crosses the line of no effect

Part 1: Foundations

Chapter 3: Study Design

“What type of study is this?”

The AKT loves asking “what type of study is this?” — get the design right and you’re halfway to the right answer about its strengths, weaknesses, and the kind of evidence it produces.

The Four Big Designs

Randomised Controlled Trial (RCT) — the gold standard

Patients are randomly allocated to intervention or control. Randomisation balances confounders (including unknown ones) between groups; blinding reduces bias. Best for establishing causation. Cons: expensive, time-consuming, sometimes unethical, and may exclude real-world patients.

Cohort Study — follow them forwards

Classify a group by exposure (e.g. smokers vs non-smokers) and follow over time to see who develops the outcome. Good for rare exposures; gives incidence and relative risk. Long follow-up needed; vulnerable to loss-to-follow-up.

Case-Control Study — look backwards

Take cases (with disease) and controls (without), then compare past exposures. Quick and cheap; ideal for rare diseases. Vulnerable to recall bias; gives odds ratios, not relative risk.

Cross-Sectional Study — a snapshot

Measure exposure and outcome at the same time. Quick; good for measuring prevalence. Cannot establish cause and effect (chicken-and-egg).

Hierarchy of Evidence

  1. Systematic review / meta-analysis of RCTs
  2. RCT
  3. Cohort study
  4. Case-control study
  5. Cross-sectional study
  6. Case series / case report
  7. Expert opinion

Intention-to-Treat (ITT) vs Per-Protocol

Intention-to-treat (ITT): analyse patients in the group they were originally allocated to, regardless of whether they actually took the treatment. Preserves randomisation. Conservative, real-world estimate. The preferred analysis for RCTs.

Per-protocol: analyse only those who completed the treatment as planned. Can exaggerate treatment effect because non-compliers are removed.

Memory hook: ITT = “what happens when you prescribe it” (pragmatic). Per-protocol = “what happens when you take it” (explanatory).
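A toy trial makes the difference concrete. This is a sketch on entirely hypothetical data: 20 treatment-arm patients stop the drug and happen to be sicker (30% event rate), and the control arm is assumed fully adherent. Dropping them (per-protocol) flatters the treatment.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    arm: str       # group allocated at randomisation: "treatment" or "control"
    adhered: bool  # actually took the treatment as planned
    event: bool    # had the outcome (e.g. a stroke)

def make_arm(arm, n_adherent, events_adherent, n_non_adherent, events_non_adherent):
    """Build a list of hypothetical patients with the given counts."""
    adherent = [Patient(arm, True, i < events_adherent) for i in range(n_adherent)]
    non_adherent = [Patient(arm, False, i < events_non_adherent) for i in range(n_non_adherent)]
    return adherent + non_adherent

def event_rate(patients):
    return sum(p.event for p in patients) / len(patients)

trial = make_arm("treatment", 80, 8, 20, 6) + make_arm("control", 100, 20, 0, 0)

def arr(trial, per_protocol):
    """Absolute risk reduction: control event rate minus treatment event rate."""
    def included(p):
        return p.adhered or not per_protocol  # ITT keeps everyone in their allocated arm
    treated = [p for p in trial if p.arm == "treatment" and included(p)]
    control = [p for p in trial if p.arm == "control" and included(p)]
    return event_rate(control) - event_rate(treated)

print(f"ITT ARR:          {arr(trial, per_protocol=False):.0%}")
print(f"Per-protocol ARR: {arr(trial, per_protocol=True):.0%}")
```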

AKT‑Style Question

A new treatment for resistant hypertension is being investigated. The condition is rare, and follow-up over several years is impractical. Which study design is most appropriate?

Common AKT Traps

  • Confusing cohort and case-control. Cohort = exposure → outcome (forwards). Case-control = outcome → exposure (backwards).
  • Calling a cross-sectional study an “incidence study”. It measures prevalence, not incidence.
  • Assuming RCTs are always best. They aren’t always feasible or ethical.
  • Mixing up ITT and per-protocol. ITT is the conservative, randomisation-preserving choice.
  • Forgetting case-control studies give odds ratios, not relative risk.

You should now be able to:

  • Identify the four major study designs from a stem
  • Recall the hierarchy of evidence
  • Match study design to research question
  • Distinguish ITT from per-protocol analysis

Chapter 4: Bias & Validity

“What’s wrong with this study?”

The AKT exam is full of questions that essentially ask “what’s wrong with this study?”. Knowing the named biases and what they look like is high-yield revision.

The Main Types of Bias

Selection bias — participants aren’t representative of the population. Example: recruiting only hospital inpatients for a community-level study.

Recall bias — memory of past exposures is influenced by having the outcome. Example: mothers of children with congenital abnormalities recall exposures more thoroughly than mothers of healthy children. Classic in case-control studies.

Measurement (information / observer) bias — outcomes or exposures measured differently between groups. Reduced by blinding.

Attrition bias — drop-outs differ systematically from those who stay.

Publication bias — positive studies are more likely to be published than negative ones (see Forest Plots chapter).

Lead-time bias — a screening test detects disease earlier, making survival from diagnosis look longer even if the time of death is unchanged.

Length-time bias — screening preferentially picks up slow-growing, less aggressive disease. Makes screening look more effective than it is.

Validity & Reliability

Internal validity

Does the study correctly answer its own question? (Free from bias and confounding within the study.)

External validity

Do the results apply to other populations beyond the study? (Generalisability.)

Reliability

Does the measurement give the same result on repeat testing? (Consistency.)

Validity

Does the measurement actually measure what it claims to? (Accuracy.)

The dartboard analogy. Reliable = arrows all in the same spot. Valid = arrows all on the bullseye. You can be reliable without being valid (consistently wrong).
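The dartboard idea maps onto two simple numbers: bias (how far the average is from the bullseye, i.e. validity) and spread (how tightly readings cluster, i.e. reliability). A sketch using hypothetical repeated BP readings on one patient whose true systolic BP is assumed to be 120 mmHg:

```python
from statistics import mean, stdev

TRUE_BP = 120  # hypothetical true systolic BP (mmHg)

# Hypothetical repeated readings from three BP monitors on the same patient
readings = {
    "reliable but not valid": [130, 131, 129, 130, 130],
    "reliable and valid": [120, 121, 119, 120, 120],
    "valid on average, unreliable": [110, 130, 118, 125, 117],
}

for name, values in readings.items():
    bias = mean(values) - TRUE_BP  # distance from the bullseye (validity)
    spread = stdev(values)         # how tightly the arrows cluster (reliability)
    print(f"{name}: bias={bias:+.1f} mmHg, spread (SD)={spread:.1f} mmHg")
```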

Reducing Bias

  • Randomisation — balances confounders between groups.
  • Allocation concealment — recruiters don’t know which group the next patient will be assigned to.
  • Blinding — single (patient), double (patient + investigator), triple (also analyst).
  • ITT analysis — preserves randomisation.
  • Standardised outcome assessment — same tools, same training, same blinding.

AKT‑Style Question

A case-control study investigates whether anti-emetic use in pregnancy is associated with cleft palate. Mothers of affected babies recall their exposures more thoroughly than mothers of unaffected babies. Which bias is most likely?

Pre-test → Post-test Probability: The Fagan Nomogram

The Fagan nomogram is a graphical tool that combines three things to give you a clinical answer:

  1. Pre-test probability — your clinical suspicion before the test (often estimated from prevalence or scoring systems like Wells’).
  2. Likelihood ratio — from the test.
  3. Post-test probability — the answer.

Draw a line from your pre-test probability, through the LR, and read off the post-test probability.

Why this matters: LRs are independent of prevalence, but post-test probability is not. The same positive test result means very different things in a 70-year-old smoker with chest pain (high pre-test probability) versus a 25-year-old with the same symptom (low pre-test probability).
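Behind the nomogram is Bayes’ theorem in odds form: post-test odds = pre-test odds × LR. A minimal sketch; the LR+ of 5 and the two pre-test probabilities are hypothetical values chosen only to mirror the chest-pain example.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

LR_POSITIVE = 5  # hypothetical positive likelihood ratio

# Hypothetical pre-test probabilities for the same positive result
for patient, pre in [("70-year-old smoker with chest pain", 0.50),
                     ("25-year-old with chest pain", 0.05)]:
    post = post_test_probability(pre, LR_POSITIVE)
    print(f"{patient}: {pre:.0%} before test -> {post:.0%} after a positive result")
```

The same test result moves the first patient to about 83% but the second only to about 21%.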

Rule-of-thumb shifts (without the nomogram)

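Likelihood ratio → approximate change in probability (a rule of thumb that holds roughly for pre-test probabilities between 10% and 90%):

  • LR+ 10 → +45 percentage points
  • LR+ 5 → +30 percentage points
  • LR+ 2 → +15 percentage points
  • LR– 0.5 → –15 percentage points
  • LR– 0.1 → –45 percentage points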

Common AKT Traps

  • Mixing up validity and reliability. Valid = right. Reliable = consistent.
  • Confusing internal and external validity. Internal = within the study. External = generalisability.
  • Forgetting lead-time and length-time bias — they mostly apply to screening but do come up.
  • Assuming blinding fixes everything. It reduces measurement bias but doesn’t address selection bias.
  • Treating “randomised” and “blinded” as synonyms. Randomisation deals with confounders at the start; blinding deals with bias during follow-up.

You should now be able to:

  • Name and recognise the main types of bias
  • Distinguish validity from reliability
  • Distinguish internal from external validity
  • Explain how randomisation, blinding, and ITT reduce bias
Enjoying this so far? This guide is a sample of the AKT Pack — 2,000+ AKT-style questions, with new ones added weekly across every RCGP topic, written by a practising UK GP.
See the AKT Pack →

Chapter 5: Diagnostic Tests — The 2×2 Table & Prevalence

“Diagnostic Tests & The 2×2 Table”

1. The Fishing Net Analogy

We use tests to catch diseases. Think of the test as a Fishing Net.

Sensitivity (The Net)

🐟
🐟
🐟
👢
👢

High Sensitivity: Catches almost all the fish (True Positives), though it may also catch junk (False Positives). A negative result therefore rules OUT disease (SnOut).

Specificity (The Sorter)

👢 ➡️ 🗑️
👢
👢

High Specificity: Throws back the rubbish, correctly identifying True Negatives and producing few False Positives. A positive result therefore rules IN disease (SpIn).

Part 2: Diagnostic Tests

The 2×2 Table: Coeliac Disease

You test 1,000 patients in a clinic where coeliac prevalence is 10%.

                     Disease Present (Sick)   Disease Absent (Healthy)   Totals
Test Positive (+)    80 (True Positive)       10 (False Positive)        90
Test Negative (-)    20 (False Negative)      890 (True Negative)        910
Totals               100                      900                        1,000
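The four headline measures fall straight out of the coeliac table. A minimal sketch using its counts:

```python
# Counts from the coeliac 2x2 table (1,000 patients, prevalence 10%)
tp, fp = 80, 10   # test positive: true positives, false positives
fn, tn = 20, 890  # test negative: false negatives, true negatives

sensitivity = tp / (tp + fn)  # of the sick, how many test positive
specificity = tn / (tn + fp)  # of the healthy, how many test negative
ppv = tp / (tp + fp)          # of the positives, how many are sick
npv = tn / (tn + fn)          # of the negatives, how many are healthy

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

This gives sensitivity 80.0%, specificity 98.9%, PPV 88.9% and NPV 97.8%.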

The Prevalence Trap

Why good tests fail in Primary Care

Sensitivity and Specificity DO NOT change. They are fixed properties of the test.

However, PPV and NPV CHANGE depending on Prevalence. In GP (Low Prevalence), PPV drops sharply: when a disease is rare enough, most positive results are False Positives.

AKT‑Style Question

A new test for coeliac disease has a sensitivity of 90% and a specificity of 95%. In a GP population where the prevalence of coeliac disease is 1%, a patient tests positive.

Which statement is most accurate?

  • A. The patient almost certainly has coeliac disease
  • B. The test has ruled the disease in
  • C. Most positive results will still be false positives
  • D. The test is not sensitive enough for primary care

Correct answer: C

Even with excellent sensitivity and specificity, a low‑prevalence setting like GP massively reduces the positive predictive value (PPV). When prevalence is only 1%, most positive results will be false positives — the classic Prevalence Trap.
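To see the numbers behind this answer, here is a minimal sketch applying the same test (sensitivity 90%, specificity 95%) at 1% prevalence and, for contrast, at a hypothetical 10% prevalence:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

for setting, prevalence in [("GP population", 0.01),
                            ("hypothetical specialist clinic", 0.10)]:
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"{setting} (prevalence {prevalence:.0%}): PPV {ppv:.0%}, NPV {npv:.1%}")
```

At 1% prevalence only about 15% of positives are true positives, while the NPV stays very high.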

Common AKT Traps

  • Confusing sensitivity with PPV. Sensitivity tells you how well the test detects disease, not how likely a positive result is to be true.
  • Ignoring prevalence. PPV and NPV change dramatically with prevalence; sensitivity and specificity do not.
  • Assuming a “good test” works everywhere. A hospital‑grade test can perform poorly in GP because of low prevalence.
  • Thinking high sensitivity = rules in. High sensitivity tests rule OUT disease (SnOut); high specificity tests rule IN disease (SpIn).
  • Forgetting false positives dominate in low‑prevalence settings. Even a small false‑positive rate can overwhelm the true positives when disease is rare.

You should now be able to:

  • Calculate sensitivity, specificity, PPV and NPV
  • Explain how prevalence affects PPV/NPV
  • Apply SnOut and SpIn correctly
  • Interpret 2×2 tables confidently

Chapter 6: Likelihood Ratios & The Fagan Nomogram

“LR+ and LR–: How much does a test change your mind?”

Likelihood ratios tell you how much a test result changes your belief that a patient has a disease. They are the “shove” that moves you from pre‑test to post‑test probability.

The Golden Rules

  • LR+ > 10 → Strong rule‑in
  • LR– < 0.1 → Strong rule‑out
  • LR ≈ 1 → No diagnostic value

The Intuition

Think of LR+ and LR– as shoves: LR+ pushes you towards disease. LR– pushes you away from disease.

Clinical Example

D‑dimer: Sensitivity 95%, Specificity 50%
LR+ = 1.9 (weak rule‑in)
LR– = 0.1 (strong rule‑out)
→ Explains why D‑dimer is a rule‑out test.

The Formula (For Understanding Only)

You won’t need to calculate these in the AKT, but knowing the structure helps:

  • LR+ = Sensitivity / (1 – Specificity)
  • LR– = (1 – Sensitivity) / Specificity
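The two formulas translate directly into code. A sketch using the D‑dimer figures quoted above, plus a hypothetical “coin toss” test (sensitivity and specificity both 50%) to show an LR of 1. Note that specificity of exactly 100% would make LR+ undefined (division by zero).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

for test, sens, spec in [("D-dimer", 0.95, 0.50), ("Coin toss", 0.50, 0.50)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{test}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```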

Interactive: Likelihood Ratio Calculator

Adjust sensitivity and specificity to see how LR+ and LR– change. This shows why some tests are powerful rule‑ins, others rule‑outs, and some… are useless.

The calculator reports LR+ and LR–, plus an LR+ “shove meter” showing how hard a positive test result pushes you towards diagnosing disease, on a scale from Useless through Weak Rule‑In to Strong Rule‑In.

AKT‑Style Question

A test for pulmonary embolism has a sensitivity of 95% and a specificity of 50%. Which statement best interprets its likelihood ratios?

  • A. LR+ is high, so the test is good for ruling in PE
  • B. LR– is high, so the test is poor for ruling out PE
  • C. LR+ is low, so a positive result does not increase the probability of PE much
  • D. LR– is high, so a negative result increases the probability of PE

Correct answer: C

With sensitivity 95% and specificity 50%, the LR+ is around 1.9 — too low to rule in disease. The LR– is around 0.1, which is strong for ruling out. This is why tests like the D‑dimer are excellent rule‑out tools but poor rule‑ins.

Common AKT Traps

  • Thinking LR+ depends on prevalence. It doesn’t: LR+ and LR– are properties of the test, not the population.
  • Confusing LR+ with PPV. LR+ tells you how much a positive result shifts probability; PPV tells you how likely it is that a patient with a positive result actually has the disease.
  • Assuming a high sensitivity test rules in disease. High sensitivity → good for ruling out (SnOut). High specificity → good for ruling in (SpIn).
  • Forgetting the “magic numbers”: LR+ > 10 = strong rule‑in; LR– < 0.1 = strong rule‑out; LR ≈ 1 = useless.
  • Believing LR+ and LR– must be symmetrical. A test can be excellent at ruling out but poor at ruling in (e.g. D‑dimer).

You should now be able to:

  • Interpret LR+ and LR– values
  • Recognise strong rule‑in and rule‑out thresholds
  • Explain why LR ≈ 1 is useless
  • Understand how LRs update probability

Chapter 7: Screening Principles

“Wilson & Jungner — and the biases that flatter screening”

Screening is testing asymptomatic people for a disease. It’s a deceptively complex topic because the benefits are easy to overstate and the harms are easy to overlook. The AKT question bank loves screening questions, particularly around the Wilson & Jungner criteria and screening-specific biases.

Wilson & Jungner Criteria (1968)

These are the classic ten principles for judging whether a screening programme is worthwhile. Memorise the headlines across four buckets:

About the disease

  1. Important health problem
  2. Recognisable latent / early symptomatic stage
  3. Natural history adequately understood

About the test

  1. Suitable test (sensitive, specific)
  2. Acceptable to the population

About the treatment

  1. Accepted treatment exists
  2. Facilities for diagnosis & treatment available
  3. Agreed policy on whom to treat

About the programme

  1. Cost economically balanced
  2. Continuing process, not one-off

Memory hook: disease, test, treatment, programme — four buckets, two or three points each.

Screening-Specific Biases

Lead-time bias

Screening detects disease earlier. Survival from diagnosis looks longer, even though the time of death is unchanged.
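Lead-time bias is easy to reproduce with made-up numbers. This sketch assumes a hypothetical cohort of five patients and a screening test that finds each cancer 4 years earlier without changing anyone’s age at death. Five-year survival jumps even though nobody lives longer.

```python
# Hypothetical patients: (age at symptomatic diagnosis, age at death).
# Assumption: screening diagnoses each cancer 4 years earlier but does NOT change age at death.
patients = [(65, 68), (60, 66), (70, 73), (55, 62), (72, 74)]
LEAD_TIME = 4

def five_year_survival(diagnosis_and_death_ages):
    """Proportion alive at least 5 years after diagnosis."""
    survived = [death - diagnosis >= 5 for diagnosis, death in diagnosis_and_death_ages]
    return sum(survived) / len(survived)

without_screening = five_year_survival(patients)
with_screening = five_year_survival([(dx - LEAD_TIME, death) for dx, death in patients])

print(f"Five-year survival without screening: {without_screening:.0%}")
print(f"Five-year survival with screening:    {with_screening:.0%}")
print("Ages at death (unchanged):", [death for _, death in patients])
```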

Length-time bias

Screening preferentially picks up slow-growing, less aggressive disease (aggressive tumours appear and kill between screens).

Selection bias (“healthy screenee” effect)

People who attend screening tend to be healthier than those who don’t. Outcomes look better partly because of who turns up.

Overdiagnosis

Detecting disease that would never have caused harm in the patient’s lifetime. Leads to overtreatment of indolent conditions (PSA screening is the classic example).

AKT‑Style Question

A new screening test for an indolent cancer is shown to “improve five-year survival from 60% to 85%” when introduced. There is no change in overall mortality.

Which bias most likely explains this finding?

Common AKT Traps

  • Confusing lead-time and length-time bias. Lead-time = earlier diagnosis, same death. Length-time = preferentially catching slow-growers.
  • Forgetting that “improved survival” ≠ “lower mortality”. They’re different endpoints.
  • Mixing up screening and case-finding. Screening targets whole asymptomatic populations; case-finding targets individuals already identified as being at higher risk.
  • Assuming all positive screening results need treatment. Overdiagnosis means some shouldn’t.

You should now be able to:

  • Recall the headline Wilson & Jungner criteria
  • Distinguish lead-time, length-time, and selection bias in screening
  • Explain overdiagnosis and why it matters
  • Recognise why “improved survival” alone is a weak endpoint for screening

Chapter 8: Hazard Ratios & Kaplan–Meier Curves

“Survival over time — visualised”

Kaplan–Meier curves show survival over time. They are used when the outcome is not just “did you die?” but “how long until you die?” This makes them perfect for cancer trials, cardiovascular studies, and any condition where timing matters.

What the Curve Shows

  • The y‑axis = proportion surviving
  • The x‑axis = time
  • Each downward “step” = an event (e.g., death)
  • Flatter curve = better survival
  • Steeper curve = worse survival

What is Censoring?

Censoring happens when we stop knowing what happened to a patient. This might be because they:

  • left the study
  • moved away
  • were still alive when the study ended

On the graph, censored patients are shown as small tick marks.
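The Kaplan–Meier (product-limit) estimate handles censoring by counting each patient as “at risk” only while they are being followed. A minimal sketch on a hypothetical cohort of 8 patients; by the usual convention, a patient censored at the same time as a death is still counted as at risk for that death.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: True if the patient had the event (e.g. died), False if censored
    Returns a list of (time, survival) steps, one per event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone whose follow-up ends at time t (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # censored at t still count as at risk
            steps.append((t, survival))
        at_risk -= removed  # everyone finishing at t leaves the risk set
    return steps

# Hypothetical cohort (months of follow-up); False = censored (tick mark)
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [True, True, False, True, False, True, False, False]
for t, s in kaplan_meier(times, events):
    print(f"month {t}: {s:.1%} surviving")
```

Four of the eight patients die, but the final estimate is 40% surviving rather than a naive 50%, because censored patients only contribute while they are followed.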

How to Compare Two Curves

  • The curve that stays higher = better survival
  • The curve that drops faster = worse survival
  • Look for consistent separation between curves
  • Crossing curves = the treatment effect changes over time, so a single comparison is unreliable

How Hazard Ratios Fit In

The hazard ratio compares the rate of events over time between two groups. It is the numerical summary of what the Kaplan–Meier curves show visually.

  • HR = 1 → no difference
  • HR < 1 → treatment improves survival
  • HR > 1 → treatment worsens survival

Key Takeaway

Kaplan–Meier curves show how survival changes over time. Hazard ratios summarise how fast events happen in one group relative to another. Together, they give a complete picture of treatment effect.

Interactive: Kaplan–Meier Survival Curves

Adjust event rates to see how survival curves change over time.

Note: This is a simplified visual model for learning — not a statistical estimator.

AKT‑Style Question

A trial compares Drug A with placebo for improving survival in heart failure. The Kaplan–Meier curves show a clear and consistent separation, with the Drug A curve remaining higher throughout. The hazard ratio (HR) is 0.70 with a 95% CI of 0.55–0.90.

Which statement best describes this result?

  • A. Drug A reduces the risk of death by 30% at the end of the study
  • B. Drug A reduces the rate of death over time compared with placebo
  • C. Drug A improves survival only at the final time point
  • D. The result is not statistically significant

Correct answer: B

A hazard ratio of 0.70 means the rate of death over time is 30% lower in the Drug A group. Kaplan–Meier curves showing consistent separation support this. HR does not describe the risk “at the end” — it reflects the instantaneous risk over time.

Common AKT Traps

  • Thinking the hazard ratio is the same as relative risk. HR reflects the rate of events over time, not the final proportion who died.
  • Ignoring curve separation. Consistent separation = strong evidence. Crossing curves = unreliable comparison.
  • Misinterpreting censoring. Tick marks are not deaths; they mean the patient left the study or follow‑up ended.
  • Assuming a higher curve means “fewer deaths overall”. It means a higher proportion surviving at each time point.
  • Believing HR < 1 always means a big effect. HR < 1 means benefit, but the CI tells you whether it’s precise and significant.
  • Forgetting that KM curves show time‑to‑event. They are not simple bar charts of “alive vs dead”.

You should now be able to:

  • Interpret HR < 1 and HR > 1
  • Explain that HR = event rate, not total events
  • Recognise when HR is unreliable (e.g. crossing curves)
  • Understand the proportional hazards assumption
  • Interpret survival curves correctly
  • Explain censoring and why it matters
  • Identify when curves cross and why that breaks HR assumptions
  • Understand what “higher curve = better survival” means

Chapter 9: Regression & Odds Ratios

“Understanding Odds Ratios & Adjusted Models”

Regression models help us understand how different factors affect an outcome. In the AKT, you don’t need to calculate anything — you just need to interpret the numbers.

1. Odds Ratios (OR)

An odds ratio compares the odds of an outcome between two groups.

  • OR = 1 → no difference
  • OR > 1 → higher odds of the outcome
  • OR < 1 → lower odds of the outcome

Example: OR = 2.0 → “Twice the odds of the outcome.”
Example: OR = 0.5 → “Half the odds of the outcome.”
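Odds are not the same as risk. When the outcome is rare the odds ratio approximates the relative risk, but when it is common the OR sits further from 1 than the RR. A sketch on two hypothetical 2×2 tables, each with a true RR of 2:

```python
def risk_ratio(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a / b) / (c / d)  # equivalently (a * d) / (b * c)

# Hypothetical tables
common = (30, 70, 15, 85)  # outcome common: 30% vs 15%
rare = (2, 998, 1, 999)    # outcome rare: 0.2% vs 0.1%

for label, table in [("Common outcome", common), ("Rare outcome", rare)]:
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```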

2. Adjusted vs Unadjusted Models

An unadjusted model looks at one factor at a time. An adjusted model accounts for other variables that might influence the result.

Example: Adjusting for age, sex, smoking, BMI.

If the OR changes a lot after adjustment → confounding was probably present.

3. What is Confounding?

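A confounder is a factor that:

  • is associated with the exposure
  • independently affects the outcome
  • is not simply a step on the causal pathway between them

Left unadjusted, it distorts the true relationship.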

Example: Coffee drinking appears linked to lung cancer — until you adjust for smoking.
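The coffee example can be reproduced with made-up numbers. The data below are hypothetical, built so that smokers both drink more coffee and get more lung cancer. Instead of a regression model, this sketch uses the Mantel–Haenszel stratified odds ratio, a simpler adjustment method that pools the smoker and non-smoker tables; it illustrates the same idea of adjusting for a confounder.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pool stratum-specific 2x2 tables into one adjusted odds ratio."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data: exposure = coffee, outcome = lung cancer, stratified by smoking
smokers = (30, 270, 10, 90)    # 75% drink coffee; 10% get lung cancer
non_smokers = (1, 99, 3, 297)  # 25% drink coffee; 1% get lung cancer

crude = odds_ratio(*[x + y for x, y in zip(smokers, non_smokers)])
adjusted = mantel_haenszel_or([smokers, non_smokers])
print(f"Crude OR (ignoring smoking): {crude:.2f}")
print(f"Smoking-adjusted OR:         {adjusted:.2f}")
```

Coffee appears to raise the odds of lung cancer 2.5-fold, but within each smoking group the OR is 1: the whole association was confounding by smoking.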

4. Multivariable Regression (often called “multivariate”)

This is simply a model that includes multiple predictors at once. It answers: “Which factors independently predict the outcome?”

Example: In heart disease, independent predictors might include:

  • age
  • smoking
  • blood pressure
  • cholesterol

5. How to Read a Regression Table

A typical table includes:

  • Variable (e.g., age, smoking)
  • Odds Ratio (effect size)
  • 95% CI (precision)
  • P‑value (significance)

✅ If the CI does not cross 1 → significant
✅ If the CI crosses 1 → not significant
✅ OR > 1 → increases odds
✅ OR < 1 → decreases odds

Key Takeaway

Regression tells you which factors truly matter. Odds ratios tell you the direction and strength of the effect. Confidence intervals tell you the certainty of the estimate.

Interactive: Regression Table Explorer

Toggle variables to see how an adjusted model might change the “story”.

Example model interpretation: adjusted for age, BMI and exercise, smoking remains a strong independent predictor of the outcome.

AKT‑Style Question

A study looks at the association between smoking and chronic cough. The unadjusted odds ratio (OR) for smoking is 2.8. After adjusting for age, the OR falls to 1.6.

What is the most likely explanation?

  • A. Age is a confounder of the relationship between smoking and chronic cough
  • B. Smoking is not associated with chronic cough
  • C. The adjusted model is incorrect
  • D. The sample size is too small

Correct answer: A

The OR drops from 2.8 to 1.6 after adjusting for age, meaning part of the apparent effect of smoking was actually due to age. This is classic confounding: age is linked to both smoking and chronic cough, and adjusting for it gives a smaller, less confounded estimate of the effect of smoking.

Common AKT Traps

  • Confusing correlation with causation. Regression shows associations, not proof of cause.
  • Ignoring confounders. If the OR changes a lot after adjustment, a confounder was influencing the result.
  • Misreading the confidence interval. If the CI crosses 1, the result is not statistically significant, even if the OR looks impressive.
  • Over‑interpreting small p‑values. A tiny p‑value doesn’t mean a big effect; it just means the result is unlikely to be due to chance.
  • Forgetting what “adjusted” means. Adjusted models control for other variables. Unadjusted models do not, and are more vulnerable to confounding.
  • Assuming all predictors are equally important. Some variables are strong independent predictors; others barely change the model.

You should now be able to:

  • Interpret odds ratios from logistic regression
  • Explain why odds ≠ risk
  • Recognise confounding and adjustment
  • Understand what a regression coefficient represents

Chapter 10: Risk & Reward (ARR, RRR, NNT, NNH)

“NNT, Hazard Ratios & Marketing”

The Risk & Reward chapter is all about understanding what treatment effects really mean in practice. Drug companies love to quote dramatic-sounding relative risk reductions (RRR), but clinicians need the more honest measures: absolute risk reduction (ARR) and number needed to treat (NNT). These tell you how much benefit a patient actually receives, not just how impressive the statistics look. Hazard ratios add a time dimension, showing how quickly events occur in each group. Together, these measures help you cut through marketing spin and interpret treatment effects with real clinical clarity.

Part 3: Treatment & Analysis

1. The Marketing Trick (RRR vs ARR)

The Salesman (RRR)

50% Reduction!

“Cuts heart attack risk in half!”

Relative Risk Reduction (RRR)

The Scientist (ARR)

1 percentage point reduction.

“Risk went from 2% to 1%.”

Absolute Risk Reduction (ARR)

2. Number Needed to Treat (NNT)

NNT answers: “How many people do I have to treat to save just ONE person?”

Simulation: NNT = 100

1 saved. 99 took pills for nothing.

Number Needed to Harm (NNH)

The mirror image of NNT.

NNH = 100 / ARI (Absolute Risk Increase of an adverse event, expressed as a percentage).

NNH answers: “How many people do I have to treat for one to be harmed?”

As a rule of thumb, a treatment looks worthwhile when NNT < NNH for the harm you’re worried about, though how serious the benefit and the harm are matters too. Weighing this up is the heart of shared decision-making.

Worked Example

A new anticoagulant prevents stroke in 4% of patients over 5 years (ARR = 4%) but causes major bleeding in 1% (ARI = 1%).

  • NNT = 100 / 4 = 25 (treat 25 to prevent one stroke)
  • NNH = 100 / 1 = 100 (treat 100 to cause one major bleed)

Favourable trade-off: NNT < NNH.

3. The Calculation (Fast Mode)

Step 1: Get the ARR as a Percentage

Placebo % – Treatment %

Step 2: Divide 100 by that number

NNT = 100 / ARR
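The fast-mode steps can be sketched as a pair of small functions. The names are illustrative; NNT and NNH are rounded up to a whole patient here, a common convention for NNT. The examples reuse the figures from this chapter: the 2% → 1% marketing example, the 4% → 2% statin example, and the anticoagulant worked example.

```python
import math

def number_needed(abs_risk_diff_pct):
    """NNT from an ARR, or NNH from an ARI, both in percentage points (rounded up)."""
    if abs_risk_diff_pct <= 0:
        raise ValueError("Risk difference must be positive")
    return math.ceil(100 / abs_risk_diff_pct)

def risk_summary(control_pct, treatment_pct):
    """Return (ARR, RRR %, NNT) from event rates given as percentages."""
    arr = control_pct - treatment_pct  # Step 1: placebo % minus treatment %
    rrr = arr / control_pct * 100      # the 'salesman' figure
    return arr, rrr, number_needed(arr)  # Step 2: NNT = 100 / ARR

for control, treated in [(2, 1), (4, 2)]:
    arr, rrr, nnt = risk_summary(control, treated)
    print(f"{control}% -> {treated}%: ARR {arr}%, RRR {rrr:.0f}%, NNT {nnt}")

# Anticoagulant worked example: ARR 4% for stroke, ARI 1% for major bleeding
nnt_stroke, nnh_bleed = number_needed(4), number_needed(1)
print(f"NNT {nnt_stroke} vs NNH {nnh_bleed}: favourable? {nnt_stroke < nnh_bleed}")
```

Both marketing examples have the same 50% RRR, yet one needs 100 patients treated per event prevented and the other 50.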

4. Hazard Ratios (HR)

Difference between HR and Relative Risk?

Both compare risk, but Hazard Ratios include TIME.

  • Relative Risk: “By the end of the study, did you die?” (Yes/No).
  • Hazard Ratio: “How fast are people dying in Group A vs Group B?”

AKT‑Style Question

A statin reduces the risk of cardiovascular events from 4% to 2% over 5 years. The drug company advertises this as a “50% reduction in risk”.

Which statement best describes this claim?

  • A. It is accurate and reflects the absolute risk reduction
  • B. It is misleading because it uses relative risk reduction
  • C. It is incorrect because the NNT is 2
  • D. It is incorrect because the absolute risk reduction is 50%

Correct answer: B

The risk drops from 4% to 2% — an absolute risk reduction (ARR) of 2%. The company quotes the relative risk reduction (RRR), which is 50%. RRR often sounds dramatic, but ARR and NNT give a more realistic picture of benefit.

Common AKT Traps

  • Confusing RRR with ARR. RRR often looks impressive. ARR tells you the real‑world benefit.
  • Forgetting how to calculate NNT. NNT = 100 / ARR (when ARR is in percentage points). ARR of 2% → NNT = 50.
  • Assuming a large RRR means a large clinical effect. A 50% reduction from 0.2% to 0.1% is tiny in absolute terms.
  • Misinterpreting hazard ratios. HR < 1 means a reduced rate of events over time, not necessarily fewer total events at the end.
  • Thinking NNT applies to all populations. NNT depends on baseline risk. Higher baseline risk → lower NNT (bigger benefit).
  • Believing ARR and NNT are fixed properties of a treatment. They vary with population risk, duration, and outcome definition.

You should now be able to:

  • Calculate ARR and RRR
  • Explain why ARR is more clinically honest than RRR
  • Calculate NNT from ARR
  • Interpret treatment effects without being misled by marketing

Chapter 11: Forest Plots & Meta-Analysis

“Meta-Analysis & The Fruit Salad”

1. Anatomy of a Forest Plot

Each line is a study. The big diamond is the answer.

Line of No Effect at 1.0. Left of the line favours treatment; right of the line favours placebo.
  • Study A: large sample size (narrow CI, big square).
  • Study B: small sample size (wide CI, small square). Crosses the line = not significant.
  • Study C: medium sample size. Significant.
  • Total: the pooled result. The diamond is clear of the line, so the overall result is SIGNIFICANT.


2. Heterogeneity (I-squared)

The “Fruit Salad” Analogy

🍎🍎

Low (I² < 25%)

“Comparing Apples with Apples.”
Good to combine.

🍎🍊🍌

High (I² > 75%)

“Fruit Salad.”
Studies are too different. Combining them is questionable, and any pooled result must be interpreted with caution.
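The I² thresholds come from a simple calculation. A minimal Python sketch, assuming study results on the log risk-ratio scale with known standard errors (all numbers are hypothetical): Cochran's Q measures how far the studies spread around the pooled estimate, and I² is the share of that spread beyond what chance alone (the degrees of freedom) would produce.

```python
import math


def i_squared(log_effects, standard_errors):
    """Higgins' I² (%) from study effects on the log scale and their SEs."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Cochran's Q: weighted squared distance of each study from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    if q == 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100  # truncated at 0 when Q < df


# Hypothetical risk ratios, each study with SE 0.10 on the log scale
apples = [math.log(rr) for rr in (0.80, 0.82, 0.78, 0.81)]
fruit_salad = [math.log(rr) for rr in (0.50, 1.20, 0.80, 1.60)]
ses = [0.10] * 4

print(f"Apples: I² = {i_squared(apples, ses):.0f}%")
print(f"Fruit salad: I² = {i_squared(fruit_salad, ses):.0f}%")
```

The "apples" studies scatter less than chance would predict, so I² is 0%; the "fruit salad" studies point in opposite directions and I² lands well above 75%.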


3. Publication Bias (Funnel Plot)

No Bias (Symmetrical)

Triangle is full: studies scatter evenly on both sides of the pooled effect.

Bias! (Asymmetrical)

Corner missing: one bottom corner is empty, suggesting small negative studies went unpublished.

AKT‑Style Question

A meta‑analysis evaluates whether a new antihypertensive reduces stroke risk. The pooled effect estimate (diamond) shows a relative risk of 0.85 with a 95% CI of 0.78–0.92. The diamond lies entirely to the left of the line of no effect.

Which statement best describes this result?

  • A. The treatment has no significant effect on stroke risk
  • B. The treatment significantly reduces stroke risk
  • C. The result is not reliable because the diamond crosses 1
  • D. The individual studies must all show benefit

Correct answer: B

The pooled estimate (diamond) is entirely to the left of the line of no effect (1.0), and the CI does not cross 1. This means the treatment significantly reduces stroke risk. Individual studies may vary — the pooled estimate is what matters.
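Because the diamond is a risk ratio, it translates directly into a relative risk reduction. A minimal Python sketch (the helper name `rr_to_rrr` is hypothetical). It yields an RRR, not an ARR, because the question gives no baseline stroke risk.

```python
def rr_to_rrr(rr, lower, upper):
    """Convert a risk ratio and its 95% CI into a relative risk reduction (%).

    RRR = 1 - RR, so the CI bounds swap: the upper RR bound gives the
    smallest plausible RRR.
    """
    return (round((1 - rr) * 100, 1),
            round((1 - upper) * 100, 1),
            round((1 - lower) * 100, 1))


rrr, rrr_low, rrr_high = rr_to_rrr(0.85, 0.78, 0.92)
print(f"RRR {rrr}% (95% CI {rrr_low}% to {rrr_high}%)")
# RRR 15.0% (95% CI 8.0% to 22.0%)
```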

Common AKT Traps

  • Thinking every individual study must show benefit: Meta‑analysis pools results; some studies may favour placebo, others treatment.
  • Misreading the diamond: The diamond is the pooled effect. If it crosses the line of no effect → not significant.
  • Confusing study weight with effect size: Bigger boxes = more weight, not bigger effect.
  • Ignoring heterogeneity: High I² means studies disagree, so pooled results should be interpreted cautiously.
  • Assuming the line of no effect is always 1: For ratios (RR, OR, HR) → 1. For differences (e.g., mean difference) → 0.
  • Believing a significant pooled effect means the treatment is clinically important: Statistical significance ≠ clinical significance.
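The line-of-no-effect trap reduces to a single rule: a 95% CI is significant only if it sits entirely on one side of the null value. A minimal Python sketch; the function name and the `measure` labels are illustrative assumptions, not a standard library API.

```python
def is_significant(lower, upper, measure):
    """Return True if a 95% CI excludes the line of no effect.

    measure: "ratio" for RR, OR or HR (null = 1),
             "difference" for e.g. mean difference or ARR (null = 0).
    """
    null_values = {"ratio": 1.0, "difference": 0.0}
    if measure not in null_values:
        raise ValueError(f"unknown measure: {measure!r}")
    null = null_values[measure]
    # Significant only if the whole interval lies on one side of the null
    return upper < null or lower > null


print(is_significant(0.78, 0.92, "ratio"))       # True: diamond clear of 1
print(is_significant(-0.5, 1.2, "difference"))   # False: interval crosses 0
```

The same interval can mean different things: 0.3 to 1.2 crosses 1 for a ratio but excludes 0 for a difference.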

You should now be able to:

  • Identify the pooled effect (diamond)
  • Interpret whether the diamond crosses the line of no effect
  • Explain study weights (box sizes)
  • Recognise high heterogeneity and its implications

Final Revision Section

The following resources are designed to consolidate your learning and test your exam readiness.

What to Memorise for the AKT

  • P‑Values: P < 0.05 = statistically significant (by convention). P‑value ≠ probability that the null hypothesis is true.
  • Confidence Intervals: For ratios → CI crossing 1 = not significant. For differences → CI crossing 0 = not significant.
  • SnOut / SpIn: A negative result on a highly Sensitive test rules OUT disease. A positive result on a highly Specific test rules IN disease.
  • Prevalence: PPV and NPV change with prevalence. Sensitivity and specificity do not.
  • Likelihood Ratios: LR+ > 10 = strong rule‑in. LR– < 0.1 = strong rule‑out. LR ≈ 1 = useless.
  • Hazard Ratios: HR < 1 = reduced event rate. HR > 1 = increased event rate.
  • Absolute vs Relative Risk: ARR = real‑world benefit. RRR often exaggerates effect size.
  • NNT: NNT = 100 / ARR (when ARR is in percentage points), rounded up to the next whole number.
  • Forest Plots: Diamond = pooled effect. Diamond crossing line of no effect = not significant.
  • Kaplan–Meier: Higher curve = better survival. Crossing curves = unreliable comparison, because the treatment effect is not constant over time and a single hazard ratio can mislead.

Final AKT Stats Mini‑Quiz (Advanced Level)


This is an advanced quiz. These 10 questions are deliberately more challenging than the rest of the page and reflect the trickiest styles, traps, and interpretations used in the real AKT exam.

Time yourself: 10 questions, 10 minutes. Click an answer to reveal the explanation.

1. A study reports P = 0.049. Which interpretation is most accurate?

2. A 95% CI for a risk ratio is 0.92–1.01. What does this mean?

3. A test has sensitivity 98% and specificity 40%. In GP, prevalence is 1%. What is true?

4. A test has LR+ = 2 and LR– = 0.05. Which is true?

5. In a Kaplan–Meier curve, the treatment curve stays above the control curve, but they cross briefly at 6 months. What does this imply?

6. A hazard ratio of 0.65 means:

7. A drug reduces risk from 12% to 9%. What is the NNT?

8. In a forest plot, the pooled diamond touches but does not cross the line of no effect. What does this mean?

9. A logistic regression shows an odds ratio of 1.8 for smoking and chronic cough. What does this mean?

10. A meta‑analysis shows high heterogeneity (I² = 78%). What is the correct interpretation?

Continue your AKT preparation

External references · RCGP — About the AKT · NICE Clinical Knowledge Summaries · EQUATOR Network — reporting guidelines

Ready for the next step?

You’ve nailed stats. Now nail the AKT.

The AKT Pack covers every RCGP curriculum topic with the same teaching style you’ve just experienced: 2,000+ AKT-style questions (with new ones added weekly), NICE-linked explanations and adaptive mocks. Written by a practising UK GP.


GP Core Revision Series | Medical Statistics