The GP Core Revision Guide: Ultimate Edition

The GP Core Revision Guide to Medical Statistics

“From P-Values to Practice”

Welcome to the foundation. In the AKT, you don’t need to be a mathematician; you need to be a BS detector. This guide combines high-yield explanations with interactive simulations to help you “feel” the numbers.

Features

Interactive Simulators + AI Coach

How to Use This Guide

This guide is designed to make statistics intuitive, visual, and clinically relevant. Work through each chapter in order, using the interactive widgets to build real understanding rather than memorising formulas. The Common AKT Traps boxes highlight the exam’s favourite pitfalls, and the AKT‑style questions help you test your knowledge as you go.

When you reach the end, challenge yourself with the Final AKT Stats Mini‑Quiz and review the What to Memorise for the AKT summary. This guide is best used actively — adjust sliders, interpret graphs, and pause to predict answers before revealing them.

Chapter 1: P‑Values

The P-Value: The “Fluke” Factor

The P in P-value stands for Probability. It measures how likely a result at least as extreme as the one observed would be if there were truly no effect (that is, if chance alone were at work).

Interactive: The Scales of Truth

Adjust P-Value: 0.05 Significant

The Pub Explanation

Imagine playing heads-or-tails. If your mate wins 10 times in a row, the probability of that happening by luck is tiny (P < 0.05).

Use the slider! Notice how when P is high (e.g., 0.40), the scales are balanced: chance remains a plausible explanation. When P is low (e.g., 0.01), “Evidence” slams down: the result is unlikely to be a fluke.
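To put a number on the pub example, here is a minimal Python sketch. It assumes a fair coin and independent tosses; the variable names are illustrative only.

```python
# Probability that a fair coin lands your mate's way 10 times in a row.
p_single = 0.5      # chance of winning one fair toss
n_tosses = 10
p_streak = p_single ** n_tosses   # independent tosses multiply

print(f"P(10 wins in a row by luck) = {p_streak:.4f}")
print("Below the 0.05 threshold?", p_streak < 0.05)
```

The streak probability is 1 in 1,024, far below 0.05, which is why you would start to suspect the coin rather than blame luck.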

The Courtroom Analogy: Type 1 vs Type 2 Errors

Type 1 Error (False Positive)

“Convicting an Innocent Man”

You say there IS a difference (Significance), but actually there is NONE.
Clinical example: telling a healthy patient they have cancer.

Type 2 Error (False Negative)

“Letting a Guilty Man Go Free”

You say there is NO difference, but actually there IS one (you missed it).
Usually caused by small sample size (Study wasn’t powerful enough).

Chapter 2: Confidence Intervals

Confidence Intervals & Significance

P-values give a simple “Yes/No”. CIs also give you the magnitude and precision of the effect. For the exam, the crucial skill is spotting the Line of No Effect.

Interactive: The Bridge of Significance

Drag the slider to move the result. Watch the color change!

SIGNIFICANT

It does NOT touch the line.

Difference (e.g. BP)

Line of No Effect = 0

“Zero for Hero”

Ratio (e.g. Relative Risk)

Line of No Effect = 1

“One is Done”
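The two rules above can be written as one small check. This is a sketch only: the function name, the `measure` labels and the example intervals are assumptions made for illustration.

```python
def ci_is_significant(lower, upper, measure="ratio"):
    """Return True if the confidence interval excludes the line of no effect.

    measure="ratio"      -> line of no effect is 1 (RR, OR, HR)
    measure="difference" -> line of no effect is 0 (e.g. mean BP difference)
    """
    if measure not in ("ratio", "difference"):
        raise ValueError("measure must be 'ratio' or 'difference'")
    no_effect = 1.0 if measure == "ratio" else 0.0
    # Significant only if the whole interval sits on one side of the line.
    return not (lower <= no_effect <= upper)

# Hypothetical intervals for illustration.
print(ci_is_significant(0.55, 0.90, "ratio"))        # interval excludes 1
print(ci_is_significant(-4.0, 1.5, "difference"))    # interval includes 0
```

Note that the same interval can be significant or not depending on the measure: 0.55 to 0.90 excludes 1 but would also exclude 0, so always check which kind of result you are reading first.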

AKT‑Style Question

A clinical trial compares Drug A with placebo for reducing migraine frequency. The study reports P = 0.03.

Which statement best describes this result?

A. There is a 3% chance the drug works
B. There is a 97% chance the drug works
C. The result is unlikely to be due to chance
D. The drug is clinically effective

Correct answer: C

A P‑value of 0.03 means that if the drug had no real effect, the probability of seeing a result this extreme (or more extreme) by chance alone is 3%. It does not tell you the probability the drug works — only how surprising the result is under the assumption of no effect.

Common AKT Traps

Thinking P‑value = probability the hypothesis is true. It isn’t. It’s the probability of data at least this extreme, assuming no effect.
Assuming “statistically significant” = “clinically important”. A tiny difference can be statistically significant in a large study.
Believing P < 0.05 proves the treatment works. It only suggests the result is unlikely to be due to chance, not that the effect is meaningful.
Ignoring sample size. Small studies often fail to reach significance even when a real effect exists (Type 2 error).
Misinterpreting P > 0.05. It does not mean “no effect”; it means “not enough evidence to detect one”.

You should now be able to:

Explain what a P‑value actually represents
Recognise common misinterpretations (e.g., “probability the hypothesis is true”)
Interpret P < 0.05 and P > 0.05 correctly
Understand why statistical significance ≠ clinical significance
Interpret a 95% CI for ratios and differences
Identify when a CI indicates statistical significance
Explain why narrower CIs = more precision
Spot when a CI crosses the line of no effect

Part 1: Foundations

Chapter 3: Study Design

“What type of study is this?”

The AKT loves asking “what type of study is this?” — get the design right and you’re halfway to the right answer about its strengths, weaknesses, and the kind of evidence it produces.

The Four Big Designs

Randomised Controlled Trial (RCT) — the gold standard

Patients are randomly allocated to intervention or control. Randomisation balances confounders, blinding reduces bias. Best for establishing causation. Cons: expensive, time-consuming, sometimes unethical, may exclude real-world patients.

Cohort Study — follow them forwards

Classify a group by exposure (e.g. smokers vs non-smokers) and follow over time to see who develops the outcome. Good for rare exposures; gives incidence and relative risk. Long follow-up needed; vulnerable to loss-to-follow-up.

Case-Control Study — look backwards

Take cases (with disease) and controls (without), then compare past exposures. Quick and cheap; ideal for rare diseases. Vulnerable to recall bias; gives odds ratios, not relative risk.

Cross-Sectional Study — a snapshot

Measure exposure and outcome at the same time. Quick; good for measuring prevalence. Cannot establish cause and effect (chicken-and-egg).

Hierarchy of Evidence

Systematic review / meta-analysis of RCTs
RCT
Cohort study
Case-control study
Cross-sectional study
Case series / case report
Expert opinion

Intention-to-Treat (ITT) vs Per-Protocol

Intention-to-treat (ITT): analyse patients in the group they were originally allocated to, regardless of whether they actually took the treatment. Preserves randomisation. Conservative, real-world estimate. The preferred analysis for RCTs.

Per-protocol: analyse only those who completed the treatment as planned. Can exaggerate treatment effect because non-compliers are removed.

Memory hook: ITT = “what happens when you prescribe it” (pragmatic). Per-protocol = “what happens when you take it” (explanatory).

AKT‑Style Question

A new treatment for resistant hypertension is being investigated. The condition is rare, and follow-up over several years is impractical. Which study design is most appropriate?

Common AKT Traps

Confusing cohort and case-control. Cohort = exposure → outcome (forwards). Case-control = outcome → exposure (backwards).
Calling a cross-sectional study an “incidence study”. It measures prevalence, not incidence.
Assuming RCTs are always best. They aren’t always feasible or ethical.
Mixing up ITT and per-protocol. ITT is the conservative, randomisation-preserving choice.
Forgetting case-control studies give odds ratios, not relative risk.

You should now be able to:

Identify the four major study designs from a stem
Recall the hierarchy of evidence
Match study design to research question
Distinguish ITT from per-protocol analysis

Chapter 4: Bias & Validity

“What’s wrong with this study?”

The AKT exam is full of questions that essentially ask “what’s wrong with this study?”. Knowing the named biases and what they look like is high-yield revision.

The Main Types of Bias

Selection bias — participants aren’t representative of the population. Example: recruiting only hospital inpatients for a community-level study.

Recall bias — memory of past exposures is influenced by having the outcome. Example: mothers of children with congenital abnormalities recall exposures more thoroughly than mothers of healthy children. Classic in case-control studies.

Measurement (information / observer) bias — outcomes or exposures measured differently between groups. Reduced by blinding.

Attrition bias — drop-outs differ systematically from those who stay.

Publication bias — positive studies are more likely to be published than negative ones (see Forest Plots chapter).

Lead-time bias — a screening test detects disease earlier, making survival from diagnosis look longer even if the time of death is unchanged.

Length-time bias — screening preferentially picks up slow-growing, less aggressive disease. Makes screening look more effective than it is.

Validity & Reliability

Internal validity

Does the study correctly answer its own question? (Free from bias and confounding within the study.)

External validity

Do the results apply to other populations beyond the study? (Generalisability.)

Reliability

Does the measurement give the same result on repeat testing? (Consistency.)

Validity

Does the measurement actually measure what it claims to? (Accuracy.)

The dartboard analogy. Reliable = arrows all in the same spot. Valid = arrows all on the bullseye. You can be reliable without being valid (consistently wrong).

Reducing Bias

Randomisation — balances confounders between groups.
Allocation concealment — recruiters don’t know which group the next patient will be assigned to.
Blinding — single (patient), double (patient + investigator), triple (also analyst).
ITT analysis — preserves randomisation.
Standardised outcome assessment — same tools, same training, same blinding.

AKT‑Style Question

A case-control study investigates whether anti-emetic use in pregnancy is associated with cleft palate. Mothers of affected babies recall their exposures more thoroughly than mothers of unaffected babies. Which bias is most likely?

Pre-test → Post-test Probability: The Fagan Nomogram

The Fagan nomogram is a graphical tool that combines three things to give you a clinical answer:

Pre-test probability — your clinical suspicion before the test (often estimated from prevalence or scoring systems like Wells’).
Likelihood ratio — from the test.
Post-test probability — the answer.

Draw a line from your pre-test probability, through the LR, and read off the post-test probability.

Why this matters: LRs are independent of prevalence, but post-test probability is not. The same positive test result means very different things in a 70-year-old smoker with chest pain (high pre-test probability) versus a 25-year-old with the same symptom (low pre-test probability).

Rule-of-thumb shifts (without the nomogram)

Likelihood ratio	Approximate change in probability
LR+ 10	+45 percentage points
LR+ 5	+30 percentage points
LR+ 2	+15 percentage points
LR– 0.5	–15 percentage points
LR– 0.1	–45 percentage points
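The nomogram is a graphical shortcut for a short calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. A minimal Python sketch follows; the function name and the two example pre-test probabilities are illustrative assumptions, not figures from a study.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Update a pre-test probability (strictly between 0 and 1) with an LR."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    pre_odds = pre_test_prob / (1 - pre_test_prob)   # probability -> odds
    post_odds = pre_odds * likelihood_ratio          # apply the test result
    return post_odds / (1 + post_odds)               # odds -> probability

# Same positive test (LR+ = 10) in two hypothetical patients.
for pre in (0.05, 0.50):
    post = post_test_probability(pre, 10)
    print(f"pre-test {pre:.0%} -> post-test {post:.0%}")
```

The rule-of-thumb shifts in the table are approximations; the odds calculation is what the nomogram actually encodes, and it shows why the same LR moves a low pre-test probability much less than a mid-range one.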

Common AKT Traps

Mixing up validity and reliability. Valid = right. Reliable = consistent.
Confusing internal and external validity. Internal = within the study. External = generalisability.
Forgetting lead-time and length-time bias — they mostly apply to screening but do come up.
Assuming blinding fixes everything. It reduces measurement bias but doesn’t address selection bias.
Treating “randomised” and “blinded” as synonyms. Randomisation deals with confounders at the start; blinding deals with bias during follow-up.

You should now be able to:

Name and recognise the main types of bias
Distinguish validity from reliability
Distinguish internal from external validity
Explain how randomisation, blinding, and ITT reduce bias

Chapter 5: Diagnostic Tests — The 2×2 Table & Prevalence

“Diagnostic Tests & The 2×2 Table”

1. The Fishing Net Analogy

We use tests to catch diseases. Think of the test as a Fishing Net.

Sensitivity (The Net)

🐟

👢

High Sensitivity: Catches all the fish (True Positives).
(But may also catch junk = False Positives.) A negative result rules OUT disease (SnOut).

Specificity (The Sorter)

👢 ➡️ 🗑️

👢

High Specificity: Throws back the rubbish. Correctly identifies True Negatives. A positive result rules IN disease (SpIn).

The 2×2 Table: Coeliac Disease

You test 1,000 patients in a clinic where coeliac prevalence is 10%.

	Disease Present (Sick)	Disease Absent (Healthy)	Totals
Test Positive (+)	80 (True Positive)	10 (False Positive)	90
Test Negative (-)	20 (False Negative)	890 (True Negative)	910
Totals	100	900	1000
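The four headline measures all come straight from the cells of this table. A short Python sketch using the coeliac counts above (the variable names are just for illustration):

```python
# Counts from the coeliac 2x2 table.
tp, fp = 80, 10    # test positive: true positives, false positives
fn, tn = 20, 890   # test negative: false negatives, true negatives

sensitivity = tp / (tp + fn)   # of those with disease, how many test positive
specificity = tn / (tn + fp)   # of those without disease, how many test negative
ppv = tp / (tp + fp)           # of positive tests, how many truly have disease
npv = tn / (tn + fn)           # of negative tests, how many are disease-free
prevalence = (tp + fn) / (tp + fp + fn + tn)

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("Prevalence", prevalence)]:
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are read down the columns (disease status), while PPV and NPV are read across the rows (test result).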

The Prevalence Trap

Why good tests fail in Primary Care

Sensitivity and Specificity DO NOT change. They are fixed properties of the test.

However, PPV and NPV CHANGE depending on Prevalence. In GP (Low Prevalence), PPV drops massively. Most positives will be False Positives.

AKT‑Style Question

A new test for coeliac disease has a sensitivity of 90% and a specificity of 95%. In a GP population where the prevalence of coeliac disease is 1%, a patient tests positive.

Which statement is most accurate?

A. The patient almost certainly has coeliac disease
B. The test has ruled the disease in
C. Most positive results will still be false positives
D. The test is not sensitive enough for primary care

Correct answer: C

Even with excellent sensitivity and specificity, a low‑prevalence setting like GP massively reduces the positive predictive value (PPV). When prevalence is only 1%, most positive results will be false positives — the classic Prevalence Trap.
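To see the Prevalence Trap in numbers, the sketch below recomputes PPV and NPV for the same test (sensitivity 90%, specificity 95%) at different prevalences. The 1% figure comes from the question; the 10% and 50% settings are hypothetical comparisons, and the function name is an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence (all 0-1)."""
    tp = sensitivity * prevalence                  # true positives per person tested
    fn = (1 - sensitivity) * prevalence            # false negatives
    tn = specificity * (1 - prevalence)            # true negatives
    fp = (1 - specificity) * (1 - prevalence)      # false positives
    return tp / (tp + fp), tn / (tn + fn)

for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At 1% prevalence the PPV is well under 50%, so most positives are false positives, even though the test itself has not changed.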

Common AKT Traps

Confusing sensitivity with PPV. Sensitivity tells you how well the test detects disease, not how likely a positive result is to be true.
Ignoring prevalence. PPV and NPV change dramatically with prevalence. Sensitivity and specificity do not.
Assuming a “good test” works everywhere. A hospital‑grade test can perform poorly in GP because of low prevalence.
Thinking high sensitivity = rules in. High sensitivity tests rule OUT disease (SnOut). High specificity tests rule IN disease (SpIn).
Forgetting false positives dominate in low prevalence settings. Even a tiny false‑positive rate overwhelms true positives when disease is rare.

You should now be able to:

Calculate sensitivity, specificity, PPV and NPV
Explain how prevalence affects PPV/NPV
Apply SnOut and SpIn correctly
Interpret 2×2 tables confidently

Chapter 6: Likelihood Ratios & The Fagan Nomogram

“LR+ and LR–: How much does a test change your mind?”

Likelihood ratios tell you how much a test result changes your belief that a patient has a disease. They are the “shove” that moves you from pre‑test to post‑test probability.

The Golden Rules

LR+ > 10 → Strong rule‑in
LR– < 0.1 → Strong rule‑out
LR ≈ 1 → No diagnostic value

The Intuition

Think of LR+ and LR– as shoves: LR+ pushes you towards disease. LR– pushes you away from disease.

Clinical Example

D‑dimer: Sensitivity 95%, Specificity 50%
LR+ = 1.9 (weak rule‑in)
LR– = 0.1 (strong rule‑out)
→ Explains why D‑dimer is a rule‑out test.

The Formula (For Understanding Only)

You won’t need to calculate these in the AKT, but knowing the structure helps:

LR+ = Sensitivity / (1 – Specificity)
LR– = (1 – Sensitivity) / Specificity
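The two formulas translate directly into code. A minimal Python sketch, using the D‑dimer figures from the clinical example (sensitivity 95%, specificity 50%); the function name is an assumption.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not 0 < specificity < 1:
        # LR+ is undefined at specificity 1, LR- at specificity 0.
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

lr_pos, lr_neg = likelihood_ratios(0.95, 0.50)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.1f}")
```

The low specificity sits in the denominator of LR+ as a large false-positive rate, which is what drags LR+ down towards 1.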

Interactive: Likelihood Ratio Calculator

Adjust sensitivity and specificity to see how LR+ and LR– change. This shows why some tests are powerful rule‑ins, others rule‑outs, and some… are useless.

Sensitivity: 0.90

Specificity: 0.80

Results

LR+ = 0.90 / (1 – 0.80) = 4.5

LR– = (1 – 0.90) / 0.80 = 0.125

LR+ Shove Meter

How hard a positive test result pushes you toward diagnosing disease.

Useless Weak Rule‑In Strong Rule‑In

AKT‑Style Question

A test for pulmonary embolism has a sensitivity of 95% and a specificity of 50%. What is the most appropriate interpretation of its likelihood ratios?

Which statement is most accurate?

A. LR+ is high, so the test is good for ruling in PE
B. LR– is high, so the test is poor for ruling out PE
C. LR+ is low, so a positive result does not increase the probability of PE much
D. LR– is high, so a negative result increases the probability of PE

Correct answer: C

With sensitivity 95% and specificity 50%, the LR+ is around 1.9 — too low to rule in disease. The LR– is around 0.1, which is strong for ruling out. This is why tests like the D‑dimer are excellent rule‑out tools but poor rule‑ins.

Common AKT Traps

Thinking LR+ depends on prevalence. It doesn’t. LR+ and LR– are properties of the test, not the population.
Confusing LR+ with PPV. LR+ tells you how much a positive result shifts probability; PPV tells you how likely it is that the patient actually has the disease.
Assuming a high sensitivity test rules in disease. High sensitivity → good for ruling out (SnOut). High specificity → good for ruling in (SpIn).
Forgetting the “magic numbers”. LR+ > 10 = strong rule‑in; LR– < 0.1 = strong rule‑out; LR ≈ 1 = useless.
Believing LR+ and LR– must be symmetrical. A test can be excellent at ruling out but terrible at ruling in (e.g., D‑dimer).

You should now be able to:

Interpret LR+ and LR– values
Recognise strong rule‑in and rule‑out thresholds
Explain why LR ≈ 1 is useless
Understand how LRs update probability

Chapter 7: Screening Principles

“Wilson & Jungner — and the biases that flatter screening”

Screening is testing asymptomatic people for a disease. It’s a deceptively complex topic because the benefits are easy to overstate and the harms are easy to overlook. The AKT question bank loves screening questions, particularly around the Wilson & Jungner criteria and screening-specific biases.

Wilson & Jungner Criteria (1968)

The classic ten principles for whether a screening programme is worthwhile. Memorise the headlines across four buckets:

About the disease

Important health problem
Recognisable latent / early symptomatic stage
Natural history adequately understood

About the test

Suitable test (sensitive, specific)
Acceptable to the population

About the treatment

Accepted treatment exists
Facilities for diagnosis & treatment available
Agreed policy on whom to treat

About the programme

Cost economically balanced
Continuing process, not one-off

Memory hook: disease, test, treatment, programme — four buckets, two or three points each.

Screening-Specific Biases

Lead-time bias

Screening detects disease earlier. Survival from diagnosis looks longer, even though the time of death is unchanged.

Length-time bias

Screening preferentially picks up slow-growing, less aggressive disease (aggressive tumours appear and kill between screens).

Selection bias (“healthy screenee” effect)

People who attend screening tend to be healthier than those who don’t. Outcomes look better partly because of who turns up.

Overdiagnosis

Detecting disease that would never have caused harm in the patient’s lifetime. Leads to overtreatment of indolent conditions (PSA screening is the classic example).

AKT‑Style Question

A new screening test for an indolent cancer is shown to “improve five-year survival from 60% to 85%” when introduced. There is no change in overall mortality.

Which bias most likely explains this finding?

Common AKT Traps

Confusing lead-time and length-time bias. Lead-time = earlier diagnosis, same death. Length-time = preferentially catching slow-growers.
Forgetting that “improved survival” ≠ “lower mortality”. They’re different endpoints.
Mixing up screening and case-finding. Screening targets asymptomatic populations; case-finding targets individuals already identified as being at higher risk (e.g., because of risk factors).
Assuming all positive screening results need treatment. Overdiagnosis means some shouldn’t.

You should now be able to:

Recall the headline Wilson & Jungner criteria
Distinguish lead-time, length-time, and selection bias in screening
Explain overdiagnosis and why it matters
Recognise why “improved survival” alone is a weak endpoint for screening

Chapter 8: Hazard Ratios & Kaplan–Meier Curves

“Survival over time — visualised”

Kaplan–Meier curves show survival over time. They are used when the outcome is not just “did you die?” but “how long until you die?” This makes them perfect for cancer trials, cardiovascular studies, and any condition where timing matters.

What the Curve Shows

The y‑axis = proportion surviving
The x‑axis = time
Each downward “step” = an event (e.g., death)
Flatter curve = better survival
Steeper curve = worse survival

What is Censoring?

Censoring happens when we stop knowing what happened to a patient. This might be because they:

left the study
moved away
were still alive when the study ended

On the graph, censored patients are shown as small tick marks.
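To see how censoring feeds into the curve, here is a minimal sketch of the standard Kaplan–Meier (product-limit) calculation in Python. The five patients are hypothetical, and the function name is an assumption; real analyses would use a statistics package.

```python
def kaplan_meier(times, events):
    """Return [(time, proportion surviving)] at each time an event occurred.

    times:  follow-up time for each patient
    events: True if the event (e.g. death) happened, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # A downward step happens only when there is an event.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without causing a step.
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months; False = censored (tick mark on the graph).
curve = kaplan_meier([1, 2, 2, 3, 4], [True, False, True, True, False])
for t, s in curve:
    print(f"month {t}: {s:.0%} surviving")
```

Censored patients never count as deaths, but they do shrink the number at risk, so later events produce bigger steps.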

How to Compare Two Curves

The curve that stays higher = better survival
The curve that drops faster = worse survival
Look for consistent separation between curves
Crossing curves = unreliable comparison

How Hazard Ratios Fit In

The hazard ratio compares the rate of events over time between two groups. It is the numerical summary of what the Kaplan–Meier curves show visually.

HR = 1 → no difference
HR < 1 → treatment improves survival
HR > 1 → treatment worsens survival

Key Takeaway

Kaplan–Meier curves show how survival changes over time. Hazard ratios tell you how fast events are happening. Together, they give a complete picture of treatment effect.

Interactive: Kaplan–Meier Survival Curves

Adjust event rates to see how survival curves change over time.

Event Rate (Group A): 0.10

Event Rate (Group B): 0.20

Note: This is a simplified visual model for learning — not a statistical estimator.

AKT‑Style Question

A trial compares Drug A with placebo for improving survival in heart failure. The Kaplan–Meier curves show a clear and consistent separation, with the Drug A curve remaining higher throughout. The hazard ratio (HR) is 0.70 with a 95% CI of 0.55–0.90.

Which statement best describes this result?

A. Drug A reduces the risk of death by 30% at the end of the study
B. Drug A reduces the rate of death over time compared with placebo
C. Drug A improves survival only at the final time point
D. The result is not statistically significant

Correct answer: B

A hazard ratio of 0.70 means the rate of death over time is 30% lower in the Drug A group. Kaplan–Meier curves showing consistent separation support this. HR does not describe the risk “at the end” — it reflects the instantaneous risk over time.

Common AKT Traps

Thinking the hazard ratio is the same as relative risk. HR reflects the rate of events over time, not the final proportion who died.
Ignoring curve separation. Consistent separation = strong evidence. Crossing curves = unreliable comparison.
Misinterpreting censoring. Tick marks are not deaths; they mean the patient left the study or follow‑up ended.
Assuming a higher curve means “fewer deaths overall”. It means a higher proportion surviving at each time point.
Believing HR < 1 always means a big effect. HR < 1 means benefit, but the CI tells you if it’s precise and significant.
Forgetting that KM curves show time‑to‑event. They are not simple bar charts of “alive vs dead”.

You should now be able to:

Interpret HR < 1 and HR > 1
Explain that HR = event rate, not total events
Recognise when HR is unreliable (e.g., crossing curves)
Understand proportional hazards assumptions
Interpret survival curves correctly
Explain censoring and why it matters
Identify when curves cross and why that breaks HR assumptions
Understand what “higher curve = better survival” means

Chapter 9: Regression & Odds Ratios

“Understanding Odds Ratios & Adjusted Models”

Regression models help us understand how different factors affect an outcome. In the AKT, you don’t need to calculate anything — you just need to interpret the numbers.

1. Odds Ratios (OR)

An odds ratio compares the odds of an outcome between two groups.

OR = 1 → no difference
OR > 1 → higher odds of the outcome
OR < 1 → lower odds of the outcome

Example: OR = 2.0 → “Twice the odds of the outcome.”
Example: OR = 0.5 → “Half the odds of the outcome.”
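Odds and risk are not the same thing, and a small calculation makes the gap visible. The Python sketch below uses hypothetical counts (20 of 100 exposed and 10 of 100 unexposed people with the outcome); the function names are assumptions.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = exposed with/without outcome, c/d = unexposed."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """RR from the same table: risk in exposed divided by risk in unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts for illustration.
a, b, c, d = 20, 80, 10, 90
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"RR = {relative_risk(a, b, c, d):.2f}")
```

Here the OR comes out larger than the RR. The two agree closely only when the outcome is rare, which is why an OR should not be read as “times the risk” for common outcomes.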

2. Adjusted vs Unadjusted Models

An unadjusted model looks at one factor at a time. An adjusted model accounts for other variables that might influence the result.

Example: Adjusting for age, sex, smoking, BMI.

If the OR changes a lot after adjustment → confounding was present.

3. What is Confounding?

A confounder is a factor that:

is associated with the exposure
independently affects the outcome
is not on the causal pathway between them, yet distorts the apparent relationship

Example: Coffee drinking appears linked to lung cancer — until you adjust for smoking.
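The coffee example can be made concrete with invented numbers. In the Python sketch below, all counts are hypothetical and chosen so that smokers drink more coffee and get more lung cancer, while coffee itself does nothing. Looking within each smoking stratum is a simple stand-in for what an adjusted model does.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR: a/b = exposed with/without outcome, c/d = unexposed."""
    return (a * d) / (b * c)

# Hypothetical counts: (cancer, no cancer) for coffee drinkers, then non-drinkers.
smokers = (16, 64, 4, 16)       # 80 of 100 smokers drink coffee
non_smokers = (1, 19, 4, 76)    # 20 of 100 non-smokers drink coffee

# Crude table: ignore smoking and add the two strata together.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

print(f"Crude OR (unadjusted): {odds_ratio(*crude):.2f}")
print(f"OR among smokers:      {odds_ratio(*smokers):.2f}")
print(f"OR among non-smokers:  {odds_ratio(*non_smokers):.2f}")
```

The crude OR is well above 1, yet within each stratum the OR is exactly 1: the apparent coffee effect was entirely due to smoking.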

4. Multivariate Regression

This is simply a model that includes multiple predictors at once. It answers: “Which factors independently predict the outcome?”

Example: In heart disease, independent predictors might include:

age
smoking
blood pressure
cholesterol

5. How to Read a Regression Table

A typical table includes:

Variable (e.g., age, smoking)
Odds Ratio (effect size)
95% CI (precision)
P‑value (significance)

✅ If the CI does not cross 1 → significant
✅ If the CI crosses 1 → not significant
✅ OR > 1 → increases odds
✅ OR < 1 → decreases odds

Key Takeaway

Regression tells you which factors truly matter. Odds ratios tell you the direction and strength of the effect. Confidence intervals tell you the certainty of the estimate.

Interactive: Regression Table Explorer

Toggle variables to see how an adjusted model might change the “story”.

Include Age Include Smoking Include BMI Include Exercise

Variable	OR	95% CI	P‑value	In Model?

Model interpretation

Adjusted for age, smoking, BMI and exercise, smoking remains a strong independent predictor of the outcome.

AKT‑Style Question

A study looks at the association between smoking and chronic cough. The unadjusted odds ratio (OR) for smoking is 2.8. After adjusting for age, the OR falls to 1.6.

What is the most likely explanation?

A. Age is a confounder of the relationship between smoking and chronic cough
B. Smoking is not associated with chronic cough
C. The adjusted model is incorrect
D. The sample size is too small

Correct answer: A

The OR drops from 2.8 to 1.6 after adjusting for age, meaning part of the apparent effect of smoking was actually due to age. This is classic confounding: age is linked to both smoking and chronic cough, and adjusting for it reveals the true, smaller effect of smoking.

Common AKT Traps

Confusing correlation with causation. Regression shows associations, not proof of cause.
Ignoring confounders. If the OR changes a lot after adjustment, a confounder was influencing the result.
Misreading the confidence interval. If the CI crosses 1, the result is not statistically significant, even if the OR looks impressive.
Over‑interpreting small p‑values. A tiny p‑value doesn’t mean a big effect; it just means the result is unlikely to be due to chance.
Forgetting what “adjusted” means. Adjusted models control for other variables. Unadjusted models do not, and are more vulnerable to confounding.
Assuming all predictors are equally important. Some variables are strong independent predictors; others barely change the model.

You should now be able to:

Interpret odds ratios from logistic regression
Explain why odds ≠ risk
Recognise confounding and adjustment
Understand what a regression coefficient represents

Chapter 10: Risk & Reward (ARR, RRR, NNT, NNH)

“NNT, Hazard Ratios & Marketing”

The Risk & Reward chapter is all about understanding what treatment effects really mean in practice. Drug companies love to quote dramatic-sounding relative risk reductions (RRR), but clinicians need the more honest measures: absolute risk reduction (ARR) and number needed to treat (NNT). These tell you how much benefit a patient actually receives, not just how impressive the statistics look. Hazard ratios add a time dimension, showing how quickly events occur in each group. Together, these measures help you cut through marketing spin and interpret treatment effects with real clinical clarity.

1. The Marketing Trick (RRR vs ARR)

The Salesman (RRR)

50% Reduction!

“Cuts heart attack risk in half!”

Relative Risk Reduction (RRR)

The Scientist (ARR)

1% Reduction.

“Risk went from 2% to 1%.”

Absolute Risk Reduction (ARR)

2. Number Needed to Treat (NNT)

NNT answers: “How many people do I have to treat to save just ONE person?”

Simulation: NNT = 100

Number Needed to Harm (NNH)

The mirror image of NNT.

NNH = 100 / ARI (Absolute Risk Increase of an adverse event, expressed as a percentage).

NNH answers: “How many people do I have to treat for one to be harmed?”

As a rough guide, a treatment looks worthwhile when NNT < NNH for the harm you’re worried about, provided the benefit and the harm are of comparable severity. Weighing the two is the heart of shared decision-making.

Worked Example

A new anticoagulant prevents stroke in 4% of patients over 5 years (ARR = 4%) but causes major bleeding in 1% (ARI = 1%).

NNT = 100 / 4 = 25 (treat 25 to prevent one stroke)
NNH = 100 / 1 = 100 (treat 100 to cause one major bleed)

Favourable trade-off: NNT < NNH.
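The worked example can be checked in a few lines of Python. This is a sketch; the helper name `number_needed` is an assumption, and it simply applies the 100 / (absolute change in percentage points) rule to both benefit and harm.

```python
def number_needed(absolute_change_percent):
    """NNT or NNH from an absolute risk change in percentage points."""
    if absolute_change_percent <= 0:
        raise ValueError("absolute risk change must be positive")
    return 100 / absolute_change_percent

nnt = number_needed(4)   # ARR = 4%: strokes prevented
nnh = number_needed(1)   # ARI = 1%: major bleeds caused
print(f"NNT = {nnt:.0f}, NNH = {nnh:.0f}")
print(f"Strokes prevented per major bleed caused: {nnh / nnt:.0f}")
```

Dividing NNH by NNT gives a quick sense of the trade-off: in this example, roughly four strokes are prevented for every major bleed caused.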

3. The Calculation (Fast Mode)

Step 1: Get the ARR as a Percentage

Placebo % – Treatment %

Step 2: Divide 100 by that number

NNT = 100 / ARR
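The two steps, plus the RRR that the salesman quotes, fit in one small function. A minimal Python sketch, using the “2% to 1%” figures from the marketing example; the function name is an assumption.

```python
def treatment_effect(control_risk_percent, treatment_risk_percent):
    """Return (ARR in percentage points, RRR as a percentage, NNT)."""
    arr = control_risk_percent - treatment_risk_percent   # Step 1
    if arr <= 0:
        raise ValueError("no risk reduction, so NNT is not defined")
    rrr = 100 * arr / control_risk_percent   # reduction relative to baseline
    nnt = 100 / arr                          # Step 2
    return arr, rrr, nnt

arr, rrr, nnt = treatment_effect(2.0, 1.0)
print(f"ARR = {arr:.0f}%, RRR = {rrr:.0f}%, NNT = {nnt:.0f}")
```

The same 50% RRR would appear for a drop from 20% to 10%, but the ARR would be 10% and the NNT only 10, which is why baseline risk matters so much.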

4. Hazard Ratios (HR)

Difference between HR and Relative Risk?

Both compare risk, but Hazard Ratios include TIME.

Relative Risk: “By the end of the study, did you die?” (Yes/No).
Hazard Ratio: “How fast are people dying in Group A vs Group B?”
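A toy model shows why the two measures can differ. The Python sketch below assumes each group has a constant event rate (a simplifying assumption, as are the rates themselves), so the hazard ratio is fixed, and then computes the relative risk at different study lengths.

```python
import math

def cumulative_risk(hazard_rate, years):
    """Proportion who have had the event by `years`, assuming a constant hazard."""
    return 1 - math.exp(-hazard_rate * years)

# Hypothetical event rates per person-year.
rate_treatment, rate_control = 0.10, 0.20
hazard_ratio = rate_treatment / rate_control

for years in (1, 5, 20):
    rr = cumulative_risk(rate_treatment, years) / cumulative_risk(rate_control, years)
    print(f"year {years:>2}: HR = {hazard_ratio:.2f}, RR = {rr:.2f}")
```

The HR stays the same throughout, while the RR drifts towards 1 as follow-up lengthens, because eventually most people in both groups have had the event.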

AKT‑Style Question

A statin reduces the risk of cardiovascular events from 4% to 2% over 5 years. The drug company advertises this as a “50% reduction in risk”.

Which statement best describes this claim?

A. It is accurate and reflects the absolute risk reduction
B. It is misleading because it uses relative risk reduction
C. It is incorrect because the NNT is 2
D. It is incorrect because the absolute risk reduction is 50%

Correct answer: B

The risk drops from 4% to 2% — an absolute risk reduction (ARR) of 2%. The company quotes the relative risk reduction (RRR), which is 50%. RRR often sounds dramatic, but ARR and NNT give a more realistic picture of benefit.

Common AKT Traps

Confusing RRR with ARR. RRR often looks impressive. ARR tells you the real‑world benefit.
Forgetting how to calculate NNT. NNT = 100 / ARR (when ARR is in percentage points). ARR of 2% → NNT = 50.
Assuming a large RRR means a large clinical effect. A 50% reduction from 0.2% to 0.1% is tiny in absolute terms.
Misinterpreting hazard ratios. HR < 1 means reduced rate of events over time, not necessarily fewer total events at the end.
Thinking NNT applies to all populations. NNT depends on baseline risk. Higher baseline risk → lower NNT (bigger benefit).
Believing ARR and NNT are fixed properties of a treatment. They vary with population risk, duration, and outcome definition.

You should now be able to:

Calculate ARR and RRR
Explain why ARR is more clinically honest than RRR
Calculate NNT from ARR
Interpret treatment effects without being misled by marketing

Chapter 11: Forest Plots & Meta-Analysis

“Meta-Analysis & The Fruit Salad”

1. Anatomy of a Forest Plot

Each line is a study. The big diamond is the answer.

Hover over elements!

2. Heterogeneity (I-squared)

The “Fruit Salad” Analogy

🍎🍎

Low (I² < 25%)

“Comparing Apples with Apples.”
Good to combine.

🍎🍊🍌

High (I² > 75%)

“Fruit Salad.”
Studies are too different. A pooled result is unreliable and should be interpreted with caution.
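For the curious, here is a rough sketch of where the diamond and I² come from. It assumes one common approach (fixed-effect, inverse-variance pooling on the log scale, with I² derived from Cochran’s Q); the guide does not say which method any particular meta-analysis uses, and every study value below is hypothetical.

```python
import math

def pool_studies(relative_risks, standard_errors):
    """Fixed-effect inverse-variance pooling on the log scale.

    standard_errors are the standard errors of log(RR) for each study.
    Returns (pooled RR, I-squared as a percentage).
    """
    logs = [math.log(rr) for rr in relative_risks]
    weights = [1 / se ** 2 for se in standard_errors]   # precise studies weigh more
    pooled_log = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    # Cochran's Q: weighted spread of the studies around the pooled estimate.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, logs))
    df = len(logs) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled_log), i_squared

# Hypothetical studies: one set that agrees, one "fruit salad".
apples = pool_studies([0.80, 0.85, 0.90], [0.10, 0.10, 0.10])
salad = pool_studies([0.50, 1.00, 1.60], [0.10, 0.10, 0.10])
print(f"Apples: pooled RR {apples[0]:.2f}, I² {apples[1]:.0f}%")
print(f"Salad:  pooled RR {salad[0]:.2f}, I² {salad[1]:.0f}%")
```

When the studies agree, the spread is no bigger than chance would predict and I² is 0%. When they point in different directions, I² climbs well past 75%, and the single pooled number hides real disagreement.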

3. Publication Bias (Funnel Plot)

No Bias (Symmetrical)

Triangle is full.

Bias! (Asymmetrical)

Corner missing. Negative studies hidden.

AKT‑Style Question

A meta‑analysis evaluates whether a new antihypertensive reduces stroke risk. The pooled effect estimate (diamond) shows a relative risk of 0.85 with a 95% CI of 0.78–0.92. The diamond lies entirely to the left of the line of no effect.

Which statement best describes this result?

A. The treatment has no significant effect on stroke risk
B. The treatment significantly reduces stroke risk
C. The result is not reliable because the diamond crosses 1
D. The individual studies must all show benefit

Correct answer: B

The pooled estimate (diamond) is entirely to the left of the line of no effect (1.0), and the CI does not cross 1. This means the treatment significantly reduces stroke risk. Individual studies may vary — the pooled estimate is what matters.

Common AKT Traps

Thinking every individual study must show benefit: Meta‑analysis pools results, so some studies may favour placebo and others treatment.
Misreading the diamond: The diamond is the pooled effect. If it crosses the line of no effect → not significant.
Confusing study weight with effect size: Bigger boxes = more weight, not bigger effect.
Ignoring heterogeneity: High I² means studies disagree, so pooled results should be interpreted cautiously.
Assuming the line of no effect is always 1: For ratios (RR, OR, HR) → 1. For differences (e.g., mean difference) → 0.
Believing a significant pooled effect means the treatment is clinically important: Statistical significance ≠ clinical significance.
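
To see why box size and effect size are different things, here is a small sketch of one common pooling method: fixed-effect, inverse-variance weighting on the log scale. The guide does not say which weighting scheme a given forest plot uses, so treat the method as an assumption. The three studies and the function name are invented for illustration.

```python
import math

# Hypothetical studies: (label, relative risk, standard error of log RR)
studies = [
    ("Small trial", 0.50, 0.40),
    ("Medium trial", 0.80, 0.20),
    ("Large trial", 0.90, 0.05),
]


def pool_fixed_effect(studies):
    """Return (pooled RR, 95% CI, weight share per study) using inverse-variance weights."""
    weights = [1 / se ** 2 for _, _, se in studies]   # precise studies weigh more
    total = sum(weights)
    pooled_log_rr = sum(
        w * math.log(rr) for w, (_, rr, _) in zip(weights, studies)
    ) / total
    pooled_se = math.sqrt(1 / total)
    ci = (
        math.exp(pooled_log_rr - 1.96 * pooled_se),
        math.exp(pooled_log_rr + 1.96 * pooled_se),
    )
    shares = [w / total for w in weights]
    return math.exp(pooled_log_rr), ci, shares


pooled_rr, ci, shares = pool_fixed_effect(studies)
for (label, rr, _), share in zip(studies, shares):
    print(f"{label}: RR {rr:.2f}, weight {share:.1%}")
print(f"Pooled RR {pooled_rr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

The small trial has the most dramatic result but the smallest box. The pooled estimate sits close to the large trial because that is where most of the weight lies.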

You should now be able to:

Identify the pooled effect (diamond)
Interpret whether the diamond crosses the line of no effect
Explain study weights (box sizes)
Recognise high heterogeneity and its implications

Final Revision Section

The following resources are designed to consolidate your learning and test your exam readiness.

What to Memorise for the AKT

P‑Values: P < 0.05 = statistically significant. P‑value ≠ probability the hypothesis is true.
Confidence Intervals: For ratios → CI crossing 1 = not significant. For differences → CI crossing 0 = not significant.
SnOut / SpIn: High Sensitivity → rules OUT disease. High Specificity → rules IN disease.
Prevalence: PPV and NPV change with prevalence. Sensitivity and specificity do not.
Likelihood Ratios: LR+ > 10 = strong rule‑in. LR– < 0.1 = strong rule‑out. LR ≈ 1 = useless.
Hazard Ratios: HR < 1 = reduced event rate. HR > 1 = increased event rate.
Absolute vs Relative Risk: ARR = real‑world benefit. RRR often exaggerates effect size.
NNT: NNT = 100 / ARR (when ARR is in percentage points).
Forest Plots: Diamond = pooled effect. Diamond crossing line of no effect = not significant.
Kaplan–Meier: Higher curve = better survival. Crossing curves = the treatment effect changes over time, so a single overall comparison is unreliable.
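
The prevalence and likelihood ratio rules in this list can be checked with a few lines of Python. This is a rough sketch: the function name and the example test (90% sensitive, 90% specific) are hypothetical, and every input is a proportion between 0 and 1.

```python
def diagnostic_summary(sensitivity, specificity, prevalence):
    """Return PPV, NPV, LR+ and LR- for a test; all inputs are proportions (0 to 1)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Same hypothetical test in a low-prevalence and a high-prevalence setting
for prevalence in (0.01, 0.30):
    r = diagnostic_summary(0.90, 0.90, prevalence)
    print(
        f"Prevalence {prevalence:.0%}: PPV {r['ppv']:.1%}, NPV {r['npv']:.1%}, "
        f"LR+ {r['lr_pos']:.1f}, LR- {r['lr_neg']:.2f}"
    )
```

The likelihood ratios are identical in both settings because they depend only on sensitivity and specificity. The PPV is what collapses when prevalence is low.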

Final AKT Stats Mini‑Quiz (Advanced Level)

This is an advanced quiz. These 10 questions are deliberately more challenging than the rest of the page and reflect the trickiest styles, traps, and interpretations used in the real AKT exam.

Time yourself: 10 questions, 10 minutes.

1. A study reports P = 0.049. Which interpretation is most accurate?

2. A 95% CI for a risk ratio is 0.92–1.01. What does this mean?

3. A test has sensitivity 98% and specificity 40%. In GP, prevalence is 1%. What is true?

4. A test has LR+ = 2 and LR– = 0.05. Which is true?

5. In a Kaplan–Meier curve, the treatment curve stays above the control curve, but they cross briefly at 6 months. What does this imply?

6. A hazard ratio of 0.65 means:

7. A drug reduces risk from 12% to 9%. What is the NNT?

8. In a forest plot, the pooled diamond touches but does not cross the line of no effect. What does this mean?

9. A logistic regression shows an odds ratio of 1.8 for smoking and chronic cough. What does this mean?

10. A meta‑analysis shows high heterogeneity (I² = 78%). What is the correct interpretation?
