Our exam system is hopeless and teachers are wrong


Great controversy has followed the decision by OFQUAL, the exam regulator for England, to downgrade some 40% of teacher-assessed A-level results. It follows a similar row in Scotland, where the figure for Highers is nearer a quarter. Both rows arise from pandemic-driven exam cancellations, which left results to be based on expert guesswork, not exams.

Two major points follow. First, nothing about this should surprise us – teachers are often wrong about their pupils and biased to believe they are better than they are. Second, the exam system is hopelessly ill-adapted to this crisis and could have done better.

On the first point the Royal Statistical Society has aggregated concerns with the approach used by OFQUAL to moderate results. In brief, the regulator expected results to rise by 2% on the previous year, noted that teacher assessments yielded a 12% rise, and moderated downwards to reflect this bias.

They have done so generically, meaning that schools with high year-to-year variability in outcomes may have been penalised (or rewarded) unfairly compared with more stable performers. There is some evidence this hits middle-tier, mostly state comprehensive, schools harder than private schools: at the extremes of performance there is less variability and more testing, so teachers there make better predictions.
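The mechanics of that complaint can be sketched with a toy simulation (purely illustrative: the numbers and the pin-to-history rule are invented simplifications, not OFQUAL's actual algorithm). If moderation pulls each school towards its own historical average, the expected mis-grading grows with how much that school's cohorts genuinely vary from year to year:

```python
import random
import statistics

random.seed(1)

# Toy illustration, not OFQUAL's actual algorithm: if moderation pins a
# school's results to its historical average, a school whose cohorts swing
# from year to year is mis-graded far more than a stable performer.
def moderation_error(historical_sd, historical_mean=60.0, n_years=10_000):
    """Average absolute gap between a cohort's true mean score and the
    historical mean it is moderated towards."""
    gaps = []
    for _ in range(n_years):
        true_mean = random.gauss(historical_mean, historical_sd)
        gaps.append(abs(true_mean - historical_mean))
    return statistics.mean(gaps)

stable = moderation_error(historical_sd=2)    # e.g. a large, stable school
volatile = moderation_error(historical_sd=8)  # e.g. a small comprehensive
print(f"stable school:   ~{stable:.1f} points of moderation error per year")
print(f"volatile school: ~{volatile:.1f} points of moderation error per year")
```

Under these invented assumptions the volatile school accumulates roughly four times the moderation error of the stable one, through no fault of its own.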

Underlying the broader concern with the methodology is the evidence that teachers are simply bad at guessing results. The study OFQUAL used to claim otherwise does not support its contention. It does suggest a 0.76 to 0.85 correlation between predictions and outcomes, but only partially, and only for six mostly data-driven subjects like maths, where answers are generally right or wrong. Earlier studies cited in the same paper, covering a broader spectrum of subjects, report a range of 0.45 to 0.82; at the lower end, predictions explain barely a fifth of the variation in outcomes.
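What those correlation figures mean for individual pupils can be made concrete with a small simulation (a statistical illustration only, not OFQUAL's model; the five equal-width grade bands are an invented stand-in for real grade boundaries):

```python
import random

random.seed(0)

# Illustrative only: what a prediction-outcome correlation of 0.45 vs 0.85
# implies for individuals, using five equal-width "grade bands" as an
# invented stand-in for A-level grades.
def wrong_band_share(r, n=100_000):
    """Share of pupils whose predicted grade band differs from their true
    band, when predictions correlate with outcomes at level r."""
    def band(x):
        return min(4, max(0, int(x + 2.5)))  # five bands over roughly [-2.5, 2.5]
    wrong = 0
    for _ in range(n):
        true = random.gauss(0, 1)
        pred = r * true + (1 - r * r) ** 0.5 * random.gauss(0, 1)
        if band(pred) != band(true):
            wrong += 1
    return wrong / n

for r in (0.45, 0.85):
    print(f"r = {r}: ~{wrong_band_share(r):.0%} of pupils in the wrong band")
```

Even at the upper end of the reported range a substantial minority of pupils land in the wrong band; at r = 0.45 a large share of individual predictions miss the true band entirely.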

That grades are being overstated (as opposed to normally distributed both above and below the mean) probably has psychological rather than nefarious causes. Teachers spend a year or more building a relationship with their charges and have every reason to want the best for them – not just from familiarity and decency but in that their own assessments and reward are based on demonstrating added value.

In that regard, there are potential penalties attached to under-estimating performance and potential gains from over-estimating it. Further, there are no penalties attached to being caught in overstatement. No teacher or head teacher is going to be fired for a decision by OFQUAL to reduce their grades. Some might be for being too honest, or for uncorrected pessimism.

Nor will they feel under pressure. Governing bodies and teaching unions will find fault with the regulator, not with their employees and members. The public will sympathise with the teachers, with whom many have a relationship, not with the faceless bureaucrats apparently responsible for the desolation of hope in their children, no longer leaping for joy on the front page of every newspaper.

OFQUAL, though, may be wrong, but in being too generous rather than too unkind. If exams are objective assessments of ability, which is contestable but a reasonable assumption, why is the regulator expecting a 2% rise in performance during a pandemic?

Several weeks of schooling at the most crucial time for exams have been lost to lockdowns. It has been replaced by variable home-schooling provision, something we assume is second best to the real thing. Most other major indicators of performance in the economy are falling. There are concerns about the psychological impact of isolation on impressionable young minds. We are social creatures and much of our learning is driven by peer-to-peer interaction, not just teacher to pupil. From that perspective a 2% rise sounds both heroic and deeply implausible.

But whether there’s a fall or rise may not matter very much. The fundamental purpose of academic exams is signalling, and generally within a single cohort at a single moment in time, not between years. They are not objective qualifications so much as evidence you know stuff and can think. Grades measure how much relative to those taking the papers at the same time, not objective competence. By the time you reach your 20s no employer cares much about how you did in your GCSEs, only what you can do. They seek evidence for this from a range of sources, not just your exam results, which end up forming 1-2 lines on a CV filled with achievement.

A qualification, by contrast, such as a certificate in electrical safety, denotes a degree of mastery: that you are competent to rewire a fuse board, which is vital if you want to be an electrician and prove it to a potential customer. An A-level in maths is principally useful if you wish to pursue further study in areas that require higher-level numeracy and theory. It is not so much a qualification in itself as a useful, and sometimes essential, underpinning for a wide range of further study, qualifications and professions.

It is then principally a signal and a very important one for those going on to university, which is generally a moment-in-time decision. Some of those making that decision now will have their access challenged by OFQUAL’s downgrade putting them below their offer grades.

On this, pupils should be partially reassured that offer thresholds are likely to fall as a result of two other pandemic trends, deferrals and a fall in applications from foreign students. Universities need bodies to stay open and many are in dire trouble. The downgrade then may have no impact whatsoever on the process for which A-levels are the primary signal.

But some will still feel cheated, and it is not clear this can be finessed by better statistical modelling. If it were ever possible to accurately assess individual competence by design there would be no exams. A prediction model would be massively cheaper than a nationwide programme of formulation, testing, supervision, marking, appeals and re-sits. And such ideas are common in dystopian science fiction – the population being sorted at birth (or at some age of maturity) into leadership roles for the brainiacs, craft shops, warfare and mines for the rest.

Happily you can escape the circumstances of your birth in a free society and exams help signal your intent to try. All we can currently predict is which traits and circumstances are more likely to influence your path in life, not what that path will be.

OFQUAL’s approach then is a (very) second best solution to the cancellation of the exams that would otherwise have been set.

But was this the only solution? Certainly it was not possible to run the same exam system as in normal times. The schools were not fully open and any teacher or pupil with a long commute could not attend. Certainly the lockdown happened very late and was largely unexpected.

But what of online testing?

Most of us are bombarded daily with requests to complete web surveys. Many of the tools to do this are free or extremely cheap. It is very easy to set up surveys to collect both quantitative and qualitative data, with instant results for the former, and no more work to collate the latter than for paper surveys (perhaps less, given the removal of the handwriting barrier and semantic-analysis tools that mirror the key-phrase scoring done by conventional markers).

It is not a perfect substitute. There are still digital access issues for many pupils, particularly from poorer backgrounds, whether of cost, connection or reliability. There are supervision issues: how do you know it is actually the student sitting the test if you cannot see them? How do you address claims of lost signal and demands for more time? But for each of these there are solutions that reduce the risk of cheating, and no system can be cheat-free.

Further, perfection was not required. What was required was a better approach to assessing this year's unfortunate school leavers than expert guessing, and that may still be possible with a bit of imagination. For example, pupils contesting their grades could be invited to sit a shorter sample test while connected to a cheap webcam confirming their presence. That would be a partial examination approach, offsetting guesswork that had been downgraded by modelling. It would not be as good a test as a full exam, but it would be a stronger signal of competence.

Done widely, it would also provide evidence on whether there has been any difference in performance between the lucky 2% of children whose parents' NHS work meant they continued attending school throughout the lockdown and the rest, who worked from home. It would also draw out differences between schools that offered comprehensive home schooling and those that left parents to fend for themselves.

By not doing anything objective, an enormous and rare opportunity to test the value of school and home schooling has also been wasted – which is a double disappointment if, as is currently being debated, a second wave means this happens again.

There exists then the possibility to offer something between the no-hope appeals process being offered now and full re-sits, if not nationwide, then at the very least in trial form such that this unsatisfactory state of affairs is never repeated.


Andy Mayer is Chief Operating Officer at the IEA. Andy worked as Head of Public Affairs, UK & Ireland at BASF plc for seven years. He has over 20 years of experience in strategic communications and the operations that support them in the business and think tank worlds.

1 thought on “Our exam system is hopeless and teachers are wrong”

  1. Posted 13/08/2020 at 09:19

    “Grades measure how much relative to those taking the papers at the same time, not objective competence…… An A-level in maths is principally useful…. [but] is not so much a qualification in itself.”
    “Certainly it was not possible to run the same exam system as in normal times. The schools were not fully open and any teacher or pupil with a long commute could not attend.”
    “Certainly the lockdown happened very late and was largely unexpected.“
    Three false assessments:
    1. If A-levels aren't a qualification but merely a test of your rank within a year group, why are A-levels (or rather GCEs) regarded as a QUALIFICATION at level 4 of the European Qualifications Framework? One might think the author makes that statement because the EQF is just another of these EU schemes. Except that the Regulated Qualifications Framework (RQF), regulated by Ofqual, ALSO puts A-levels at level 4 of its QUALIFICATION framework and explicitly declares A-levels/GCEs a QUALIFICATION!
    2. Of course it would have been possible to run the same exam system as in normal times, as Germany proved: empty schools during lockdown allow small exam groups, where social distancing and frequent airing of rooms are much easier to organise than they would have been with schools running. Germany did it and there was NO rise in infections.
    3. If lockdown was still "unexpected" in mid-March, as the author claims, education authorities in the UK are either unbelievably naive or showed a typical form of Johnson-induced English entitlement, along the lines of "Because we are English/British we can take Corona on the chin, and no lockdown as in all these weak EU countries is necessary". Either would show a lack of critical thinking that astounds me as a scientist.
    To summarise: the author pretends to give an unbiased insight into the merits of Ofqual's handling of the pandemic-hit A-level standardisation process while, subconsciously or deliberately (I don't know which), excluding proven facts.
