In this article, I raise some queries about two very interesting articles from the excellent election news site May2015. The articles suggest that David Cameron faces an extremely difficult task to remain as prime minister, based on the mathematics of how national polls will translate into parliamentary seats. I wonder whether the statistics support the level of confidence with which this conclusion is presented.
Polls, polls, polls
As someone who is interested in statistics and, to a lesser extent, politics, I have been taking an unhealthy interest in the movements (or non-movements) of the polls in the run-up to the UK general election on 07 May, and the dizzying array of commentary that has surrounded them. One of the sites that I have particularly enjoyed is May2015, which provides a fascinating range of articles and information alongside its regularly updated seat-by-seat predictions of the election outcome.
My motivation in writing this piece is not partisan, but rather to consider the analysis and conclusions of two particular articles (here and here) from May2015. These articles present an extremely favourable picture of Ed Miliband’s chances of getting to Number Ten, but I have some reservations about the way that May2015 have come to their conclusions and the degree of confidence with which they have presented them.
In the first article, seeking to determine how Labour and their de facto allies could reach the key total of 323 seats that will be required to form a government, Harry Lambert (author of both articles) meticulously goes through the 650 parliamentary constituencies, dividing them into a number of relevant categories. He first counts up safe seats for Labour, then for the other parties that would block a Conservative government (most notably the SNP) and then includes some apparent Labour gains from the Lib Dems, arriving at a total of 288 seats that can be fairly safely assumed to return candidates favourable to a Labour government. This leaves 35 seats (downgraded to 34 in the second article, based on new information about an additional Labour gain from the Lib Dems) that Labour needs to take from the Conservatives across 80 English Lab-Con marginal constituencies.
To work out how likely Labour are to gain the necessary seats, Lambert plots these 80 marginal constituencies with Labour’s predicted margin of victory according to May2015 along the horizontal axis and Labour’s probability of victory according to electionforecast.co.uk (an academic forecasting site, associated with the well-known US election forecasters FiveThirtyEight) on the vertical axis. You can see the most recent version of this figure here.
The articles identify 24 of these seats, the dark red points in the top right-hand corner of the plot, as being “particularly likely to vote Labour” (since both May2015 and Election Forecast agree that Labour are strong in these constituencies) and thus allocate them to the Labour bloc. This leaves 10 seats that Ed Miliband needs to win from among the pale red points (leaning Labour in both forecasts), green points (where there is a discrepancy between the May2015 and Election Forecast predictions) and pale blue points* (leaning Conservative) if he is to become prime minister. Supposing that Labour wins in the nine pale red constituencies where the party are slight favourites, Labour only requires one additional seat to reach 323. The latest May2015 article therefore concludes that Cameron needs to win “every single marginal seat – every green (and every light blue) seat – to stay in power.” Their conclusion is that, since the chance of every one of these close races going Cameron’s way appears to be remote, the election is “Ed Miliband’s race to lose.”
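The seat arithmetic in the two articles can be retraced in a few lines. This is only a bookkeeping sketch: the figures are the ones quoted above, while the function and variable names are my own and do not come from May2015.

```python
# Bookkeeping sketch of the seat counts quoted from the May2015 articles.
# All names here are illustrative; only the numbers come from the articles.

def seats_still_needed(target, safe_bloc, extra_gains, dark_red, pale_red):
    """Return the shortfall at each stage of the May2015 argument."""
    from_marginals = target - safe_bloc - extra_gains  # needed from Lab-Con marginals
    after_dark_red = from_marginals - dark_red         # after allocating dark red seats
    after_pale_red = after_dark_red - pale_red         # after allocating pale red seats
    return from_marginals, after_dark_red, after_pale_red

# First article: 288 fairly safe seats, so 35 needed from the marginals.
first = seats_still_needed(target=323, safe_bloc=288, extra_gains=0,
                           dark_red=24, pale_red=9)

# Second article: one additional Labour gain from the Lib Dems, so 34 needed.
second = seats_still_needed(target=323, safe_bloc=288, extra_gains=1,
                            dark_red=24, pale_red=9)

print("First article: ", first)
print("Second article:", second)
```

Under the second article's figures, the three stages come out at 34, 10 and 1, which is where the claim that Labour needs only one further seat comes from.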
Some minor issues
While I do not take issue with the overall conclusion of the two articles, that Ed Miliband seems to be favourite to become prime minister owing to a greater flexibility in the ways that he could get to the 323 seat threshold that is required, I do have certain specific reservations about the degree of confidence with which this conclusion is stated, given the data that is being presented. There are three points that I think should be raised, two minor and one major.
First the minor points. I wonder whether there is not a slight issue with using the Election Forecast predictions and the May2015 predictions to reinforce each other, because they do not give wholly independent perspectives on the election. May2015 are basing their predictions on the current polls, while Election Forecast are doing some additional mathematical modelling of future poll movement, but both forecasts are based on the same underlying data (the polls). Any unexpected issues with that data will affect both forecasts to some degree. I don’t think that this would really affect the May2015 conclusions, but I think that it is at least worth a mention.
Next, if we look carefully at the 24 dark red constituencies in the plot, which May2015 are allocating to Labour, we see that the Election Forecast probabilities of a Labour victory in these seats can drop as low as two thirds. 66.7% is far from a certainty, so automatically giving these seats to the Labour bloc seems a fairly bold move (and that is without considering the pale red seats). One counterargument to this could be that there are just as many blue points in the bottom left corner of the graph, so in probabilistic terms, ‘unexpected’ Labour losses towards the top right should be at least balanced out by ‘unexpected’ gains down there. However, this argument is itself problematic, for a reason that leads me on to my biggest cause for hesitation over the May2015 argument.
A more significant concern
The quote from the May2015 articles that best sums up my main reservation, and that makes me feel that their conclusions about the strength of Ed Miliband’s position may be a little overstated, is this:
“In other words, David Cameron would need to win every single marginal seat – every green (and every light blue) seat – to stay in power. If the odds in these green seats are 50:50, that would be like winning 10-15 coin tosses in a row.”
This analogy seems to be chosen to make the chance of a Conservative government seem particularly remote. However, I find the comparison of the results of the marginal constituencies with the tosses of a coin to be rather troubling for the following reason: while the individual tosses of a coin are statistically independent, the results in the marginal constituencies are not.
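To see how small the coin-toss figure is, here is a minimal sketch of the calculation, assuming (as the analogy does) that the tosses are independent and fair. The function name is my own.

```python
from fractions import Fraction

def prob_all_heads(n_tosses, p_heads=Fraction(1, 2)):
    """Probability of n_tosses heads in a row, assuming independent tosses."""
    return p_heads ** n_tosses

for n in (10, 15):
    p = prob_all_heads(n)
    print(f"{n} tosses in a row: 1 in {p.denominator}")
```

Ten fair tosses in a row come up heads 1 time in 1,024, and fifteen do so 1 time in 32,768. The multiplication of probabilities that produces these numbers is only valid if the tosses are independent, which is exactly the assumption in question.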
As far as I understand it, there are three main areas of uncertainty over how polling, which is mostly at the national level, will translate into parliamentary seats. Firstly, there is uncertainty over how accurately the polls represent the current levels of support for the various parties; secondly there is uncertainty over how those polls will change between now and the election; and thirdly, there is uncertainty over how the support for different parties nationally will translate into support in the individual constituencies. Of these three factors, only the third can be claimed to be in any way independent from constituency to constituency. Any discrepancies between poll numbers and the true levels of support for the different parties and any changes in these levels of support over the next ten days should be expected to have an impact in all constituencies, to a greater or lesser degree.
Even the third source of uncertainty, the way in which national polls translate to local battlegrounds, cannot be said to be truly independent across constituencies, since we might expect there to be a degree of geographic correlation, with nearby seats exhibiting some similarities. Given that the Labour-Conservative marginals seem to be clustered in certain areas of the country (see, for example, this article from the BBC or this one from the Independent), this factor could be quite important.
The upshot of this is that if the Conservatives take one of the green or pale red marginal seats from the May2015 graph – perhaps due to the polls systematically underestimating their support or to an unexpectedly significant swing towards them in the next week – then the likelihood of them taking many of the others would surely be significantly higher owing to the statistical dependence of the results. This makes the marginal seat contest less like a series of separate coin tosses, with Cameron needing an unlikely sequence of heads, and more like a single coin toss, albeit with a coin that may be slightly biased in Labour’s favour. Statistical dependence between constituencies suggests that the situation may not be quite as favourable for Labour as the May2015 articles seem to suggest.
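A toy simulation can illustrate the size of this effect. It is a sketch under my own assumptions, not a reconstruction of either site's model: ten toss-up seats, a Labour lead in each seat equal to a shared national error plus a seat-specific local error, both normally distributed, with standard deviations that I have simply made up.

```python
import random

def prob_conservative_sweep(n_seats, shared_sd, local_sd, trials=20000, seed=1):
    """Estimate the chance that the Conservatives win every one of n_seats
    toss-up seats. Labour's lead in a seat is modelled (hypothetically) as a
    shared national error plus an independent local error."""
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(trials):
        # One draw of the national error, applied to every seat in this trial.
        shared = rng.gauss(0.0, shared_sd) if shared_sd > 0 else 0.0
        # Conservatives sweep if Labour's lead is negative in all seats.
        if all(shared + rng.gauss(0.0, local_sd) < 0 for _ in range(n_seats)):
            sweeps += 1
    return sweeps / trials

# Fully independent seats: no shared error at all (the coin-toss picture).
independent = prob_conservative_sweep(10, shared_sd=0.0, local_sd=3.0)

# Strongly dependent seats: most of the uncertainty is shared nationally.
correlated = prob_conservative_sweep(10, shared_sd=3.0, local_sd=1.0)

print(f"Independent seats: {independent:.4f}")
print(f"Correlated seats:  {correlated:.4f}")
```

In both scenarios each individual seat is a 50:50 contest, yet the chance of a clean sweep is far higher when most of the uncertainty is shared. The particular standard deviations are arbitrary, so the output should be read as an illustration of the direction of the effect rather than as a forecast.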
If seats genuinely were independent contests, with no significant correlation between their results at all, this would make a mockery of predictions based on Uniform National Swing (in which poll movements at the national level are supposed to be replicated in individual constituencies), a forecasting method that, while perhaps not expected to be particularly relevant for this election, has nonetheless been used reasonably successfully to predict the results of many previous ones. Indeed, although May2015’s forecast is not specifically based on UNS, it is nevertheless necessarily derived from a translation of national polling data to local constituencies (with a limited amount of constituency-level polling thrown in), so it must also involve a significant model dependency between the predicted results in different constituencies.
Furthermore, it seems to me that there is reason to suspect that the first of the three sources of uncertainty that I mention above (discrepancies between polling and the true levels of party support) may be of some relevance in the coming election, because there do appear to be some systematic differences between the different polling companies. For example, Populus have not put the Conservatives ahead in any poll that they have performed this year (as stated in the second May2015 article), while YouGov polls have shown the lead flipping between the parties (within the margin of error). Most forecasts are based on some form of average across all the polling companies, but if some pollsters are actually getting systematically closer to the true state of affairs than others, then some bias must be being passed downstream to the forecasters.
A perspective from Election Forecast
Following the first of the two May2015 articles, the team behind Election Forecast published a response, which sounded a very similar note of caution to the one that I have expressed above. They too suggested that May2015 were underplaying the uncertainty around the current forecasts, stating that:
“We are not convinced that there is currently sufficient evidence to conclude that Ed Miliband is significantly more likely than David Cameron to be able to put together a group of 323 like-minded legislators on May 8th.”
Based on thousands of computer simulations of their own model, they concluded that the bloc containing Labour and its allies had a probability of 54% of holding 323 seats or more, with the Conservative bloc having a 46% chance; still an advantage for Ed Miliband, but not a particularly convincing one. From what I can tell, the reasons for this discrepancy are broadly the same as those that I set out above.
Caveats, disclaimers etc.
I should state that this is not, in any way, an attempt to undermine the work of May2015 or its author, Harry Lambert. I found the two articles in question to be very interesting and the broad sweep of their conclusions (that Labour have the mathematical advantage in terms of government formation) seems very sound and well-argued. I am not a political scientist (or a statistician); they are the ones who have gathered the data, who have created the forecasting model and who clearly have a large amount of experience and expertise in political science and polling, none of which I could or would claim for myself. I hope that they continue to produce such thought-provoking and well-informed pieces for the rest of us to enjoy. My only purpose here was to comment on the fairly narrow (but important) point of statistical dependence between constituency results, which I feel was underplayed in their articles.
Whatever the result, the election promises to be quite exciting. I shall be staying up on the night of 07 May, drinking plenty of coffee, and paying close attention to the 80 marginal constituencies that May2015 have identified.
UPDATE: Since this piece was written, May2015 have edited their second article with a response to my comments:
But, as Election Forecast have noted, and this engaging post by Thomas Oleron Evans stresses, these seats are unlikely to be wholly independent of one another, as this coin toss analogy assumes. They are likely to be right or wrong en masse, because of a structural polling error. In Oleron Evans’ words:
“The marginal seat contest [is] less like a series of separate coin tosses, with Cameron needing an unlikely sequence of heads, and more like a single coin toss, albeit with a coin that may be slightly biased in Labour’s favour.”
We wouldn’t accept the odds are only slightly biased in Labour’s favour. This seems to assume the error will be in the Tories’ favour, when Ashcroft’s polls could be systematically wrong in Labour’s favour.
So our previous post was slightly more strongly worded than that, but the odds of a Tory victory are undoubtedly greater than 10 consecutive coin tosses – that happens 1 in every 1,000 times. (It was a flippant analogy.)
They are right to pick up on my choice of language here. I should probably have said “somewhat biased”, rather than “slightly biased”.
* There is also one pale purple point, Thurrock, which is a three-way marginal with UKIP.
Thomas Oléron Evans, 2015