eightyfivepoints.blogspot.com | Fri, 24 Aug 2018 03:24:00 +0000
While Liverpool were the biggest spenders of the transfer window -- they were the only EPL club with a net spend in excess of £100m -- another striking feature of their summer business was the speed and decisiveness with which they got it done: all four of their major signings were completed nearly three weeks before the end of the transfer window.
Prompt and decisive is a good description of Liverpool's recruitment strategy in the second half of the John Henry era. Since 2015, Liverpool have purchased 15 players in the summer transfer window (excluding free transfers, or those for which the fee was undisclosed) for a grand total of around £400m. Only one of those players was bought in August and that was the somewhat opportunistic purchase of Alex Oxlade-Chamberlain. The majority of their new recruits in recent years have had at least a couple of weeks to bed in before the season started.
How do Liverpool compare with the rest of the EPL in terms of the speed of their transfer operations in the summer transfer window? The plot below shows the proportion of transfers completed as a function of the number of days remaining in the transfer window. The black line shows the EPL average since the 2012/13 season; only transfers for which a fee was disclosed have been included, 464 in total. 50% of EPL transfers were completed with thirty-six days remaining in the transfer window, while 80% were completed with only five days left of the window. The grey region indicates the season-to-season variation over the last seven years. Despite closing early this year, there is little difference in the rate at which transfers were completed compared with previous seasons.
Figure 1: The proportion of transfers completed as the summer transfer window counts down towards deadline day. The black line shows the EPL average from the 2012 summer window to the end of the 2018 window; the grey region indicates the season-to-season variation. The red and blue dashed lines indicate the club-specific averages for Liverpool and Tottenham, the fastest and slowest clubs to complete their business.
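For readers who want to reproduce this kind of curve, here is a minimal sketch of the calculation. It assumes each transfer is recorded as the number of days left in the window when it was completed; the function names and the toy data are hypothetical and are not taken from the dataset behind the figure.

```python
import math

def proportion_completed(days_remaining, days_left):
    """Share of transfers already done when `days_left` days remain."""
    done = sum(1 for d in days_remaining if d >= days_left)
    return done / len(days_remaining)

def days_left_at_share(days_remaining, share):
    """Days remaining at the moment the given share of transfers is complete."""
    ordered = sorted(days_remaining, reverse=True)  # earliest deals first
    k = max(1, math.ceil(share * len(ordered)))     # number of deals needed
    return ordered[k - 1]

# Toy data (made up): days left in the window when each of ten deals was done.
toy_window = [60, 45, 36, 30, 20, 10, 5, 3, 1, 0]
half_done_at = days_left_at_share(toy_window, 0.5)
```

Evaluating `proportion_completed` over every day of the window gives the cumulative curve; doing it per club or per season gives the dashed lines and the grey band.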
The dashed lines show the club-specific averages for Liverpool (red) and Tottenham (blue). Over the last seven seasons, Liverpool have concluded their summer business more rapidly than any other of the nine EPL clubs to have maintained a continuous presence in the EPL. 80% of their transfers were completed with about 3 weeks remaining until the deadline, two weeks ahead of the EPL average. More generally, they run their recruitment operation about two to three weeks in advance of the EPL average.
Spurs are at the opposite end of the spectrum, completing a large proportion of their incoming transfer activity towards the end of the window. Nearly half of their summer transfers since 2012 -- 14 players -- were purchased with just a week remaining in the transfer window. Last year all their transfer activity occurred in the last two weeks of the window, with three players brought in within a couple of days of the deadline.
A natural conclusion might be that Liverpool are simply more organised, drawing up a list of realistic transfer targets early and moving for them decisively. They have also shown patience in their transfer dealings, demonstrated by their willingness to wait until the next transfer window to complete business (see Keita and van Dijk) rather than hunt around for inferior alternatives should their first choice not be immediately available. Instead of purchasing another player late in the window, Liverpool have chosen not to purchase at all until their primary target is available.
Spurs, however, are operating on a tighter budget in order to cope with the financial pressures brought by the construction of a new stadium. Daniel Levy may strategically be choosing to operate nearer the deadline to make opportunistic purchases and avoid price hikes driven by drawn-out negotiations and auctions. Their lack of transfer activity this summer might simply reflect a perceived lack of value in the market.
Nevertheless, it's interesting that Liverpool and Tottenham, two clubs with good prospects of finishing in the top four this season, operate so differently. There is no evidence that Liverpool's faster start in the transfer market translates into an early edge when the league season starts, and Tottenham have gone on to finish above Liverpool in five of the last six seasons. Still, while Spurs' strategy of leaving it late to recruit appears to have worked in recent times, how sustainable is it in the long term? If they hold genuine aspirations of winning a first title since 1961, they may need to follow Liverpool's lead and become more pro-active in the market.
eightyfivepoints.blogspot.com | Wed, 13 Jun 2018 03:43:00 +0000
In the 65th minute of England's friendly against Costa Rica last week, Nick Pope stepped onto the pitch to make his first international appearance for England. At the same moment, Trent Alexander-Arnold's senior international debut came to an end. Both players were selected for England's World Cup squad without having ever played for the senior men's side; only 9 other players at this World Cup have achieved that particular feat.
It is no secret that Southgate has selected a youthful side to take to Russia, but how do they compare in terms of age and experience to the other 31 tournament participants?
Figure 1 plots the average age of each World Cup squad against the average number of senior international appearances the players had made when the final squads were announced. As you might expect, there is a clear relationship between age and number of appearances. The average World Cup player is nearly 28 years old and has made about 34 appearances for his country.
Figure 1: Average squad age versus average number of senior international appearances (on the date the final squads were announced) for each of the 32 World Cup squads.
Costa Rica have the oldest squad in the tournament, with an average age of 29.5 and ten players in their thirties. The most experienced squads, however, are from Panama and Mexico: their players have made an average of 61 international appearances.
England have the second-youngest (joint with France) and the least experienced squad in the tournament. The average age of the England squad is 26 -- only Nigeria's is younger -- and they have played an average of 19 senior international matches. Only one player (Gary Cahill) has played over 40 matches for England; 17 are playing in their first World Cup and 8 are playing in their first major international tournament. France have also selected a relatively young squad, although it does contain five players with at least 50 international caps.
Does lack of experience actually matter when it comes to performance? Looking back over the last six World Cup tournaments, only two of the twelve finalists had an average age below 26.5: the 2014 champions, Germany (26.2), and the 2010 champions, Spain (26.4). However, both squads also contained experience, averaging nearly 40 international appearances per player -- twice the current England squad.
Of course, players need to be selected to gain experience, and besides, after the poor performances in 2014 and Euro 2016, it makes sense to look to the future. So perhaps this tournament is part of the learning curve for many of the current crop of players.
Foreign Playing Experience
Another interesting feature of England's squad is their near total lack of foreign playing experience at club level. The chart below shows the number of players in each squad that have played for clubs in more than one country during their career. The major European leagues -- that is, those that tend to attract (and retain) the best players -- are highlighted in red.
Figure 2: The number of players in each World Cup squad to have played for clubs in more than one country.
With so many of the world's best players on show, you'd expect a lot of them to have played in Europe at some point in their career. For that reason, it's not surprising that 549 of the 736 players -- nearly 75% -- have played for clubs in at least two different countries. This also applies to the majority of the French and Portuguese squads, and nearly half the German and Spanish squads.
Only three squads have fewer than 10 players to have played in more than one country: Russia (5), Saudi Arabia (4) and England (just 1). Every player in the England squad is currently based at an English club, and only one player, Eric Dier, has ever played for a non-British club (Dier grew up in Portugal and played for Sporting Lisbon for several years before joining Spurs). This is a fairly standard feature of England's squads over the last few tournaments; in 2014 no players had experience playing abroad, and in 2010 there was only one: Peter Crouch (early in his career he played in the Swedish fourth tier).
Does this matter? Fewer than half the players in the EPL are either British or Irish, so English players clearly have plenty of experience playing with foreigners. However, other aspects of playing abroad may provide important benefits, such as the experience of playing in different atmospheres, climates and cultures. These might be somewhat intangible factors, but perhaps they are part of the explanation of the phenomenon known as home advantage. Maybe it is no surprise that the last time England made any notable progress in a major international tournament was at Euro 96, in England.
We can speculate about whether foreign playing experience affects performance in major international tournaments, but I doubt we can quantify it. However, it's encouraging that a number of young English players are now looking to gain experience abroad, and I hope others do likewise.
Thanks to Simon Gleave for pointing out an error in an earlier version.
 I'm not counting playing in Scotland as 'foreign experience'.
eightyfivepoints.blogspot.com | Fri, 11 May 2018 03:19:00 +0000
On Thursday 14th June, at 6pm local time, the 21st World Cup will kick off in Moscow; the first to be held in Russia, and only the second in Asia. Thirty-one days later, two of the thirty-two participants will contest the final.
There are sure to be shocks and surprises along the way, but what should we expect from the tournament before the first ball has been kicked? Who are the favourites? How likely are we to have a European, South American, Asian or African winner (or perhaps a first-time winner)? Which is the toughest group, or the easiest? How far is each side likely to get?
Based on a simple model for predicting match results, I've simulated the World Cup 10,000 times to evaluate the likelihood of various outcomes and investigate some of the quirks of the tournament. If you're interested in the technical details, scroll down to the Appendix. As the tournament plays out, I'll be rerunning and updating my predictions: follow me on Twitter (
Figure 1. Probability of winning the 2018 World Cup for each of the top 16 favourites, based on 10,000 simulations of the tournament.
Despite the fact that, between them, Germany and Brazil have won nearly half of all the previous tournaments, the model predicts there to be a 67% chance that a different country will win, and a 36% chance that we'll have a first-time winner. The chasing pack include Spain (9% chance), Argentina (8%) and France (7%). After that come Belgium, Portugal and Colombia, each with a 5% chance and all three chasing their first World Cup victory. England have a 3% chance of ending a half century of hurt. Hosts Russia, meanwhile, win only 1% of the simulated tournaments.
Aggregating by continent, South or Central American countries win 39% of the simulated tournaments, European countries 55% and Asian/Australia 4%. The model predicts only a 2% chance of an African country winning the World Cup for the first time.
The Group Stage
Let's rewind back to the start of the tournament and take a more detailed look at the group stage. Figure 2 shows the probabilities of each country finishing in a given position in their group, from first to fourth, according to the simulations. The numbers down the right-hand side of each group table (labelled Qual) indicate the probability of the team qualifying for the round of 16, the first knock-out round. Only the top two teams in each group qualify for the round of 16, with the winner playing the runner-up of the neighbouring group (e.g., the winner of Group A will play the runner-up in Group B).
Figure 2. Probability of each country finishing in each position in their group table, from 1st to 4th. The Qual column indicates the probability of the country finishing in either first or second position (and therefore qualifying for the knock-out stage). The winners of each group will play the runner-up of the neighbouring group in the round of 16 (e.g., the winner of Group A plays the runner-up in Group B). Figures may not sum due to rounding.
The most evenly matched group is Group H (Colombia, Poland, Japan and Senegal). Colombia and Poland are the favourites to progress, but both Japan and Senegal have about a 33% chance of qualifying for the knock-out stage. In 63% of tournament simulations, at least one of Colombia and Poland fail to qualify for the round of 16.
Group G (England, Belgium, Panama and Tunisia) appears to be the least competitive group, with the two European countries clear favourites to finish in the top 2 places. In only 42% of simulations do one of them fail to qualify for the knock-out stage.
If both Germany and Brazil finish in the same position (1st or 2nd) in their respective groups, they avoid each other until the final. However, in 30% of my simulations they do meet in what would be a momentous round of 16 tie.
How far will each country get?
Figure 3 provides a more comprehensive picture of how far the model thinks each country is likely to progress in the tournament. It shows the probability of each country reaching a given round, from the round of 16, the quarter-finals, semi-finals and final, to winning the tournament outright. For example, Germany make it to the round of 16 in 84% of simulated tournaments, the quarter-finals in 56%, the semi-finals in 40%, the finals in 26% and win the tournament outright in 16%.
Figure 3. The probability (%) of reaching a given stage of the World Cup, from the round of 16 to winning the final, for each participating country. Figures may not sum due to rounding.
The model predicts that the hosts, Russia, have a 62% chance of making it to the round of 16, but their chances of progression thereafter are fairly low. They make it to the quarter-finals in 23% of simulations, and to the semis in less than 10%.
England benefit not only from being drawn in the easiest group, but also from a relatively generous potential round of 16 tie against a team from Group H (most likely either Colombia or Poland). This gives them a 42% chance of reaching the quarter-finals, at which point they typically run into Brazil or Germany and get knocked out. Belgium are actually expected to progress further into the tournament than England, despite England having a higher probability of winning Group G: once we get into the knock-out stages, England's poor historical performance in penalty shootouts makes Belgium the stronger of the two.
Other Interesting Questions
Whom does the draw favour?
The draw does seem to favour some countries. Portugal, Spain, England and Belgium all benefit from avoiding one of the top-5 favourites in their potential round of 16 opponents. However, the advantage gained is small: the probability of each of these teams reaching the final is increased by about 1% relative to completely randomised tournament draws (using the same seedings).
Who might be the surprise package?
The World Cup winner typically comes from one of the pre-tournament favourites; however, at least one of the semi-final teams tends to be a surprise. In 2014, unfancied Holland took Argentina to penalties; Uruguay made it to the semis in 2010, and both South Korea and Turkey made it that far in 2002. In 83% of my simulations at least one country from outside the top-10 tournament favourites shown in Figure 1 makes it to the semi-finals.
If I had to name one team to make surprising progress in the tournament, I would go with Colombia. Thanks to a relatively generous draw, the model estimates that they have a 20% chance of making it to their first World Cup semi-final.
What is England's most likely route to the final?
England make it to the World Cup final in 8% of simulations. Their most frequently occurring paths to the final typically involve defeating Poland or Colombia in the round of 16, Brazil or Germany in the quarter-finals and one of France, Portugal or Spain in the semi-finals. So the route to the quarter-finals looks reasonable, but England are likely to then play one of the best two teams in the world.
Appendix: Simulation Methodology
The core of the model is the method for simulating match outcomes. The number of goals scored by each team in a match is drawn from a Poisson distribution with the mean, μ, given by a simple linear model:
log μ = β0 + β1 X1 + β2 X2
There are two predictive variables in the model: X1 = ΔElo/100, where ΔElo is the difference between that country's Elo score and their opponents', and X2 is a binary home-advantage indicator equal to one if the team is the host nation (i.e. Russia) and zero otherwise. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from EloRatings (using the average of the last year, rather than their latest score). The method does not use any information on individual players.
The beta coefficients are determined via linear regression using all World Cup matches since 1990, obtaining values β0 = 0.16 ± 0.03, β1 = 0.17 ± 0.02 and β2 = 0.18 ± 0.09. All are significant, as is the change in deviance relative to an intercept-only model. Home advantage is equivalent to about 100 in Elo score difference, which equates to a boost of 0.2 goals per game.
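To make the match model concrete, here is a minimal sketch that plugs the quoted coefficients into the formula above and draws a scoreline. It is an illustration rather than the code used for the simulations: the function names are hypothetical, the two teams' goal counts are assumed to be drawn independently, and the Poisson sampler (Knuth's multiplication method) is just one convenient way to avoid extra dependencies.

```python
import math
import random

# Central values of the coefficients quoted in the text.
BETA0, BETA1, BETA2 = 0.16, 0.17, 0.18

def expected_goals(elo_team, elo_opp, is_host=False):
    """Poisson mean: log(mu) = b0 + b1 * (Elo difference / 100) + b2 * host."""
    x1 = (elo_team - elo_opp) / 100.0
    x2 = 1.0 if is_host else 0.0
    return math.exp(BETA0 + BETA1 * x1 + BETA2 * x2)

def poisson_draw(mu, rng):
    """Sample a Poisson variate with mean mu (Knuth's method, fine for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_match(elo_a, elo_b, a_is_host=False, b_is_host=False, rng=None):
    """Return (goals_a, goals_b), assuming the two counts are independent."""
    rng = rng or random.Random()
    goals_a = poisson_draw(expected_goals(elo_a, elo_b, a_is_host), rng)
    goals_b = poisson_draw(expected_goals(elo_b, elo_a, b_is_host), rng)
    return goals_a, goals_b

# Home advantage expressed in Elo points, and as extra goals between equal teams.
home_adv_elo = 100.0 * BETA2 / BETA1
host_boost = expected_goals(1800, 1800, True) - expected_goals(1800, 1800)
```

With these numbers `home_adv_elo` comes out at a little over 100 Elo points and `host_boost` at a little over 0.2 goals, which is where the rounded figures in the paragraph above come from.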
Running the regression back to 1954 obtains similar results, with the exception of the home advantage coefficient, which becomes significantly larger. Indeed, there is evidence that home advantage is a declining factor in the World Cup (as it is in club competitions). I have also investigated other indicators, such as distance travelled to the tournament, but did not find them to be statistically significant predictors.
Simulations are run 'hot', which means that the Elo scores are updated after each simulated match (using the procedure described here). This has the effect of propagating the impact of a result to future matches, whilst adding a little more variation in the tournament outcomes by slightly increasing the probability that the weaker teams will progress further into the tournament.
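The linked update procedure is not reproduced in this text, so the following is only a sketch of what a 'hot' update could look like. It assumes the standard World Football Elo rule (new rating = old rating + K × G × (W − We)), with K = 60 for World Cup finals matches and the usual goal-difference multiplier G; it ignores that system's home-field adjustment, and the function names are hypothetical.

```python
def expected_result(elo_team, elo_opp):
    """Expected result We (between 0 and 1) from the Elo difference."""
    return 1.0 / (10 ** (-(elo_team - elo_opp) / 400.0) + 1.0)

def goal_diff_multiplier(goal_diff):
    """Multiplier G: 1 for a draw or one-goal win, 1.5 for two, (11 + N) / 8 beyond."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return (11 + n) / 8.0

def update_elo(elo_a, elo_b, goals_a, goals_b, k=60.0):
    """Return both teams' ratings after a simulated match (zero-sum update)."""
    if goals_a > goals_b:
        result_a = 1.0
    elif goals_a == goals_b:
        result_a = 0.5
    else:
        result_a = 0.0
    delta = k * goal_diff_multiplier(goals_a - goals_b) * (
        result_a - expected_result(elo_a, elo_b))
    return elo_a + delta, elo_b - delta
```

Feeding the updated ratings into the next simulated match is what lets an early upset carry through the rest of a simulated tournament.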
If a match ends in a draw in the knock-out rounds, penalty shoot-outs are simulated, shot-by-shot. Each team is assigned a penalty 'strength': the probability that they score each penalty. This is determined based on their performance in previous World Cup penalty shoot-outs combined with a beta-distributed prior around the historical average (73%).
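A shot-by-shot shoot-out of this kind can be sketched as follows. The 73% historical average comes from the text; everything else is an assumption made for illustration, in particular the weight given to the prior (`PRIOR_STRENGTH`, expressed as a number of pseudo-penalties), the use of the posterior mean as the team's 'strength', and the function names.

```python
import random

HISTORICAL_RATE = 0.73   # historical average conversion rate quoted in the text
PRIOR_STRENGTH = 20.0    # hypothetical: the prior counts as 20 pseudo-penalties

def penalty_strength(scored, taken, prior_mean=HISTORICAL_RATE,
                     prior_strength=PRIOR_STRENGTH):
    """Posterior mean scoring probability under a Beta prior."""
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    return (alpha + scored) / (alpha + beta + taken)

def simulate_shootout(p_a, p_b, rng=None):
    """Return 'A' or 'B'. Team A kicks first; probabilities must lie strictly in (0, 1)."""
    rng = rng or random.Random()
    score_a = score_b = 0
    for kick in range(1, 6):
        left_after = 5 - kick                      # kicks left once this round is over
        score_a += int(rng.random() < p_a)
        if score_a > score_b + left_after + 1:     # B still has this round's kick
            return "A"
        if score_b > score_a + left_after:
            return "B"
        score_b += int(rng.random() < p_b)
        if score_a > score_b + left_after:
            return "A"
        if score_b > score_a + left_after:
            return "B"
    while True:                                    # sudden death, one kick each
        a_scores = rng.random() < p_a
        b_scores = rng.random() < p_b
        if a_scores != b_scores:
            return "A" if a_scores else "B"
```

A team with no shoot-out history simply gets the historical average, while a long record of hits or misses pulls its strength away from 73% at a rate set by the assumed prior weight.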
I simulated the tournament 10,000 times, evaluating the outcomes of the group stage, and subsequent knock-out rounds.
All the code for these simulations can be found on GitHub.
eightyfivepoints.blogspot.com | Fri, 16 Mar 2018 10:21:00 +0000
With less than a quarter of the season remaining, most EPL clubs will finish within a couple of places of their current league position. For some clubs this could mean the difference between survival or sinking into the Championship; for others, swimming with the big fish in the Champions League or the smaller shoals of the Europa League. For the rest the consequences are less dramatic.
While fans chew at their finger nails, club owners and directors will be busy assessing the financial consequences of each permutation as they plan for next season. A significant proportion of each club's revenue next year will depend on their final league position in May. The difference between 8th and 10th may not seem particularly important, but how much are these few extra places worth in terms of the cash prizes on offer? More generally, what is the total amount a club can expect to bank as a direct consequence of their final league position?
The Prizes on Offer
The financial rewards associated with a Premier League place come from several different sources. Some are dependent solely on retaining Premier League status for another season; the others are determined by relative position within the league. Here is a brief summary of the prizes (
Figure 1: Estimated total prize money associated with finishing in a given position in the EPL this season, broken down by source.
The EPL title comes with a £219m prize, of which only £39m (18%) is the direct reward, or 'merit money' (light blue bars), for finishing first. £79m, just over a third, comes from the equal-share portion of the TV and commercial income (red & grey bars), and another £32m for matches televised live the following season (yellow bars). The remainder comes from the expected windfall of playing in the Champions League next season.
The difference in prize money between the top four is mostly driven by the manner in which UEFA distributes TV income for the Champions League, which is, in part, dependent on league position. The team that finishes fourth can expect to receive about £20m less from UEFA than the champions. However, there is an even bigger gap between fourth and fifth place, highlighting the financial cost of missing out on the Champions League altogether. The club finishing in fifth will receive £30m less than the fourth-placed club, mostly due to the lower expected revenue from the Europa League.
Prize money drops off slowly from 8th to 17th place at a rate of about £3m per place: the team that finishes 8th receives £25m more than the team that finishes 17th. Parachute payments and Championship income ensure that relegated teams receive at least £47m next season, but they will still receive less than half the amount received by the team finishing 17th. From a purely financial perspective, owners of clubs outside the Big 6 may not care where they finish in the league, so long as it's not in the bottom three places.
Should Champions League revenue be distributed more broadly?
Relegation aside, the Champions League represents the biggest distortion of the rewards on offer, creating an artificial gulf between fourth and fifth place. Under the current system, EPL clubs that qualify for the Champions League each receive a portion of the English share of UEFA's broadcasting revenue (a total of £165m this season) which itself is determined by the size of the UK audience. But the Champions League is watched by fans of all clubs; it is, after all, the pinnacle of European football. Should UEFA's TV income belong solely to the clubs that participate in the tournament?
In the last fifteen seasons, clubs outside the big six have qualified for a place in the ECL on only two occasions: Leicester in 2016 and Everton in 2005. Otherwise, the two Manchester teams, Liverpool, Chelsea, Arsenal and, more recently, Spurs have gobbled up the lion's share of the EPL's revenue from one of sport's most moneyed tournaments. The situation is even more extreme in Scotland, where Celtic have been the sole beneficiary for the last six seasons.
A redistribution of UEFA's broadcasting revenue to ensure that all EPL (and SPL) clubs receive a taste -- it doesn't have to be an equal portion -- could be an effective means of narrowing the financial schism between the top clubs and the rest of the league. It may, in the long run, make for a more consistently competitive league.
Thanks to David Shaw for comments.
 Amusingly, this means that it is financially beneficial to be the sole remaining representative of a domestic league, as early as possible in a European tournament.
 UEFA state: "A new four-pillar financial distribution system (starting fee, performance in the competition, individual club coefficient and market pool) will see sporting performances better rewarded, while market pool share will decrease." I can't find any information as to what this new system will be, though.
 League position in the previous season is actually a good indicator of the number of matches a club can expect to be shown live the following season.
 Note that Europa League revenue is less for the 6th and 7th place due to the decreased likelihood of these teams qualifying for the league stage (based on past performances of English clubs in the qualifiers).
 Figure 1 probably underestimates this gap as club-specific sponsorship deals often increase in value with participation in the ECL (or decrease should a team fail to qualify).
 UEFA does make 'solidarity payments' to professional clubs that did not participate in their competitions, but these tend to be insignificantly small relative to the other EPL prizes.
eightyfivepoints.blogspot.com | Fri, 02 Feb 2018 12:49:00 +0000
The UK has long been a country that imports more goods than it exports.
Figure 1: Total transfer fees paid (imports) or received (exports) by English clubs for players bought or sold to clubs abroad since the 2012/13 season. Over the last six seasons English clubs have generated a €3.2 billion deficit in overseas player transfers.
How does England’s transfer trade deficit compare with other countries? Figure 2 gives the total value of players imported and exported for each of the major European leagues, the smaller European leagues ("rest of Europe") and the rest of the world. The bottom line indicates the net trading surplus (or deficit) for each country or region.
The accumulated English transfer deficit since 2012 is nearly fifteen times greater than that of German clubs, who have run up a deficit of €0.2 billion over the same period. Italian clubs are effectively in balance, spending as much as they receive, while the Spanish, French, Dutch and Portuguese leagues have produced a net transfer surplus. Dutch and Portuguese clubs in particular have been running a profitable trade in player exports, generating €0.6bn and €1bn respectively from player exports while paying significantly less for imported players.
Figure 2: Total spent, in €billion, on imported players (columns) or received for exported players (rows) for each of Europe's top leagues. Rest of Europe (RoE) encompasses all other European leagues; rest of world (RoW) encompasses all non-European clubs.
Since 2012, nearly a third of Spanish overseas transfer spending has gone to English clubs, with about a sixth going to their Portuguese neighbours, and a tenth to South American clubs. Real Madrid and Barcelona account for €1bn, over half the value of all Spanish imports. However, they also generated €0.8bn in exported player sales, with Real Madrid – somewhat surprisingly – almost breaking even. The remaining Spanish clubs produced a net surplus of €0.3bn.
France has a €0.2bn net trading deficit with Spain, although that could be put down to a single transfer: the €220m paid by PSG to Barcelona last summer for Neymar. Indeed, PSG have spent a total of €0.8bn on imported players in the last six seasons, accounting for nearly two-thirds of the total value of all players imported to France. However, unlike Real Madrid and Barcelona, PSG’s player sales abroad were only €0.2bn. If PSG were excluded, France would have the largest trade surplus of any major European league.
German clubs have spent a total of €1.2bn on imported players, with Bayern Munich accounting for a quarter of the value of players imported and just under a sixth of exports. German clubs are the biggest importers of players from smaller European leagues, bringing in nearly €200m worth of players from Switzerland and Ukraine alone over the last 6 years. Italian clubs recruit broadly from across Europe and South America, but principally sell their players to clubs in Italy, France, Spain and Germany.
Portugal and Holland have both profited from the international transfer market. Dutch clubs have imported only €40m of players in the last six years, while exporting €600m: a testimony to the quality of Dutch youth development. Portuguese clubs spent just over €100m importing players from Spain and South and Central America since 2012, but have received over €1bn from selling players abroad. A significant fraction of the exported players were not Portuguese nationals, indicating the success of Portuguese clubs in identifying and developing foreign talent.
Are English clubs being held to ransom?
The most recent publication of Deloitte’s Football Money League, which ranks clubs by their total revenue, emphasizes the wealth of the Premier League. There are five English clubs in the top 10, and ten in the top 20. With their financial firepower so clearly evident, are Premier League clubs being milked by foreign clubs looking to sell their players at premium prices?
This is difficult to assess, but there is some evidence. The median transfer fee paid by EPL teams (for transfers in which there was a fee) has more than doubled over the last eight years, whereas transfer price inflation has been far more sedate in other countries. So unless the quality of the players bought has also increased, it would appear that transfer market prices are rising faster for English clubs than for those in other countries.
EPL clubs spent nearly £400m last month, smashing the previous record for the January transfer window. With fierce competition at both ends of the table, EPL clubs are willing to spend heavily to survive and succeed. Managers may feel they must “use it or lose it” when it comes to the funds at their disposal, with “lose it” often also referring to their jobs. However – as a number of analyses have shown – emergency mid-season spending does not improve results. Given the overcooked prices, and the fairly low success rate of new players, perhaps club owners and directors would be better off squirrelling some of their capital away for a rainy day.
eightyfivepoints.blogspot.com | Mon, 15 Jan 2018 03:40:00 +0000
David de Gea won the match for Man United at the Emirates last month. Under siege for much of the second half, de Gea’s brilliant performance preserved United’s lead at a crucial stage of the match. He made a total of 14 saves in the match, equaling the EPL record set by Tim Krul for Newcastle in 2013.
How many goals does a high-quality goalkeeper like de Gea save his team over the course of a season? Pundits frequently claim that a goalkeeper is worth so many goals or points, but this has never actually been measured. In this post I take advantage of a novel feature of Stratagem’s dataset to infer the number of goals that EPL goalkeepers have saved or cost their teams in the last two seasons. Who is bailing out their defence, and who is letting them down? Would United be a top-4 team without de Gea? Is either of Mignolet and Karius a decent shot-stopper?
The difficulty of a save
When a goalkeeper fails to make a routine save it is described as an error, costing the team a goal. But when he makes a superb fingertip stop he is thought to have saved his team a goal. To measure a goalkeeper’s shot-stopping ability, we must assess the difficulty of the shots that he faced.
A nice feature of Stratagem’s dataset is that it includes a subjective assessment of the quality of every shot, along with a record of the outcome (missed, saved, intercepted or goal). Each shot on goal is allocated a shot quality rating on a scale of 1-5, where a '1' represents a wild, scuffed or poorly hit shot (e.g. Lukaku’s recent overhead kick attempt against Leicester), and '5' an excellent shot that gave the keeper next to no chance (such as Alonso’s free kick goal against Spurs, or Ozil’s volleyed goal against Newcastle). A standard shot – one that tests the goalkeeper but that he should have a reasonable chance of saving – is rated as a '3' (think Firmino’s equalizer against Arsenal last month).
However, we can’t solely assess goalkeepers on the quality of the shots they face: the quality of the chance – before the shot was taken – must also be taken into account. After all, a shot from one yard out is likely to result in a goal irrespective of how well it is hit. Fortunately, this is exactly what expected goals (ExpG) measures.
I constructed a simple model to assess the probability of a goalkeeper making a save given the quality of the chance and the quality of the shot; details of the model are given in the appendix. For shot quality I used Stratagem’s rating, for chance quality I used an
Figure 1: The proportion of shots -- on target, and not blocked by an intervening outfield player -- saved as a function of Stratagem’s shot quality rating (controlling for chance quality by fixing the ExpG of the chance to 0.1). The blue, red and black lines show the proportion of shots saved by Heurelho Gomes, Artur Boruc and David de Gea in the 2016/17 season. The blue shaded region shows the average over all EPL goalkeepers that season.
The average EPL goalkeeper saved nearly 80% of standard shots (rated as a ‘3’) last season, and nearly 99% of poor shots (rated as a '1'). However, he saved only 65% of well-hit shots ('4'), and only 8% of excellent shots ('5'). Artur Boruc’s save ratio was almost identical to the EPL average, but David de Gea’s was clearly superior: while the average EPL keeper saved under 80% of standard shots, de Gea saved more than 90%. He did even better with shots scored as a ‘4’, saving 20% more than the average goalkeeper. However, Heurelho Gomes cost Watford goals last season. Although he was reliable in making routine saves (shots rated as '1' or '2'), he saved a significantly lower proportion of shots rated '3' or higher.
How many goals are goalkeepers worth?
Figure 2 shows the number of goals that each EPL goalkeeper saved or cost their team in the 2016/17 season (blue bars), and in the 2017/18 season (up to gameweek 22, red bars). It is calculated by comparing the actual number of goals conceded by each goalkeeper to the number of goals that the (hypothetical) average goalkeeper would have conceded from the same shots, based on the model described above. A negative total indicates that the player saved their team goals relative to the average goalkeeper; a positive total indicates that they cost their team goals. Only players that played at least 14 matches in either season are plotted.
Figure 2: The number of goals that each EPL goalkeeper saved (-ve) or cost (+ve) his team over the course of the 2016/17 season (blue bars) and 2017/18 season (up to gameweek 22, red bars), calculated by comparing the actual number of goals they conceded to the number the average goalkeeper would have conceded from the same shots. Only goalkeepers that played at least 14 league matches in either season are shown.
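The calculation behind Figure 2 can be sketched as follows. This is a minimal illustration, not the code used for the post: it assumes each shot faced has already been given the average goalkeeper's save probability, and the function name and the example shots are hypothetical.

```python
def goals_vs_average(shots):
    """Goals cost (+) or saved (-) relative to the average goalkeeper.

    shots: list of (avg_save_prob, conceded) pairs, where avg_save_prob is the
    model's save probability for the average keeper and conceded is True if
    the shot resulted in a goal.
    """
    actual_conceded = sum(1 for _, conceded in shots if conceded)
    # The average keeper is expected to concede (1 - save probability) per shot.
    expected_conceded = sum(1.0 - p for p, _ in shots)
    return actual_conceded - expected_conceded


# Hypothetical shots faced by one goalkeeper (not real data).
example_shots = [
    (0.78, False),
    (0.78, False),
    (0.35, True),
    (0.99, False),
    (0.09, True),
    (0.78, False),
]

print(round(goals_vs_average(example_shots), 2))
```

A negative result means the goalkeeper conceded fewer goals than the average keeper would have from the same shots, matching the sign convention used in Figure 2.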
David de Gea has been the best performing goalkeeper in the last two seasons, saving Man United around 8 goals last season and nearly 10 so far this season. Without him it is highly likely that United would not currently be in the top four. Burnley goalkeepers Tom Heaton (16/17) and Nick Pope (17/18) have both performed extremely well, too: is this somehow a consequence of the way Burnley play, did they both happen to be in very good form, or are they genuinely very good goalkeepers? It will be interesting to follow their respective careers over the next few seasons.
Petr Cech and Hugo Lloris both saved their teams a significant number of goals last season, but they appear to be underperforming this time around. I wonder if Arsenal and Spurs fans would agree with this assessment? Similarly, would Swansea fans attest to a significant improvement in Fabianski this season?
Simon Mignolet has been decidedly average. With the exception of Claudio Bravo – who was replaced in the summer – he is the worst performing goalkeeper at a top-6 team over the last 18 months. Having confirmed that Loris Karius is now the first choice goalkeeper at Anfield, Jurgen Klopp finally appears to have lost patience with Mignolet. However, Karius also looks vulnerable, and his stats are not great either. Purchasing an elite goalkeeper should be very high on Liverpool's to-do list.
Heurelho Gomes has been the most costly goalkeeper over the last season and a half, conceding nearly 12 goals more than the average goalkeeper last season, and 5 more this season. I haven’t noticed Watford fans complaining about him though, so it would be interesting to know what they think of this. The only other goalkeepers to have cost their team more than five goals in a season are Joe Hart and Wayne Hennessey.
Of course, comparing goalkeepers by the saves they make neglects other important aspects of their job. Collecting crosses and accurate distribution are also crucial parts of their skillset. However, as with a striker’s link play, or ability to create space, these actions are much harder to quantify.
Are goalkeepers undervalued in the transfer market?
Relative to the expected performance of the average EPL goalkeeper, David de Gea has saved Man United nearly 10 goals so far this season. That's equal to the total number of goals that Lukaku and Morata have scored, which is not even a fair comparison: we should really compare de Gea with the number of goals each striker has scored relative to the number an average striker would have scored in their place. By that metric, de Gea could well be the highest-performing player this season – in any position.
Yet top goalkeepers appear to be priced significantly lower than their teammates in the transfer market. David de Gea does not feature in the top 100 players by transfer value calculated by the CIES Football Observatory. Last season he was ranked 58, six places below Jamie Vardy. Only two goalkeepers made it into the top-50 (Courtois and Oblak), and only ter Stegen does this year. Just one goalkeeper appears in the list of the top-50 most expensive transfers of all time. Even in the Fantasy Premier League, de Gea has accumulated more points than any striker bar Kane this season, yet is worth less than half his value in the game.
Goalkeepers play a unique and highly specialized role in a team, yet they are consistently undervalued. This could be one of the largest inefficiencies in football. As my evidence shows, de Gea could turn out to be the decisive factor in Manchester United securing Champions League football next season. Conversely, the lack of an elite goalkeeper could see Liverpool falling short.
Many thanks to David Shaw for reading and commenting on multiple versions of this post.
The logistic model for save probability is defined as:
SaveProbability = 1 - 1 / ( 1 + exp(-y) ) ,
y = β0 + β1 * ChanceQuality + β2 * ShotQuality .
ChanceQuality is the ExpG score for the opportunity, determined using the ‘extended’ model described here. ShotQuality is Stratagem’s shot quality rating for the shot, rated on a scale of 1 (poor) to 5 (excellent).
The β coefficients were determined using logistic regression. For individual goalkeepers they were obtained using the shots on target faced in a given season; for the average EPL goalkeeper all shots faced by goalkeepers that played at least 14 matches were used. The coefficients for the average goalkeeper in the 2016/17 season are β0=-7.56±0.30, β1=-8.62±0.41 and β2=1.81±0.09
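The model can be evaluated directly, as in the sketch below. The function and variable names are illustrative and not taken from the post. One assumption needs flagging: because y is the log-odds of a goal, a coefficient that raises y lowers the save probability, so the sketch enters β1 as +8.62 rather than the printed -8.62. With the positive sign, a standard shot (rated '3') from a chance with ExpG of 0.1 is saved roughly 78% of the time, in line with the "nearly 80%" quoted in the main text; with the negative sign, better chances would become easier to save.

```python
import math


def save_probability(chance_quality, shot_quality, b0, b1, b2):
    """Save probability from the logistic form 1 - 1 / (1 + exp(-y))."""
    y = b0 + b1 * chance_quality + b2 * shot_quality  # log-odds of a goal
    return 1.0 - 1.0 / (1.0 + math.exp(-y))


# 2016/17 average-goalkeeper coefficients (central values only).
# ASSUMPTION: b1 is entered as +8.62, not the printed -8.62 (see note above).
AVG_2016_17 = dict(b0=-7.56, b1=8.62, b2=1.81)

# Save probability for each shot quality rating at a fixed ExpG of 0.1.
for rating in range(1, 6):
    p = save_probability(0.1, rating, **AVG_2016_17)
    print(f"shot quality {rating}: save probability {p:.2f}")
```

The quoted uncertainties on the coefficients are ignored here; propagating them would give a band around each probability rather than a single value.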
eightyfivepoints.blogspot.com | Fri, 05 Jan 2018 13:16:00 +0000
Earlier this week West Brom made an
What about longer recovery periods? I repeated the analysis looking at matches in which teams that had had two full days of rest since their previous match (e.g., playing Sunday-Wednesday) faced opponents that had had at least three days of rest (Saturday-Wednesday). This time I found no evidence of any disadvantage: there was no significant difference between the win percentage in that match and the return game the same season. This implies that two full days of rest is generally sufficient for players to maintain their performance levels in consecutive matches (although presumably that is not sustainable in the long run).
I’ve provided evidence that when a team plays their second match in 48 hours against an opponent that had at least 24 hours more rest, they are at a small but significant disadvantage. If the Premier League and FA want to ensure that the fixture schedule does not favour some clubs they should either avoid making teams play twice in 48 hours, or – if that is not possible – schedule the match so that both teams have exactly the same amount of time to recover from their previous match.
Of course, if the Premier League's priority is to please the companies that pay to broadcast matches live, all this will be very much a secondary concern.
From blogs to the BBC, the concept of ‘expected goals’ has entered the mainstream media's lexicon. It has caught on because it’s a useful concept; it’s a useful concept because football is a low-scoring game. Chance (or luck) can be the difference between victory and defeat, a good day or an off-day. Expected goals, however, measures what would have happened on an average day.
It’s a simple quantity to measure. Each shot is assigned a number between 0 and 1: the proportion of similar shots (from the same position, for example) that have resulted in a goal. I’ll refer to this as ‘chance quality’. You then add up the chance quality for every shot taken by a team in a match to calculate the number of goals that you would have ‘expected’ them to score that day.
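As a minimal sketch of that sum (the function name and the shot values are hypothetical, chosen only for illustration):

```python
def expected_goals(chance_qualities):
    """Sum the chance quality of every shot a team took in a match."""
    for q in chance_qualities:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"chance quality must lie between 0 and 1, got {q}")
    return sum(chance_qualities)


# Four hypothetical shots: three speculative efforts and one good chance.
match_shots = [0.05, 0.10, 0.42, 0.07]
print(round(expected_goals(match_shots), 2))
```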
One problem with typical measures of chance quality is that they tend to ignore the other team. How many of their defenders were between the shot-taker and the goal? How effectively were they denying him the time and space to take a clean shot? There is a huge difference between rolling the ball towards an undefended net, and fighting off four defenders while stretching to get a toenail on a cross.
Fortunately, thanks to
Figure 1: Chance Quality in the basic model. The coloured zones indicate the probability of a (non-headed, open play) shot producing a goal as a function of distance and angle to goal. Shots taken in the outermost zone (zone 1: cyan) have a 5-10% probability of producing a goal; those taken within the innermost zone (red) have at least a 50% chance.
This is a fairly standard model for chance quality. However, it tells us nothing about the influence of the defending team, just the average probability of scoring from a given position. The next step is to investigate how chance quality varies as we incorporate Stratagem’s defensive metrics.
Incorporating Defense Metrics
As mentioned above, Stratagem provide two indicators: ‘numDefPlayers’, the number of defensive players (including the goalkeeper) in a direct line from the shot position to the goal; and ‘defPressure’, the defensive pressure exerted on the shot-taker, rated on a scale of 0-5. Stratagem describe their pressure scoring system in the following way:
I extended the basic chance quality model to include these two defensive metrics. The resulting regression found both defensive coefficients to be highly significant and improved the log-likelihood of the model substantially relative to the basic model (see Appendix 1). Other metrics also indicate an improvement in the model fit, with an increase in the average correct prediction probability and a decrease in the Brier score. I have no doubt that incorporating the defensive metrics significantly improved the model.
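For readers unfamiliar with these fit metrics, the sketch below shows how a log-likelihood and a Brier score are computed for two sets of goal probabilities. The probabilities and outcomes are made up for illustration and are not drawn from either model.

```python
import math


def log_likelihood(probs, outcomes):
    """Sum of log-probabilities assigned to what actually happened (higher is better)."""
    return sum(math.log(p if goal else 1.0 - p) for p, goal in zip(probs, outcomes))


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - goal) ** 2 for p, goal in zip(probs, outcomes)) / len(probs)


# Hypothetical shots: 1 = goal, 0 = no goal.
outcomes = [1, 0, 0, 1]
basic_probs = [0.3, 0.2, 0.1, 0.3]      # made-up position-only predictions
extended_probs = [0.5, 0.1, 0.1, 0.4]   # made-up predictions with defensive metrics

for name, probs in [("basic", basic_probs), ("extended", extended_probs)]:
    print(name, round(log_likelihood(probs, outcomes), 3), round(brier_score(probs, outcomes), 4))
```

In this toy case the second set of predictions has both the higher log-likelihood and the lower Brier score, which is the pattern described above.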
But what does the new information imply in a practical sense? Loosely speaking, every three extra intervening defenders halves the probability of a goal. An unopposed (non-headed) shot from twelve yards in open play will result in a goal for 42% of shots (2 in 5); placing three defenders between the goal and shot-taker reduces this to 24%. Increasing defensive pressure from none to high pressure (a score of 4 in Stratagem's system) has approximately the same impact.
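The 42% and 24% figures can be turned into an implied per-defender effect. The sketch below assumes the extended model is logistic in the number of intervening defenders, so that each extra defender multiplies the odds of a goal by a constant factor; that functional form and the helper names are assumptions, since the fitted coefficients are not quoted here.

```python
import math


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def implied_log_odds_per_defender(p_before, p_after, extra_defenders):
    """Change in log-odds of a goal per extra defender implied by two probabilities."""
    return math.log(odds(p_after) / odds(p_before)) / extra_defenders


def probability_with_defenders(p_unopposed, coef, extra_defenders):
    """Goal probability after adding defenders, under the assumed logistic form."""
    new_odds = odds(p_unopposed) * math.exp(coef * extra_defenders)
    return new_odds / (1.0 + new_odds)


coef = implied_log_odds_per_defender(0.42, 0.24, 3)
print(round(coef, 3))
print(round(probability_with_defenders(0.42, coef, 6), 3))
```

Under this assumption six intervening defenders would cut the 42% chance to about 12%, which illustrates why the halving rule is only a loose approximation: the model scales odds, not probabilities.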
Figure 2 demonstrates graphically how chance quality is dependent on the defensive indicators. It shows four scenarios: top-left, shots taken under light defensive pressure and only one intervening defensive player (normally the goalkeeper); top-right, low defensive pressure but four intervening defensive players; bottom-left, high pressure with two intervening defensive players; and bottom-right, high pressure and four intervening defenders. In each case I provide an example chance from the 2016/17 season; all four shots were selected to be in roughly the same position. The zones indicate chance quality and are coloured in the same way as in Figure 1: 5-10% (cyan), 10-20% (magenta), 20-30% (green), 30-40% (yellow), 40-50% (grey) and >50% (red).
Figure 2: The impact of defence on chance quality. Each panel demonstrates how the probability of a shot producing a goal is dependent on the number of intervening defenders and the degree of physical pressure exerted on the shot-taker. The coloured zones indicate the probability of a (non-headed, open play) shot producing a goal as a function of distance and angle to goal (cyan: 5-10%, magenta: 10-20%, green 20-30%, yellow: 30-40%, grey: 40-50% and red: >50%).
The four scenarios clearly demonstrate how the probability of a goal is dependent on the defence. When a player has a clear, unimpeded sight of goal (top-left in Figure 2) the probability of scoring from any reasonable distance increases significantly. Zones 1 and 2 (cyan and magenta contours) extend out well beyond the penalty area, and zone 6 (the innermost zone, indicating a >50% chance of a goal) extends beyond the six-yard box. Shots near the penalty spot – such as Benik Afobe’s chance for Bournemouth in the 61st minute against Southampton in April – have a one-in-three chance of producing a goal, on average.
At the other extreme (bottom right), a highly pressurized and obstructed shot on goal, chance quality is significantly reduced. Shots around the penalty spot – such as Nacer Chadli’s shot in the 70th minute of West Brom’s match at Liverpool last season – have a 1 in 10 chance of producing a goal: three times lower than the unimpeded case. Zone 1 (the outermost zone) does not extend beyond the penalty area, and the probability of scoring anywhere outside the six-yard box is less than 25%. This example clearly demonstrates the difficulty of playing against a packed defense.
In this post I’ve demonstrated that the information recorded by Stratagem on the positions and actions of the defending team is important for assessing chance quality. I’ve constructed a chance quality model that incorporates both the number of intervening defenders and the amount of pressure being applied to the attacking player, demonstrating that it out-performs models based on shot position and type alone. The availability of this data is clearly of huge benefit to the analytics community.
The results quantify the difficulty of scoring against a packed defense. The probability of scoring a shot from outside a crowded penalty area is less than 5%, essentially wasting possession. A well-organised, ultra-defensive team can reduce the probability of scoring from outside the six-yard box to less than 20%, or one in five. In that situation, the ability to maneuver defenders out of position is clearly very important.
It’s worth pointing out that, over the course of the 2016/17 season, the variance between defensive situations averages out and so the expected goals scored or conceded for each EPL team is not significantly different relative to position-only models. The largest change is Liverpool’s expected goals against, which increases from 0.96 goals per game in the basic model to 1.03: that’s 36 to 39 goals in total over the season (they actually conceded 42). This implies that Liverpool’s defence allowed their opponents slightly better chances than a shot position-only model would suggest.
Thanks to David Shaw for comments.
This article was written with the aid of StrataData, which is property of
Like stock markets in the late nineties, the English transfer market is soaring ever higher. In 2010, EPL clubs spent nearly £500m; in 2014 they broke the £1bn mark and this season total spending should exceed £1.5bn. While the number of transfers each season has
Figure 1: Inflation in English transfer fees -- the Transfer Price Index (TPI, black line) -- as measured by Graeme Riley and Paul Tomkins. The grey line/right axis shows the income raised through the sale of EPL TV broadcasting rights in successive deals. Red dots show selected transfers at their original cash value: all would be valued around £59m in today's market.
We can apply the TPI to calculate the present market value of any English transfer dating back to 1992. To demonstrate the point, I’ve plotted various transfers from the past 25 years and their cash value at the time of the transfer in Figure 1. It may appear to be an unusual selection of players but they have one thing in common: according to the TPI their transfer fee in the current market would be around £59m. To answer the question posed above: the £3.6m spent on Alan Shearer in 1992 is equivalent to £68m in today's prices.
What causes transfer fees to increase so rapidly? TV money has a lot to do with it. The grey line in Figure 1 indicates the
Figure 2: The cumulative net spending of the seven biggest EPL clubs since the 2000/01 season at present market value (i.e., adjusted for TPI inflation).
Since Abramovich bought the club, Chelsea have invested nearly £2.5bn in players (again: in current prices and net of sales). In his first year, Chelsea spent £617m in today’s prices (£113m at the time) and finished 2nd, behind Arsenal. In his second year they spent £465m (£92m) and won the title. So it cost Chelsea more than £1bn to buy the title. They finished as Champions in the following two seasons, spending over £300m in today’s prices to do so. Since 2003/04 Chelsea have won a total of five titles: that’s about £480m per title in the current market.
Similarly, it cost the Abu Dhabi Group £1bn to buy their first title for Man City, four years after they bought the club. They have spent a further £650m since, yielding one further title. So that’s around £825m per title. Over the same time period (since 2008), Man United have won three titles at a cost of about £260m per title in today’s prices. If we go back to 2000/01, this decreases to about £190m per title. Of course, in 2000/01 United already had the best team and one of football’s greatest managers.
In the decade that followed their unbeaten, title-winning season, Arsenal’s cumulative net spending declined as sales consistently exceeded spending. Despite the recent uptick, the long period of under-investment clearly hindered their prospects of reclaiming the title and it’s possible that the damage is still being felt today. While Man City may have overpaid for their two titles, Arsenal have simply not spent enough to mount title challenges year after year.
Are Spurs the counter-example? Over the last seven seasons they have been relatively parsimonious, with transfer revenue matching expenditure. On the other hand, they have not actually won anything, nor really come close.
The £113m (non-inflated) Chelsea spent in 2003/04 was a completely unprecedented amount at the time, greatly exceeding what any other club had ever spent previously. However, in today’s market, it’s no longer a surprising amount. It’s only when you calculate that £113m in 2003 is equivalent to spending £617m today that you realize how much Chelsea rocked the EPL boat. To correctly compare transfer fees and club spending over an extended period of time it is necessary to convert transfer fees to a consistent basis. Graeme Riley and Paul Tomkins’ TPI index provides the necessary tool to do this.
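The conversion itself is a single multiplication. The sketch below backs out the multipliers implied by the Chelsea figures quoted in this post; the function names are illustrative, and the actual TPI index values are not reproduced here.

```python
def implied_multiplier(fee_then, value_now):
    """Inflation multiplier implied by an original fee and its present-day value."""
    return value_now / fee_then


def to_current_prices(fee_then, multiplier):
    """Convert an original fee to current prices using an inflation multiplier."""
    return fee_then * multiplier


# Chelsea's net spend (in £m) in Abramovich's first two seasons, as quoted above.
first_year = implied_multiplier(113, 617)
second_year = implied_multiplier(92, 465)

print(round(first_year, 2), round(second_year, 2))
print(round(to_current_prices(10, first_year), 1))  # a £10m fee from that first season
```

The two multipliers differ because each season has its own index value: a fee paid a year later has had one year less of inflation applied to it.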
At the beginning I asked whether Neymar is still the world’s most expensive footballer once we account for inflation. The answer is no. It cost Real Madrid £80m to sign Cristiano Ronaldo from Man United in 2009; that is equivalent to £232m in today’s prices. On a comparable basis, Ronaldo was more expensive[4,5].
Thanks to Paul Tomkins for providing me with the latest Transfer Price Index and transfer data, and to Paul Tomkins, David Shaw and Omar Chaudhuri for comments.
[1] NB: I am not suggesting that the players plotted were of similar ability; I’ve simply picked transfers that have a present market value of £59m according to TPI.
[2] To be precise, by 'current market' I mean prices at the end of the 2016/17 season, before the summer transfer window opened. TPI has not yet been updated for 2017/18 season transfers.
[3] Transfer data kindly provided by Paul Tomkins.
[4] If you’re willing to apply TPI (which is measured using English transfers) to transfers between foreign clubs, then Ronaldo is not the most expensive signing ever. That honour belongs to Gianluigi Lentini, who was signed by A.C. Milan from Torino for £13m in 1992. That’s £244m in today’s prices!
[5] Based on the continuation of recent trends, I would expect the next record transfer (in non-inflated terms) to occur in a couple of seasons' time, cost about £250m and be a player who is currently about 22/23 years old.
eightyfivepoints.blogspot.com | Sat, 19 Aug 2017 14:08:00 +0000
Figure 1: The joint distribution of Lawrenson’s and Merson’s predictions since 2014/15.
The rows indicate Lawrenson’s predictions (top row: home win, middle: away win and bottom: draw) and the columns Merson’s predictions (left: home win, middle: away win and right: draw); each cell therefore represents a pair of predictions. The numbers in each cell give the proportion of the 1101 matches in which the pundits made those predictions. For example, the top-middle cell represents those matches in which Lawrenson predicted a home win but Merson predicted an away win. It shows that this particular combination of predictions occurred in only 3% (37) of the 1101 matches.
The shaded cells down the diagonal represent the matches in which the pundits predicted the same outcome. In 41% (450) of the 1101 EPL matches, Lawrenson and Merson both predicted a home win; in 12% (134) they both predicted an away win; and in 10% (111) they both predicted a draw. In total, the pundits made the same prediction for 695 of the 1101 matches, so Lawrenson and Merson agree nearly two-thirds of the time. The remainder of this post focuses on these matches specifically, which henceforth I’ll refer to as the consensus forecasts.
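A joint distribution like the one in Figure 1 can be built from two lists of predictions, as sketched below. The toy predictions are hypothetical; only the diagonal counts (450, 134 and 111 out of 1101) come from the post.

```python
from collections import Counter


def joint_distribution(lawro, merson):
    """Proportion of matches for each (Lawrenson, Merson) prediction pair."""
    counts = Counter(zip(lawro, merson))
    total = len(lawro)
    return {pair: n / total for pair, n in counts.items()}


def consensus_rate(dist):
    """Share of matches in which both pundits predicted the same outcome."""
    return sum(share for (a, b), share in dist.items() if a == b)


# Toy predictions (hypothetical): H = home win, A = away win, D = draw.
lawro = ["H", "H", "A", "D", "H", "D"]
merson = ["H", "A", "A", "D", "H", "H"]
rate = consensus_rate(joint_distribution(lawro, merson))
print(round(rate, 3))

# The diagonal counts quoted above give the overall agreement rate.
agreed = 450 + 134 + 111
share = agreed / 1101
print(agreed, round(share, 3))
```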
How accurate are the consensus forecasts?
Table 1 shows the prediction accuracy – the proportion of their predictions that were correct – for Merson and Lawrenson individually, and for the consensus forecasts (i.e., when they agreed).
Table 1: Proportion of correct predictions for the pundits individually, and combined (‘consensus’).
Individually, both pundits have a success rate of just over 50%: they predicted the outcome correctly in a little over half of all EPL matches. Breaking this down, we see that their home win predictions have been correct in 60% of matches; away win predictions in around 56% of matches; and their draw predictions in 33%. As discussed in my previous post, draws are hard to predict and a 33% success rate is actually pretty good.
However, when both pundits agree – the consensus forecasts – their overall success rate increases by nearly 10 percentage points, from just over 50% to 60%. Specifically, their home win prediction success increases by a few percentage points to 62%; their away win prediction success by around 8 points to 64%; and their draw predictions by 10 points from 34% to 44%. That’s a significant improvement, particularly for draws.
Aren't the consensus forecasts just matches that were easier to predict?
The obvious criticism is that the pundits may just agree on the matches that are easier to predict and the improvement in prediction success is just a result of that. This is partly the case, at least for home and away wins. The average odds offered on these outcomes for the consensus forecasts tend to be about 10-15% lower (i.e., shorter) than those offered on the individual pundits' home and away win predictions. So the market does view these matches as being easier to predict.
However, there is almost no difference in the odds offered on draws between the consensus predictions and the pundits’ individual predictions. As I demonstrated last time, the betting market is actually quite poor at predicting draws (in particular, see
Table 2: Average betting return for the pundits individually, and for their combined ‘consensus’ forecast. NB: numbers for Lawrenson are different to Table 3 in my previous post as I’m using predictions from 14/15 onwards; in the last post I used all his predictions since 11/12.
Lawrenson and Merson individually made an 8% profit, on average. This increases to 12% when we consider only those matches that they agreed on (the consensus predictions): a substantial increase in return. Breaking this down, we see that the increase in profit in the consensus forecast is driven by the draw predictions. Betting on Lawrenson's or Merson’s draw predictions individually has yielded about a 20% profit per match, on average; this increases to 53%(!) when they both predict a draw. As demonstrated above, the odds offered on draws for the consensus forecasts do not differ significantly from the odds offered on draws for the individual pundit predictions, so this whopping increase is entirely generated by the increase in prediction success rate from 33 to 44%.
For home and away wins, the performance of the consensus forecast is less impressive (relative to the individual pundit forecasts), despite the prediction success rate being higher. As demonstrated above, the odds offered on home or away wins are systematically lower for the consensus forecasts because, when both pundits predicted a winner, they tended to agree for the matches that were easier to predict. So while the consensus forecasts get more matches right, their average profit on the correct predictions is lower.
Comparing betting profits in cash terms for the pundits individually and the consensus forecasts is complicated by the difference in the number of games that you'd be betting on. If you bet £10 on each of Merson's predictions in a single season you would bet a total of £3800 over the 380 matches, with an expected profit of £304 (8%, based on past performance). If you bet the same total amount (£3800) over a season on the consensus forecasts you would bet roughly £16/match over 240 matches (remember: the pundits agree in about 63% of matches), with an expected profit of roughly £460 (12%). Of course, you wouldn't know in advance exactly how many matches the pundits would agree on in that season (although see footnote 1).
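The arithmetic in that comparison is easy to reproduce. The helper below is a hypothetical illustration that spreads a fixed season budget evenly across a given number of bets and applies the average return per pound staked.

```python
def season_plan(total_stake, n_bets, avg_return):
    """Return (stake per bet, expected profit) for an evenly spread season budget."""
    stake_per_bet = total_stake / n_bets
    expected_profit = total_stake * avg_return
    return stake_per_bet, expected_profit


merson_stake, merson_profit = season_plan(3800, 380, 0.08)
consensus_stake, consensus_profit = season_plan(3800, 240, 0.12)

print(round(merson_stake, 2), round(merson_profit, 2))
print(round(consensus_stake, 2), round(consensus_profit, 2))
```

The consensus case comes out at about £15.83 per match and £456 in expected profit, which round to the £16 and £460 quoted above.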
Mark Lawrenson and Paul Merson predict the same outcome in nearly two-thirds of EPL matches; when they agree their prediction accuracy increases by nearly 10 percentage points, from a 51% success rate to a 60% success rate. Betting solely on these matches would, on average, have yielded a 12% profit per match, a significant increase on the 8% return per match yielded by betting on the predictions of either one of the pundits individually. Much of the improvement is driven by their draw predictions: when the pundits both predict a draw – which occurs in about 10% of matches – they are correct 44% of the time, yielding a whopping 53% profit per match, on average. As described at length in my previous post, draws are tough to predict and the betting market is not particularly good at it.
So where do we go from here? If your goal is to maximize returns you could explore various strategies for determining the optimal bet size. That’s not my main objective though. While the betting market provides a useful baseline for evaluating predictions, my principal interest is how we can improve forecasting by combining human predictions with those from statistical models. Of course, a huge amount of work has been done on the latter -- there are several large media outlets (e.g. FiveThirtyEight, Sky and the FT) and a large number of individual analysts (see, for example, here and here) that regularly publish stats-based predictions. However, I’m convinced that incorporating a human component could make significant improvements; I'm open to collaboration to explore hybrid methodologies.
For now, I’ll continue to monitor the performance of the pundits’ predictions and their consensus forecast over the current season. You can find the latest results here. And maybe I'll wager a little money myself, too. ☺
[1] The 695 matches were fairly evenly spread over the three seasons, 227 in 14/15, 216 in 15/16 and 251 in 16/17.
[2] The Sharpe ratio – the annualized return per unit risk – for both pundits individually is 1.3. It increases to 1.7 for the consensus forecast.
For the last six seasons, ex-Liverpool player and regular BBC pundit Mark (‘Lawro’) Lawrenson has been attempting to predict the outcome of every EPL match. Over on Sky Sports, ex-Arsenal player Paul Merson has been doing the same thing. Their predictions are published on the
Figure 1: The cumulative profit and loss (P&L) generated from betting £1 on each of Mark Lawrenson’s 2240 match predictions over the last 6 EPL seasons. The right hand bar shows the distribution of the P&L obtained if you were to bet randomly over the same period: there is only a 1/500 chance that you would exceed Lawro’s profit.
To my knowledge, neither Lawro nor Merson has professed to adopt any kind of system for making a prediction. Instead they rely on instinct and their vast experience in English football. Each week, they provide a few sentences explaining their predictions, summarizing recent form, injuries, suspensions and the importance of the match to each team. So while they do not have a systematic, data-driven approach to predicting football, they may intuitively incorporate some of the intangible factors that statistical models do not, or perhaps cannot, account for.
In the remainder of this blog I’m going to take a closer look at the pundits’ forecasting success, and identify how they have been able to beat the market. In a subsequent blog, I’m going to look at how we might improve on their predictions.
A more detailed look at the pundits’ forecasts
Table 1 shows the rate at which Lawro and Merson predict home wins, away wins and draws compared to the frequency at which each outcome has occurred in practice. For example, in Mark Lawrenson’s 2240 match forecasts he has predicted a home win in 56%, an away win in 19% and a draw in the remaining 25%. Over the same period, these outcomes have actually occurred at a rate of 45%, 30% and 25%, respectively. So Lawrenson overestimates the frequency of home wins and underestimates the frequency of away wins, but predicts draws at the correct rate. Pretty much the same applies to Merson.
Table 1: Proportion of home win, away win and draw predictions, and the rate they occur in practice.
The final column of Table 1 shows the bookies’ favoured outcome, that is, the outcome with the lowest odds. It’s striking that they never predict a draw as the most likely outcome: in 70% of matches they favour a home win and in the remaining 30% an away win.
This isn’t actually all that surprising: draws are difficult to predict. A delicate compromise between a home win and an away win, they are inherently unstable results – a single goal for either team breaks the deadlock. In my experience, statistical forecasting models (which I assume are largely what drives bookmakers’ odds) have a lot of difficulty in predicting a draw as the most likely outcome; they nearly always assign a higher probability to one team outscoring the other than they do to a tie. This can be seen explicitly in the odds offered by the bookmakers on draws: rarely below 3.0 or greater than 5.0, corresponding to a narrow range in probability of just 20% to 33%.
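The conversion from decimal odds to probability used in that last sentence is simply the reciprocal. The sketch below uses a hypothetical helper name and ignores the bookmaker's margin, so the true implied probabilities would be slightly lower.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


for draw_odds in (3.0, 5.0):
    print(draw_odds, round(implied_probability(draw_odds), 2))
```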
Table 2 shows the pundits’ success rate: the proportion of their predictions that were correct. Both Lawrenson and Merson have a success rate of 52%: they predict the correct outcome in just over half of the matches. Breaking this down we see that just under 60% of their home win predictions and around 55% of their away win predictions are correct, but only 34% of their draw predictions are right. However, while that may seem low, it does not imply a lack of skill. Only a quarter of all matches end in a draw but the pundits have a higher success rate than that – they are correct a third of the time. Even though their success rate seems low for draws, they are definitely doing better than just randomly predicting them.
Table 2: Proportion of correct predictions for pundits and bookmakers.
By this metric the bookmakers outperform the pundits. Their favoured outcome is realized in 54% of matches, a higher success rate than the pundits’ 52%. However, the bookmakers never favour a draw and so are not penalized by the low success rate that arises from trying to predict them. Furthermore, when you look solely at either home win or away win predictions, both pundits have a higher success rate.
I’m not suggesting that the bookmaker’s odds on draws are wrong. Indeed it’s straightforward to show that their odds are quite well calibrated, as
Table 3: Pundits’ betting returns and the average odds they received for their correct predictions.
This is, of course, driven by the different odds offered on each outcome. The lower part of Table 3 shows the average (decimal) odds that were offered on the pundits’ winning predictions, i.e. those in which they won their bet.
The average odds offered on their correct home and away predictions were 1.8 and 1.9 respectively, which corresponds to an average profit of £0.80 and £0.90 (based on a £1 bet). So, on average, they gain less when they win an individual bet than they forfeit when they lose one (the £1 stake); fortunately, as Table 3 shows, they win them sufficiently often to make a profit.
For their correct draw predictions, they receive average odds of 3.5 – this is a profit of £2.50, which is nearly triple the return on their correct home and away win bets! The markets are underestimating the likelihood of a draw in those games, allowing the pundits to make a decent profit. In fact, the majority of the pundits’ profits are generated by the edge they have over the market in predicting draws.
We can also see this if we look at each pundit's top-20 most profitable predictions. All but one (95%) of Merson’s top-20 are draws, and 14 out of 20 (70%) of Lawrenson’s are draws. Around two-thirds of these matches also involved a top-4 team, typically a smaller team gaining a draw against one of the big fish (such as QPR vs Man City in 2014/15, or Sunderland vs Arsenal in 2015/16).
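The arithmetic linking average winning odds to profitability can be sketched as follows. This is a rough illustration under a simplifying assumption: every winning £1 bet is treated as paying the average odds quoted above, which is only an approximation of a real betting record. The function names are invented for the example.

```python
def break_even_rate(avg_winning_odds: float) -> float:
    """Success rate at which £1 bets neither make nor lose money."""
    return 1.0 / avg_winning_odds


def profit_per_pound(success_rate: float, avg_winning_odds: float) -> float:
    """Approximate profit per £1 staked.

    A win returns the odds (stake included); a loss returns nothing.
    """
    return success_rate * avg_winning_odds - 1.0


# Average winning odds quoted in the text for each type of prediction
for label, odds in (("home", 1.8), ("away", 1.9), ("draw", 3.5)):
    print(f"{label}: break-even success rate {break_even_rate(odds):.0%}")
```

At average odds of 3.5 the break-even rate for draws comes out at about 29%, which is below the roughly one-in-three success rate reported for the pundits' draw predictions.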
The take-home message from this blog is that, despite their success rate of only 1 in 3 in predicting them, draws are the most profitable predictions made by both pundits. This is because they are better at predicting draws than the bookmakers. The odds offered by the market rarely differ substantially from the basic rate at which draws occur – about 1 in every 4 matches. I suspect that significant improvements could be made to statistical models for predicting the outcome of football matches by identifying the information the pundits are homing in on when they predict a draw.
So far I’ve treated Lawro and Merson separately. What happens if we construct a consensus forecast, betting on only those matches for which their predictions agree? Is their combined prediction power better than their individual efforts? It turns out the answer is yes, but that’s the topic for my next blog...
Note: This post is an analysis of past pundit predictions and past performance is not indicative of future results. Do not bet with the expectation of making a profit: you may lose significantly more than you win.
 Bet365, BetVictor, Ladbrokes and William Hill
 To generate the null distribution, I reran the full 6-year experiment 10,000 times, randomly assigning a prediction of home win, away win or draw for every game. The rate of home wins, away wins and draws in each season were fixed to be the same as in Lawrenson’s forecasts; I basically just shuffled his forecasts around between matches.
 Especially if the model assumes independent Poisson distributions for each. Bivariate distributions with non-zero correlation may rectify this.
 The inferred percentage probability of an outcome, given the odds, is just 100/(decimal odds). You also need to correct for the over-round -- the edge the bookmakers give themselves that results in the inferred outcome probabilities summing to greater than 100%.
 If the pundit’s success rate for their home (away) win predictions dropped below 56% (53%) they would make a loss on them.
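The shuffling procedure described in the note on the null distribution can be sketched in a few lines. This is an illustrative reconstruction rather than the original code: it handles a single season (the real experiment would repeat it season by season, keeping each season's prediction rates fixed), the match data are toy values, and the function names are hypothetical.

```python
import random


def success_rate(predictions, outcomes):
    """Proportion of matches in which the prediction equals the outcome."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def shuffled_null(predictions, outcomes, n_sims=10_000, seed=0):
    """Success rates obtained when the same predictions are randomly reassigned to matches."""
    rng = random.Random(seed)
    shuffled = list(predictions)  # copy, so the caller's list is left untouched
    rates = []
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # keeps the H/A/D prediction rates fixed
        rates.append(success_rate(shuffled, outcomes))
    return rates


# Toy data for one short "season": H = home win, A = away win, D = draw
outcomes = ["H", "A", "D", "H", "H", "A", "D", "H"]
predictions = ["H", "H", "D", "H", "A", "A", "H", "H"]

observed = success_rate(predictions, outcomes)
null = shuffled_null(predictions, outcomes, n_sims=1000)
share_at_least_as_good = sum(r >= observed for r in null) / len(null)
print(observed, share_at_least_as_good)
```

Comparing the observed success rate with the shuffled distribution indicates how often predictions made at the same rates, but with no knowledge of the matches, would do at least as well.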
eightyfivepoints.blogspot.com | Wed, 07 Jun 2017 21:15:00 +0000
One of the key features of Chelsea’s title winning season was the consistency of their starting lineups. After the 3-0 thrashing by Arsenal in September, Conte switched to a 3-4-3 formation and Chelsea embarked on a 13-match winning streak in the league that ultimately propelled them to the title. The foundation of this formation – Luiz, Cahill and Azpilicueta – started each of the next 32 games, and the wing-backs, Moses and Alonso, missed only three between them.
Such consistency is partly due to luck with injuries and suspensions, but Conte also resisted the temptation to tinker with his team. Other managers opted to ring the changes, for tactical purposes or to rest players. In the closing weeks of the season Mourinho was compelled to defend his rotation policy, citing fixture congestion and the need to maximize his chances of Europa League success. However, frequent changes to United’s starting lineup were a feature of their entire season, not just the final few months.
In this article I’m going to take a detailed look at how EPL clubs utilized their squads throughout the season. I’ll compare the rate at which managers ‘rotated’ their teams (which I define simply as the number of changes made to their starting lineup) and the number of players they used in doing so. I’ll investigate some of the factors that may have influenced a manager’s decision to fiddle with his lineup. Finally I’ll discuss whether rotation had an impact on results.
Let’s start with a look at squad size and rotation. Figure 1 plots the average number of changes made to the starting lineup against the total number of players used by each EPL club last season.
Clubs on the left-hand side of the plot preferred to maintain the same starting lineup, changing about one player per match. Those plotted towards the right of the plot varied their team more frequently. The vertical axis measures effective squad size – the number of players that started at least one EPL match. Teams that are plotted towards the bottom of the plot picked their lineups from a relatively small group of players, those plotted nearer the top chose them from a larger pool.
Figure 1: Squad rotation (average number of changes made to the starting lineup) versus effective squad size (number of players that started at least one league match) for all EPL clubs in 2016/17. Uses data provided by Stratagem Technologies.
Both quantities plotted in Figure 1 are important. A manager could adopt a highly structured rotation policy in which three players are changed in each match but are chosen from a small squad of only 14 players; this club would appear in the bottom right of the plot. A manager that was struggling to find his best eleven might make the same number of changes per match but from a much larger pool of players; this club would appear near the top right of the plot.
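Both quantities are easy to compute from a list of starting lineups. The sketch below assumes each lineup is given as a collection of player names in match order; the function names and the three-player toy lineups are made up for illustration and are not taken from the Stratagem data.

```python
def changes_per_match(lineups):
    """Average number of players in a starting lineup who did not start the previous match."""
    if len(lineups) < 2:
        raise ValueError("need at least two lineups to measure rotation")
    changes = [
        len(set(current) - set(previous))
        for previous, current in zip(lineups, lineups[1:])
    ]
    return sum(changes) / len(changes)


def effective_squad_size(lineups):
    """Number of distinct players who started at least one match."""
    return len(set().union(*map(set, lineups)))


# Toy example with three-player "lineups" to keep it short
lineups = [
    {"A", "B", "C"},
    {"A", "B", "D"},  # one change: D replaces C
    {"A", "D", "E"},  # one change: E replaces B
]
print(changes_per_match(lineups), effective_squad_size(lineups))
```

In this toy case the team makes one change per match on average and uses five players in total.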
On average, EPL clubs made around two changes per match to their starting lineups, from an effective squad size of twenty-five players. As you might expect, there is clearly a relationship between squad size and rotation: the more frequently that a club rotated, the greater the number of players they tended to use. West Brom, Chelsea, Burnley and Liverpool, who made just over one change per game, fielded the most consistent lineups. Along with Spurs, they also used the fewest players.
At the other end of the scale there are the two Manchester clubs – both of whom made over three changes per game to their starting lineup – followed by Southampton, Middlesbrough, Swansea and Sunderland. Man Utd and Sunderland, along with West Ham and Hull, all used at least 28 players over the season (admittedly United started 5 players for the first time in their last game of the season).
So there was quite a broad spectrum of squad management styles in the EPL this season, with some clubs rotating more than twice as fast as others and using nearly 50% more players. Why is this? To what degree are team changes enforced or by choice? I’ll now review some of the factors that may have influenced team selection.
Injuries and suspensions will have forced managers to make changes to the team. According to
It took some managers several matches at the start of the season to identify the core players around which their team could be constructed. Others took much longer to decide on their strongest lineup, and a few never did.
For example, David Moyes never really figured out his best team as the plot below demonstrates. He deployed 36 unique combinations of players during the season (nearly double that of Chelsea), and there was a lack of consistency in defence and midfield, both in terms of personnel and formation. Jose Mourinho also tried a large number of different combinations in every position, particularly in the second half of the season. While United’s rotation frequency certainly increased in the last couple of months of the season, they were already rotating at over 3 players per game before Christmas.
Does rotation matter?
Is there any evidence that rapid squad rotation influences results? This is a tricky question to answer because we don’t know how a team would have performed had they been more or less consistent in their team selection. Periods of high rotation do seem to have coincided with worse results for many teams (West Ham, Watford, Crystal Palace and Swansea, to name a few). However, there is a bit of a chicken-and-egg issue: poor results may compel a manager to change his team until he finds a winning formula.
I find there to be no significant relationship between squad rotation and final league position last season. However I would hazard the suggestion that most of the teams that prioritized stability and a tight-knit squad – those nearest the bottom left corner of Figure 1 – had successful seasons by their own standards. Crystal Palace are perhaps the exception, but the rate at which they varied their starting lineup dropped significantly (from two changes per game to one) in the second half of the season once Sam Allardyce took charge.
Similarly, those clubs that rotated frequently from a big squad generally had a disappointing year relative to pre-season expectations: City failed to mount a sustained title challenge, United finished sixth, and Hull, Swansea, West Ham, Middlesbrough and Sunderland were either relegated or flirted with relegation.
Perhaps this is just postdiction, but I think it warrants further investigation. It would be interesting to establish whether the performance of a team tends to decline towards the end of a long season if players are not rested. Are big squads problematic if managers are forced to rotate simply to keep their players happy? Does rotation interrupt momentum?
No Europe and a lack of injuries have helped, but the last two EPL seasons have been won by clubs that identified their best 11 players and stuck with them; tailoring and not tinkering. As clubs recruit over the summer we’ll see whether this is a theme that has started to resonate.
Thanks to David Shaw for useful comments. Lineup graphics for all EPL teams can be found here.
Figure 1: Stacked line chart showing the proportion of EPL managers by nationality in each season since 1992/93. Current season represented up to the 1st March 2017. The proportion of managers that are English managers has fallen from two-thirds to one-third over the past 24 years.
The figure shows a clear trend: the number of English managers has significantly declined over the last 24 years. Back in 1992, over two-thirds of managers in the EPL were English and 93% were from the UK as a whole. Since then, the proportion of English managers has more than halved, replaced by managers from continental Europe and, more recently, South America.
Is the trend towards foreign managers driven by supremacy over their domestic rivals?
The table below compares some basic statistics for UK & Irish managers with those of managers from elsewhere. Excluding caretaker managers, there have been 283 managerial appointments in the EPL era, of which over three-quarters have been from the Home Nations or the Republic of Ireland. Of the 66 foreign EPL appointments, nearly half were at one of the following Big6 clubs: Man United, Arsenal, Chelsea, Liverpool, Spurs and Man City. However, only 12% of British or Irish managerial appointments have been at one of these clubs. This is the selection bias I mentioned at the beginning – the top clubs are far more heavily weighted in one sample than the other.
At first glance, foreign managers have performed better: collecting 1.66 points/game compared to 1.29 for their UK & Irish counterparts (reproducing the results published in the Guardian article). However, this difference is entirely driven by the Big6. If you look at performance excluding these clubs it’s a dead heat – foreign managers have performed no better than domestic ones, both averaging 1.2 points per game.
At the Big6 clubs, foreign managers have collected 0.2 points/game more than their UK counterparts. This difference is almost entirely driven by Chelsea and Man City, where foreign managers have collected 0.8 and 0.7 points per game more than UK & Irish managers. But since Abramovich enriched Chelsea in 2003, they have not hired a single British or Irish manager. A similar story at Man City: in only one and a half of the nine seasons since the oil money started to flow into Manchester have they had a British manager (Mark Hughes). Both clubs had very different horizons before and after their respective cash injections, and they have hired exclusively from abroad since.
So it seems that, when you look closely, you find little convincing evidence that foreign managers have performed better than domestic managers in the EPL era. Why then do clubs prefer to look beyond these shores?
Access to foreign markets
Previous success is clearly a key criterion in manager recruitment, but I wonder if there are specific attributes that give foreign managers an edge over English candidates. In particular, foreign managers have local knowledge and contacts that might give a club the edge over domestic rivals in signing overseas talent. You could argue that Wenger’s initial success at Arsenal was influenced by his ability to identify and sign top French players at a time when France was dominating international football. Rafael Benitez certainly mined his knowledge of Spanish football to successfully bring a number of players to Liverpool.
In hiring foreign managers, do clubs improve their access to transfer markets overseas? As the table above shows, foreign managers sign players from abroad at roughly twice the rate of domestic managers -- an average of 5 per season compared to the 2.6 per season signed by their British or Irish counterparts. The result does not change significantly if you exclude the Big6 clubs, or if you only look at players signed in the last 15 years.
This doesn’t prove the hypothesis that clubs sign foreign managers to improve access to foreign players, but it does support it. Of course, being British isn’t necessarily a barrier to signing top overseas talent; after all, Dennis Bergkamp, arguably Arsenal’s greatest ever import, was bought by Bruce Rioch. But in an era in which English players come at a premium, it makes sense for clubs to hire managers that will enable them to lure high quality players from the continent.
Thanks to David Shaw and Tom Orford for comments.
 I define a caretaker manager as one that remained in post for less than 60 days.
 The proportion of managers from Scotland, Wales and Northern Ireland has generally remained stable at about 25% (although very recently it has fallen).
 The first five are the five best finishers in the EPL era, on average. I decided Man City warranted inclusion because of their two EPL titles.
 Of the others, Wenger and Ferguson largely cancel each other out and foreign managers have performed only marginally better at Spurs and Liverpool.
 Indeed you have to go all the way back to Glenn Hoddle’s departure in 1996 to find Chelsea’s last British or Irish manager.
 Mark Hughes was appointed before the Abu Dhabi group bought Man City.
eightyfivepoints.blogspot.com | Wed, 29 Mar 2017 10:37:00 +0000
It’s February and your club is in trouble. Following a run of poor results, they are hovering just above the bottom three. Fans and pundits alike are writing them off. The remainder of the season is destined to be a grim struggle for the points: a few snatched draws, the odd scrappy win, but mostly meek surrender to mid-table and above teams.
The board panics and fires the manager; it seemed the only remaining option. Granted, he did well last season – brought in some good players and promoted others, got them playing attractive football. But now the team needs defibrillation: a new manager with fresh ideas, inspiring players keen to prove themselves to him. A five game honeymoon period and, come spring, everything will be rosy again. After all, it worked so well for Sunderland last season.
This story seems to play out several times each season, but does it actually make any sense to fire a manager mid-season? A few years ago, Dutch economist Dr Bas ter Weel compared points-per-game won immediately before and after a manager has been fired in the Eredivisie.
Figure 1: the league position of EPL and Championship teams on the date their manager was fired (x-axis) against their league position at the end of the season (y-axis). The black circles represent EPL clubs, the blue triangles Championship clubs. The red diagonal line indicates the same position at departure and season end; the shaded regions above and below encompass teams that finished 3, 6 or 9 places higher or lower than their position when the manager was sacked.
There is no evidence that teams gain any kind of advantage by sacking their manager. The median position change is zero, i.e. no change. Specifically: 30% of teams end in a lower position than when the manager was sacked, 23% see no change in their position and 48% see an improvement. If we compare this to the baseline sample -- clubs in similar positions in the table that retained the same manager for the entire season -- we find roughly the same proportions: 38% ended the season in a lower position, 17% saw no change in their position and 45% improved their position.
We can be more specific and look at clubs in the relegation zone when the manager departed. As the table below shows, of those that fired their manager, 34% survived; of those that did not, 39% survived. There is no evidence that firing the manager helps avoid relegation.
But what about Leicester?
Leicester fired Ranieri more than a month ago and have not lost since. They’re currently 2 places above their league position after his last game and seem likely to continue their recovery up the table. Didn’t they benefit from firing their manager?
While Figure 1 demonstrates that, on average, a club’s league position is not expected to improve after their manager is sacked, some individual clubs clearly did go on to significantly improve their league position. For instance, when Brendan Rodgers was fired from Reading in 2009/10 they were in 21st position; under his replacement, Brian McDermott, they went on to finish in 9th. Crystal Palace sacked Neil Warnock just after Christmas in 2014 when they were in 18th position; by the end of the season Alan Pardew had guided them to 10th.
On the other hand, clubs that do not switch manager also undergo miraculous recoveries. In the 2001/02 season Blackburn Rovers rose from 18th place in mid-March to 10th place by the end of the season. In late November 2008, Doncaster Rovers were rooted at the bottom of the Championship in 24th place; an eight match unbeaten run lifted them up to mid-table and they finished in a respectable 14th place. Both teams retained the same manager for the entire season: Graeme Souness and Sean O'Driscoll, respectively.
There are clearly circumstances that might necessitate a managerial firing in the middle of the season -- Leicester may be an example of this. But to pull the trigger without a clear diagnosis of what has gone wrong is a sign of desperation and poor decision-making. Indeed, over the last twenty seasons, EPL managers appointed during the summer months have, on average, lasted over 100 days longer in their jobs than those appointed during the season. Coupled with the large compensation payments that are often necessary to remove a manager, mid-season changes may actually end up harming the long-term prospects of a club.
 Specifically: Gullit in 97/98, Scolari in 08/09, Villas-Boas in 11/12 and Di Matteo in 12/13.
eightyfivepoints.blogspot.com | Sat, 11 Feb 2017 12:06:00 +0000
We're nearly two-thirds of the way through the 2016/17 EPL season, which seems a good time to try to predict what might happen. Chelsea’s nine-point cushion and relentless form make them clear favourites for the title; not since Newcastle in 1996 have a team blown such a lead. Just five points separate second from sixth as the remaining superpowers battle for Champions League places: who will miss out? Perhaps the mantra ‘most competitive EPL season ever’ is best reserved for the relegation fight, though. Six teams, two points and an ever-changing landscape. Amongst them: last season’s heroes, Leicester. Too good to go down?
Most TV pundits are definitive in their predictions, indeed they are typically paid to be so. Others prefer to let the numbers do the talking. Football analysts around the world build mathematical models to measure team strength and calculate the probability of match outcomes. Rather than saying “team A are likely to beat team B”, they'll say “I estimate that there is an 85% probability that team A will win”.
There is no agreed method for designing a forecast model for football. Consequently, predictions vary from one model to another. However, there is also strength in diversity. Rather than comparing and contrasting predictions, we can also collect and combine them to form a consensus opinion.
Last January, Constantinos Chappas did just that. Following gameweek 20, he collected 15 sets of predictions, averaging them to produce a ‘consensus forecast’ for the outcome of the 2015/16 EPL season. His article was published on StatsBomb
Not surprisingly, Chelsea are the clear favourites: the median forecast gives them an 88% chance of winning the league, as do the bookmakers. There’s not a huge amount of variability either, with the forecasts ranging from 80% to 93%. If Chelsea do suffer some kind of meltdown then it’s probably Spurs or City that would catch them, with median predictions of 5% and 4%, respectively. Liverpool and Arsenal are rank outsiders and any of the other teams finishing top would be an enormous surprise.
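Combining forecasts in this way amounts to taking, for each team, the median and range of the probabilities supplied by the individual models. The sketch below shows the idea; the model names and probabilities are placeholders chosen for illustration, not the actual forecasts collected for this article.

```python
from statistics import median


def consensus(forecasts):
    """Summarise per-team probabilities across models as (median, min, max).

    forecasts maps model name -> {team: probability}.
    """
    teams = sorted({team for model in forecasts.values() for team in model})
    summary = {}
    for team in teams:
        # Skip models that did not publish a number for this team
        probs = [model[team] for model in forecasts.values() if team in model]
        summary[team] = (median(probs), min(probs), max(probs))
    return summary


# Placeholder title probabilities from three hypothetical models
forecasts = {
    "model_a": {"Chelsea": 0.80, "Spurs": 0.09},
    "model_b": {"Chelsea": 0.88, "Spurs": 0.05},
    "model_c": {"Chelsea": 0.93, "Spurs": 0.03},
}
for team, (mid, low, high) in consensus(forecasts).items():
    print(f"{team}: median {mid:.0%}, range {low:.0%} to {high:.0%}")
```

The median is less sensitive than the mean to a single model with an extreme view, which is one reason to prefer it when only a handful of forecasts are available.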
The Top Four
Now this is where things get a bit more interesting. Chelsea seem almost guaranteed to finish in the Champions League places, which leaves five teams fighting it out for the remaining three. Tottenham and Man City are heavily favoured: both have a median probability of at least 80% and the whiskers on their box-plots do not overlap with those of the next team, Liverpool.
The real fight is between Klopp and Wenger. Statistically they are almost neck-and-neck, with their box-plots indicating that the individual predictions are broadly distributed. Look closely and you see an interesting negative correlation between them: those that are above average for Liverpool tend to be below average for Arsenal (and vice-versa). You can see this more clearly in the scatter plot below. The reason must be methodological; to understand it we’d have to delve into how the individual models assess the teams' relative strength. Note that the bookies are sitting on the fence - they've assigned both Arsenal and Liverpool a 53% chance of finishing in the top four.
Man United are outsiders, but the consensus forecast still gives them about a 1 in 3 chance of sneaking in. Interestingly, the bookmakers’ odds – which imply a 44% chance of United finishing in the Champions League positions - are way above the other predictions. Perhaps their odds are being moved by heavy betting?
The Relegation Candidates
Two weeks ago it looked like Sunderland and Hull were very likely to go down. Since then, the relegation battle has been blown wide open. The first six teams seem set for a nervous run-in and neither Bournemouth nor Burnley will feel safe.
The principal candidates for the drop are Sunderland, Hull and Palace, all of whom have a median prediction greater than a 50% chance of relegation. There is clearly a lot of variability in the predictions though, with the Eagles in particular ranging from 38% to 74%. You can certainly envisage any one of them managing to escape.
The next three clubs - Middlesbrough, Swansea and Leicester - are all currently level on 21 points, yet the median predictions imply that Middlesbrough (42%) are nearly twice as likely to go down as Leicester (22%). I suspect that this is because some models are still being influenced by last season’s results (for instance, Leicester's forecasts appear to bunch around either 15% or 30%). The amount of weight, or importance, placed on recent results by each model is likely to be a key driver of variation between the predictions.
What about <insert team’s name here>?
The grid below shows the average probability of every EPL team finishing in each league position. Note that some of the models (such as FiveThirtyEight, Sky Sports and the bookmakers) are excluded from the plot as I wasn’t able to obtain a full probability grid for them. Blank places indicate that the probability of the team finishing in that position is significantly below 1%.
An obvious feature is that Everton seem likely to finish in 7th place. The distribution gets very broad for the mid-table teams: Southampton could conceivably finish anywhere between 7th and 18th.
Last year’s predictions.
So how did last year’s predictions pan out? Leicester won the league, but the median forecast predicted only a 4% chance of this happening (compared, for example, to a 40% chance that they would finish outside the Champions League places). However, the top four teams were correctly predicted, with a high probability of finishing there having been assigned to each of Leicester, Arsenal, City and Spurs.
Down at the bottom, both Newcastle and Villa were strongly expected to go down and they did. Sunderland were predicted to have only a 15% chance of staying up, yet the Black Cats escaped again. Instead, Norwich went down in their place having been 91% to stay up. Other surprises were Southampton (7 places higher than expected), Swansea (5 higher) and Crystal Palace (down 7).
How good were last year’s forecasts, overall? This is a tricky question and requires a technical answer. The specific question we should ask is: how likely was the final outcome (the league table) given the predictions that were made? If it was improbable, you could argue that it happened to be just that – an outlier. However, it could also be evidence that the predictions, and the models underlying them, were not particularly consistent with the final table.
We can attempt to answer this question using last season’s prediction grid to calculate something called the log-likelihood function: the sum of the logarithms of the probabilities of each team finishing in their final position. The result you obtain is quite low: simulations indicate that only about 10% of the various outcomes (final rankings) allowed by the predictions would have a lower likelihood. It is certainly not low enough to say that they were bad, it just implies that the final league table was somewhat unlikely given the forecasts. A similar result this time round would provide more evidence that something is missing from the predictions (or perhaps that they are too precise).
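The log-likelihood calculation can be sketched as below. The first function follows the definition given above; the simulation scheme is an assumption made for this example (positions are filled from the top down, drawing among the teams not yet placed), since the text does not spell out how the simulated tables were generated. The three-team grid is a toy, and the small probability floor is only there to avoid taking the log of zero.

```python
import math
import random


def log_likelihood(grid, final_positions, floor=1e-12):
    """Sum of log P(team finishes in its final position); positions are 0-indexed."""
    return sum(
        math.log(max(grid[team][pos], floor))
        for team, pos in final_positions.items()
    )


def sample_table(grid, rng):
    """Draw one final table (assumed scheme: fill positions from first to last)."""
    remaining = list(grid)
    table = {}
    for pos in range(len(grid)):
        weights = [grid[team][pos] for team in remaining]
        if sum(weights) == 0:  # no unplaced team has any probability here
            weights = [1.0] * len(remaining)
        team = rng.choices(remaining, weights=weights)[0]
        table[team] = pos
        remaining.remove(team)
    return table


def share_less_likely(grid, final_positions, n_sims=2000, seed=0):
    """Fraction of simulated tables with a lower log-likelihood than the real one."""
    rng = random.Random(seed)
    observed = log_likelihood(grid, final_positions)
    lower = sum(
        log_likelihood(grid, sample_table(grid, rng)) < observed
        for _ in range(n_sims)
    )
    return lower / n_sims


# Toy three-team prediction grid: grid[team][position]
grid = {
    "A": [0.6, 0.3, 0.1],
    "B": [0.3, 0.4, 0.3],
    "C": [0.1, 0.3, 0.6],
}
final = {"A": 0, "B": 1, "C": 2}
print(log_likelihood(grid, final), share_less_likely(grid, final))
```

A small share of less likely simulated tables would indicate that the real outcome sat in the unlikely tail of what the forecasts allowed.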
A final caveat..
Having said that – models are only aware of what you tell them. There are plenty of events – injuries, suspensions, and managerial changes – of which they are blissfully unaware but could play a decisive role in determining the outcome of the season. Identifying what information is relevant – and what is just noise – is probably the biggest challenge in making such predictions.
I will continue to collect, compare, combine and publicize forecasts as the season progresses: follow me on twitter (@eightyfivepoint) if you'd like to see how they evolve.
(This is a piece that I wrote for StatsBomb; I've copied it here.)
eightyfivepoints.blogspot.com | Wed, 18 Jan 2017 08:56:00 +0000
I was struck by the poor attendances at some of the FA Cup 3rd round matches this month. 17,632 turned up to watch Sunderland vs Burnley, less than half Sunderland’s average home gate this season. It was a similar story at Cardiff vs Fulham, Norwich vs Southampton and Hull City vs Swansea, all of which saw crowds below 50% of their league average this season.
An interesting statistic was recently posted on Twitter by Omar Chaudhuri, of 21st Club (
The next set of lines in Table 1 show the results for the FA Cup matches that had a mediocre attendance – those in which the attendance ratio was between 70% and 90% of the home side league average. The home team won 44% of these matches, which is slightly below the home win rate in the corresponding league matches. There is again a fall in the number of draws, but this time the away team benefits, winning 6% more often than in the league matches. The differences are small, but there is some evidence that the away team were benefitting from the below-average attendance.
However, the increase in away wins becomes much more striking when we look at poorly-attended cup matches: those in which the attendance was less than 70% of the home team's league average. The home team won only 34% of these ties, 14 percentage points below the corresponding league fixtures. The away win percentage increases to 42% and is 19 percentage points above the league outcome. Indeed, the away team has won poorly-attended cup matches more frequently than the home team. This is despite the home team winning roughly twice as often as the away team in the corresponding league fixtures (48% to 23%). The implication is very clear: when the fans don’t show up for an FA Cup tie, the home team is more likely to lose. I don’t think I’ve seen any direct evidence for this before.
In all three sub-samples, it's worth noting that draws are down 5% relative to the corresponding league outcomes (although the beneficiary depends on the attendance). Presumably this is down to the nature of a cup tie: teams are willing to risk pushing for a win in order to avoid having to play a troublesome replay (or a penalty shoot-out during a replay).
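The sub-samples above are defined by the ratio of the cup attendance to the home side's league average. A minimal sketch of that grouping is shown below. The 70% and 90% thresholds come from the text; treating everything at or above 90% as a single "well attended" group is an assumption, as are the function names and the league-average figures used in the example.

```python
from collections import Counter, defaultdict


def attendance_band(cup_attendance, league_average):
    """Classify a cup tie by attendance relative to the home side's league average."""
    ratio = cup_attendance / league_average
    if ratio < 0.7:
        return "poor"
    if ratio < 0.9:
        return "mediocre"
    return "well attended"  # assumed label for the remaining sub-sample


def outcome_rates(matches):
    """Home win / draw / away win rates within each attendance band.

    matches is an iterable of (cup_attendance, league_average, result),
    with result one of "H", "D" or "A".
    """
    counts = defaultdict(Counter)
    for attendance, average, result in matches:
        counts[attendance_band(attendance, average)][result] += 1
    return {
        band: {r: tally[r] / sum(tally.values()) for r in "HDA"}
        for band, tally in counts.items()
    }


# Hypothetical ties; the league averages are illustrative, not real figures
matches = [
    (17632, 40000, "A"),
    (12000, 20000, "H"),
    (8000, 10000, "D"),
    (9500, 10000, "H"),
]
print(outcome_rates(matches))
```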
So why are some fans not showing up? One obvious explanation is that they are simply unwilling to shell out more money beyond the cost of a season ticket. Maybe clubs should lower their prices for FA Cup matches; I’d be curious to know if any do. There could even be an element of self-fulfilling prophecy: the fans believe that their team have no real chance of winning the cup and so choose not to attend, to the detriment of their team. Perhaps the fans are aware that the cup is simply not a priority – their club may be involved in a relegation battle, for example – and that they are likely to field a weakened team.
The bottom line seems clear enough, though: if clubs want to improve their chances of progressing in the FA Cup they should ensure that they fill their stadium.
Thanks to David Shaw, Jim Ebdon and Omar Chaudhuri for comments.
 Data was only available for all-Championship ties from 02/03, 08/09 for L1 and 09/10 for L2.
 Replays were retained, although the outcome of penalty kicks was ignored (i.e., a draw at the end of extra-time was scored as a draw). There are 64 replays in the sample in total, of which 8 went to penalties.
 One caveat is that the sample size is pretty small: this analysis could do with being repeated on a larger sample of games (and with the specific match attendances, rather than season averages). However, the increase in the away percentage in the smallest sample (attendance ratio < 0.7) is still highly significant.
eightyfivepoints.blogspot.com | Tue, 10 Jan 2017 09:50:00 +0000
Thirteen – an unlucky number for some. So it proved for Chelsea: just one win shy of equaling Arsenal’s record, their thirteen-match winning streak was finally ended by an in-form Spurs side. While there may be some temporary disappointment amongst Chelsea fans at having failed to set a new record, their winning run has almost certainly propelled them into the Champions League next season and made them clear favourites for the title.
Sir Alex Ferguson would often refer to momentum as being instrumental to success. A winning streak can sweep teams to the title or snatch survival from the jaws of relegation. What constitutes a good streak is clearly dependent on the team, though. Manchester United are currently on a five-match winning run: such form would certainly be outstanding for a relegation-threatened team, but is it common for a Champions League contender? This question is itself part of a broader one: what is form and how should we measure it?
In this blog I’m going to take a look at some of the statistics of winning streaks, investigating the characteristic length of winning runs in the EPL and how it varies for teams from the top to the bottom of the table.
How well do teams streak?
I started by taking every completed EPL season since 2000/01 and dividing the teams into bins based on their points total at the end of each season (0-40 points, 40-50, 50-60, and so on). For each bin, I measured the proportion of the teams in that bin that completed a winning streak, varying the length of the streaks from 2 to 10 matches. For example, of the 54 sides that have finished on between 50 and 60 points since the 2000/01 season, 17 (31%) completed a winning run of at least 4 matches. Runs were only measured within a single season – they do not bridge successive seasons. The results are summarized in Table 1.
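The binning and streak-counting procedure described above can be sketched in a few lines. This is only an illustrative sketch: the function names, the result-string format ("W", "D", "L" per match) and the choice to treat the lower edge of each bin as inclusive are assumptions, not details taken from the original analysis.

```python
def longest_win_streak(results):
    """Length of the longest run of consecutive wins in a season.

    `results` is a string of 'W', 'D' and 'L' in match order (assumed format).
    """
    best = run = 0
    for r in results:
        run = run + 1 if r == "W" else 0
        best = max(best, run)
    return best


def points_bin(points):
    """Assign a final points total to a bin (lower edge inclusive, by assumption)."""
    if points < 40:
        return "0-40"
    if points >= 80:
        return "80+"
    low = (points // 10) * 10
    return f"{low}-{low + 10}"


def streak_proportions(seasons, min_length):
    """Proportion of team-seasons in each bin with a winning run >= min_length.

    `seasons` is a list of (final_points, results_string) pairs, one per
    team per season, so runs never bridge two seasons.
    """
    totals, hits = {}, {}
    for points, results in seasons:
        b = points_bin(points)
        totals[b] = totals.get(b, 0) + 1
        if longest_win_streak(results) >= min_length:
            hits[b] = hits.get(b, 0) + 1
    return {b: hits.get(b, 0) / totals[b] for b in totals}
```

Because the test is "longest run at least `min_length`", a team with a 4-match streak is automatically counted as having a 2-match and 3-match streak too.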
Table 1: The proportion of teams that complete winning runs of two games or longer in the EPL. Teams are divided into bins based on their final points total in a season, from 0-40 points (top row) to >80 points (bottom row).
The top row gives the results for teams that finished on less than 40 points. The columns show the percentage that managed a winning streak, with the length of the streaks increasing from 2 (left column) to >10 matches (right). Three quarters of the teams in this points bin put together a winning streak of at least two games. However, the proportion drops very rapidly for longer runs: only 14% completed a 3-match winning streak and only 7% a 4-match streak. The only team to complete a 5-match winning streak was Newcastle early in 2014/15 (and this was half of the total number of games they won that season).
As you'd expect, the percentage of teams that achieve a winning streak of a given length increases as you move to higher points bins. Every team that has finished with 60 points or more has completed a 3-match winning streak. However, fewer than a quarter of those that finished with fewer than 70 points completed a 5-match winning streak. In general, the proportion of teams that achieve a winning streak drops off very rapidly as the length of the streak is increased.
The exception is the title-challenging teams (the bottom row in Table 1): the percentage in this bin falls away more slowly as the length of the winning streak is increased. 27 of the 29 teams that finished with at least 80 points put together a 5-match winning streak, 13 completed an 8-match streak and 5 completed a 10-match winning streak. This is the success-generating momentum that Ferguson habitually referred to.
In his final 13 seasons (from 2000/01 to 2012/13), Man United put together 14 winning streaks lasting 6 matches or more; in the same period Arsenal managed only 5. United won 7 titles to Arsenal’s 2. For both teams, the majority of these streaks occurred in title-winning seasons. The same applies to Chelsea and, more recently, Man City. Only two title-winning teams have failed to complete a 5-match winning streak: Man United in 2010/11 and Chelsea in 2014/15. The median length of winning streak for the champions is between 7 and 8 games.
Leicester’s 4-match winning streak at the end of the 2014/15 season saved them from relegation. It was also an unusually long run for a team finishing on around 40 points - only four other teams have managed it. Was this a harbinger of things to come? A year later, during their title-winning season, their 5-match winning streak in March/April pushed them over the line.
The implications for form
Only the best teams put together extended winning runs: 40% of EPL teams fail to put together a three-game winning streak and 64% fail to win 4 consecutive games. Perhaps momentum - and the belief and confidence it affords - is only really relevant to the top teams? Does the fixture list throw too many obstacles in the path of the smaller teams? Every 3 or 4 games a smaller team will play one of the top-5 sides, a game that they are likely to lose. This may make it more difficult for them to build up a head of steam.
On the other hand, perhaps smaller teams are able to shrug off their defeats away to Arsenal or Liverpool and continue as before. In that case, should we discard games against the ‘big teams’ when attempting to measure their form? And to what extent do draws interrupt, or in some cases boost, a team's momentum? These are all questions that I intend to return to in future blogs.
Finally, I’ll leave you with the equivalent table for unbeaten runs. While the typical length of unbeaten runs in each bin is about twice as long as winning runs, most of the conclusions above still apply.
Table 2: The proportion of teams that complete an unbeaten run of length 2 or longer in the EPL. Teams are divided into bins based on their final points total in a season, from less than 40 points (top row) to more than 80 (bottom).
Thanks to David Shaw for comments.
 The total number of teams across all bins was 320: 16 seasons with 20 teams per season.
 Note that the runs are inclusive - if a team achieves a 3-match streak it will also have achieved a 2-match streak.
eightyfivepoints.blogspot.com | Tue, 20 Dec 2016 13:32:00 +0000
Last week the Sunderland chief executive, Martin Bain, warned that only "very limited" funds will be made available to David Moyes in the January transfer window (see
Figure 1: Change in the average points-per-game measured before and after 1st January against total spending in the January transfer window for all EPL teams in each of the last six seasons.
Not all teams will be looking for an immediate return on their investment in January. Some will be buying back-up to their first team or young players for the future. The teams that will certainly be looking for an immediate impact are those embroiled in the fight to remain in the EPL. In Figure 2 I’ve highlighted the relegation-threatened teams in each season. Specifically, this includes all teams that were in the bottom 6 positions in the table on January 1st, plus those that went on to be relegated at the end of the season (as you’d expect, most relegated teams were also in the bottom 6 in January). Teams that were relegated are coloured red; those that survived are blue.
Figure 2: Change in the average points-per-game measured before and after 1st January against total spending in the January transfer window for all EPL teams (grey crosses) in each of the last six seasons. Teams marked by a square were in the bottom six of the table on 1st January; those in red were relegated, those in blue survived.
There are a couple of interesting things about this plot. First -- the majority of relegation-threatened teams see an improvement in their results in the second half of the season. I think this is just mean reversion: teams that underperform in the first half of the season are likely to do better in the second half. For example, over the last six seasons, teams in the bottom half of the table collected an average of 0.2 points/game more in the second half of the season than the first. The opposite is true of teams in the top half of the table: they tended to be an average of 0.2 points/game worse-off in the second half of the season.
Second -- there is no significant correlation between spending and improvement in results for relegation-threatened teams. If we split them into two groups, those that spent greater than £5m in January and those that spent less, we find that 38% (6/16) of the high spenders and 55% (12/22) of the low spenders were relegated. This difference is probably not big enough to be significant. Raising the stakes higher – of the four relegation-threatened teams that spent more than £20m in January, three were relegated: Newcastle & Norwich last year, and QPR in 2012/13.
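One way to check whether a 6/16 versus 12/22 split could plausibly arise by chance is a two-sided Fisher exact test on the 2x2 table of spending group against outcome. The sketch below is my own illustration using only the standard library; the post does not say which test, if any, was actually applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "successes" fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: high spenders, low spenders; columns: relegated, survived.
p_value = fisher_exact_two_sided(6, 10, 12, 10)
```

With samples this small, a p-value well above the usual 0.05 threshold would be consistent with the reading that the difference is not significant.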
It seems reasonable to conclude that teams should resist the temptation to try to spend their way out of trouble: there is little evidence that it will pay off. It looks like Bain is being prudent in tightening the purse strings.
 Note that for some teams it will be an underestimate as the transfer fee was never disclosed.
 This doesn’t have to be the case. For instance, there could be more draws in the first or second half of the season.
 The results don't change significantly if we select relegation-threatened teams as being those within a fixed number of points from the relegation zone.
eightyfivepoints.blogspot.com | Fri, 02 Dec 2016 13:34:00 +0000
There’s recently been a bit of discussion in the media (e.g:
Effect of participation in European competitions on a team's points total in the EPL over successive seasons. Green diamonds show the latest results for this season compared to the same stage last season. Blue dashed line shows results of a linear regression.
The blue dashed line shows the results of a simple linear regression. Although the relationship is not particularly strong – the r-square statistic is 0.2 – it’s certainly statistically significant. The slope coefficient of the regression implies that, for each extra game a team plays in Europe, they can expect to lose half a point relative to the previous season. So, if a team plays 12 more games, it will be 6 points worse off (on average) than the previous season.
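For readers who want to reproduce this kind of fit, here is a minimal least-squares sketch. The function name and the toy numbers are hypothetical: the data below is constructed to lie exactly on a line with slope -0.5, purely to show how the slope converts extra European games into an expected change in points. It is not the data behind the plot.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustration: change in European games vs change in league points.
extra_games = [-6, -2, 0, 4, 8]
points_change = [3.0, 1.0, 0.0, -2.0, -4.0]

slope, intercept, r_squared = ols_fit(extra_games, points_change)
expected_change_for_12_games = intercept + slope * 12
```

Real data would scatter around the line, which is why the r-square in the post is only 0.2 even though the slope itself is well determined.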
It’s worth noting that the CIES Football Observatory performed a similar analysis in a comprehensive report on this topic published earlier this year. They found there to be no relationship between domestic form and European participation over successive seasons. However, in their analysis they combined results from 15 different leagues across Europe. So perhaps the effect is more pronounced in the EPL than other leagues? This recent article in the Guardian, citing work by Omar Chaudhuri, suggests that the effects of playing in Europe may be more pronounced in highly competitive divisions. The lack of a winter break may also be a factor: while teams in Italy, Spain and Germany enjoy several weeks rest, EPL teams will play four league matches over the Christmas period.
Finally, an obvious question is whether we are simply measuring the effects of playing more games across a season. To test this, we should apply the same analysis to progress in domestic cup competitions. However, I’ll leave that to the next blog.
 The points along x=0 are teams that played the same number of European games in successive seasons (and did play in Europe both seasons). The only two teams that are omitted are Wigan and Birmingham City, both of whom played in the Europa League while in the Championship. Matches played in preliminary rounds are not counted.
 The null hypothesis of no correlation is resoundingly rejected.
eightyfivepoints.blogspot.com | Fri, 25 Nov 2016 11:35:00 +0000
The box plots indicate the distribution of each team's points totals over the 10,000 simulated seasons. The green bars indicate the 25th to 75th percentiles and the dashed lines (‘whiskers’) the 5th to 95th percentiles. For example, in 50% of the simulations Man City finish on between 71 and 81 points and in 90% of the simulations they accumulate between 63 and 89 points. The vertical line in the middle of the green bars shows the median. The numbers to the right of the plot show the probability of each team:
a) winning the title (Ti);
b) finishing in the Champions League spots (CL);
c) being relegated (rel).
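The box-plot summary described above boils down to a handful of percentiles of each team's simulated points totals. A minimal sketch, assuming a simple nearest-rank definition of a percentile (the post does not say which definition was used) and hypothetical function names:

```python
def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, for integer q in 1..100."""
    if not sorted_values:
        raise ValueError("need at least one value")
    n = len(sorted_values)
    rank = max(1, -(-q * n // 100))  # integer ceiling of q * n / 100
    return sorted_values[rank - 1]


def summarise(points_totals):
    """Box-plot summary of one team's points across all simulated seasons."""
    s = sorted(points_totals)
    return {
        "p5": percentile(s, 5),      # lower whisker
        "p25": percentile(s, 25),    # left edge of the bar
        "median": percentile(s, 50),
        "p75": percentile(s, 75),    # right edge of the bar
        "p95": percentile(s, 95),    # upper whisker
    }
```

By construction, half of the simulated seasons fall between `p25` and `p75`, and 90% fall between `p5` and `p95`, which is how the Man City ranges quoted above should be read.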
You can see that the table is bunched into three groups: those with a decent chance of making it into the Champions League, the solidly mid-table teams and the remainder at the bottom. Let’s look at each group in turn.
Top Group: This group contains Man City, Chelsea, Liverpool, Arsenal, Spurs and, if we’re being generous, Man United. These are the teams with a fighting chance of finishing in the top four. City, Chelsea, Liverpool and Arsenal are so tightly bunched they are basically indistinguishable: you can’t really predict which of them will win the league. However, there is a 93% probability that it’ll be one of those four. Spurs go on to be champions in only 6% of the simulations and United in less than 1%. Indeed, United finish in the top four only 17% of the time – roughly a 1 in 6 chance.
Middle Group: This group includes Southampton, Leicester, Everton, Watford and West Brom. The distribution of their points totals indicates that they are likely to collect more than 40 points, but less than 60. That makes them reasonably safe from relegation but unlikely to finish in the top four (last season, the 4th placed team – Man City – finished with 66 points). They can afford to really focus on the cup competitions (and for Leicester, the Champions League).
Bottom Group: Finally, we have the remaining nine teams, from Stoke down to Hull. According to my simulations, these teams have at least a 10% chance of being relegated. The bottom 5 in particular collect less than 40 points on average and are relegated in at least a third of the simulations, with Sunderland and Hull going down more often than not.
My plan is to update this table after each round of EPL games (which you can find
There are two predictors in the model: X1 = ΔElo/400, the difference between the team's Elo score and their opponents', and X2 is a binary home/away indicator equal to 1 for the home team and -1 for the away team. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from
Comparison of probabilities assigned to ‘home win’, ‘away win’ and ‘draw’ by the Poisson model and those implied by bookmakers' odds. All EPL matches from the 2011/12 to 2015/16 seasons are plotted.
One standout feature is that draws are never the favoured outcome. This suggests that one of the keys to improving the accuracy of match outcome predictions is to better identify when draws are the most likely outcome. After all, more than a quarter of games end in draws.
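To make the structure of a two-predictor Poisson model concrete, here is a rough sketch of how goal rates and outcome probabilities could be derived from Elo scores. Everything specific in it is an assumption for illustration: the log-linear form of the rate, independence of the two teams' goals, and above all the coefficient values `b0`, `b1`, `b2`, which are invented and are not the fitted values from the model described above.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def goal_rate(elo_diff, is_home, b0=0.2, b1=0.6, b2=0.15):
    """Expected goals: log(rate) = b0 + b1 * X1 + b2 * X2 (hypothetical coefficients)."""
    x1 = elo_diff / 400.0          # X1 = delta Elo / 400
    x2 = 1 if is_home else -1      # X2 = +1 for the home team, -1 for the away team
    return math.exp(b0 + b1 * x1 + b2 * x2)


def outcome_probs(elo_home, elo_away, max_goals=15):
    """Return (P(home win), P(draw), P(away win)) assuming independent goal counts."""
    lam_home = goal_rate(elo_home - elo_away, True)
    lam_away = goal_rate(elo_away - elo_home, False)
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away


# Two evenly rated teams: only home advantage separates them.
p_home, p_draw, p_away = outcome_probs(1800, 1800)
```

Even for two evenly rated sides, the home-advantage term in this toy version is enough to make a home win more likely than a draw, which gives some intuition for why the draw rarely comes out as the favourite.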
------  Which happens to be close to the mean, so there isn’t much skew.
eightyfivepoints.blogspot.com | Sat, 12 Nov 2016 09:48:00 +0000
Manager rivalry is one of the big themes of the season. Many of Europe’s most successful managers have converged on the EPL, sparking renewed and fierce competition between England’s biggest clubs as they battle on the pitch to achieve domestic superiority. In the background there is another competition, one of a more individual nature. Guardiola, Mourinho, Conte and Klopp are seeking to establish themselves as the pre-eminent manager of their generation. As touchline galacticos, their rivalry mirrors that of Europe’s top players.
Success is often measured relative to expectation. Second place this season would probably be seen as a good finish for Liverpool, but not Man City. So Klopp and Guardiola will be judged against different standards. If Moyes guides Sunderland to a top ten finish he’ll win manager of the season.
For the same reason, it’s difficult to compare their track records. A manager may have won an armful of medals, but was it the result of years of sustained improvement or a few tweaks to an already excellent team? Can we compare the achievements of Wenger and Pulis, or Ferguson at Aberdeen and Ferguson at Man United?
To answer these questions we need an objective method for comparing the track records of managers over their careers. Not a count of the big cups in their cabinets, but a consistent and transferable measure of how much they actually improved their teams. In this post I’m going to lay out a simple method for measuring the impact managers have made at their clubs. I’ll then use it to compare the careers of some of the EPL’s current crop of talent.
There is one measure of success that is applicable to all managers: to increase the number of games the team wins. The problem is that it is not easily comparable over time: a manager can move from a small club to a big club, or one league to another, and his win percentage will vary irrespective of the impact he had on each team. However, there is a neat way of circumventing these issues, and that is to use the Elo score system.
Created by physicist
Figure 1: the Elo Impact of Sir Alex Ferguson from 1978.
The first thing that strikes me is that his peak at Aberdeen – the 1983-84 season, when he won the Scottish league and European Cup Winners' Cup – is almost level with his peak as Man United manager (his second Champions League and 10th EPL title in 2008). This implies that Ferguson’s impacts at Aberdeen and United are comparable achievements. That’s not an unreasonable statement: Ferguson won 3 of Aberdeen’s total of four Scottish titles and is still the last manager to break the Old Firm hegemony.
The striking thing about Mourinho’s Elo Impact (Figure 2) is that it is so much less volatile than Ferguson’s. Yes, the axis range is broader – Mourinho has had a lot of success in his career and his peak impact (at around 500) is substantially higher than Ferguson’s – but a quick estimate shows that Ferguson’s score fluctuates about 30% more. On closer inspection, this might be because Ferguson’s teams tended to win more of the big games but lose more frequently to weak teams than Mourinho’s (at least, until recently). However, this needs further investigation.
Figure 2: the Elo Impact of Jose Mourinho from 2004.
It’s worth emphasizing that the Elo score does not go up simply because trophies have been won; it does so if the team improves relative to its peers. Jose Mourinho’s time at Inter is a good example of this. Despite winning the treble in his final season in 2010, Mourinho departed Inter having made little improvement to their Elo score. This is because Inter were already the dominant force in Italy when he arrived, having won Serie A in each of the preceding three seasons. Put simply, it’s difficult to significantly improve the Elo score of a team that is already at the top.
Total, average (per year) and 16/17 season Elo Impact scores for current EPL managers.
The top 6 are pretty much what you’d expect, with one very notable exception. Tony Pulis, who has never actually won a major trophy as a manager, leads the table. This is not crazy: Pulis has improved the standing of every major club that he managed (a plot of his career Elo Impact can be found here). In particular, over his two stints as Stoke City manager, he took them from a relegation-threatened Championship team to an established mid-table EPL team.
I think that the example of Tony Pulis demonstrates one of the strengths of the Elo Impact metric – it is fairly agnostic as to where a team finishes in the league, so long as the team has improved. While we are naturally attracted to big shiny silver cups, some of the best work is being done at the smaller clubs. I fully acknowledge that repeatedly saving teams from relegation requires a very different managerial skillset to developing a new philosophy of football at one of the world’s most famous clubs; the point is that Elo Impact at least allows you to put two very different achievements on a similar footing. It’s a results-based metric and cares little for style.
Guardiola is perhaps lower than some might expect, but then he only had a small impact on Bayern Munich’s Elo score during his tenure. A few successful seasons at City and he’ll probably be near the top of this table. Why is Wenger’s average impact so low? As this plot shows, he substantially improved Arsenal during the first half of his tenure, but has essentially flat-lined since the ‘invincibles’ season. Further down the table, Bilic's score has fallen substantially this season as West Ham have had a disappointing campaign so far.
So what now?
I intend to develop Elo Impact scores for two purposes. First, I’ll follow each manager’s score over the EPL season to see who has overseen the greatest improvement in their side. I’m happy to provide manager rankings for other leagues or individual clubs on request. Second, as new managers arrive, I’ll look at their Elo track record to gain an insight into whether they’re likely to be a success or not.
It's going to be fascinating to see which manager comes out on top this season.
Thanks to David Shaw for comments.
 Although you do gain/lose more points for big victories/losses.
 It is difficult to improve, or even just maintain, a team's Elo score once it rises above 2000. Few points are gained for winning games and many are lost for losing them. Basically, the team is already at (or near) the pinnacle of European football. For this reason I've made a slight correction to the Elo Impact measure: when a club's Elo score is greater than 2000 points, I've set the maximum decrease in a manager's Elo Impact to 10 points per game. Once the club's score drops below 2000, the normal rules apply.
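The correction for clubs rated above 2000 can be written down as a simple capping rule. The sketch below assumes that a manager's Elo Impact is the running sum of the per-game changes in the club's Elo score during his tenure; that reading, along with the function and parameter names, is my assumption and may differ from the exact definition used for the plots.

```python
def capped_impact_change(elo_change, club_elo, threshold=2000.0, max_drop=10.0):
    """Per-game contribution to Elo Impact, with the above-2000 correction.

    When the club's Elo score is above `threshold`, a loss of more than
    `max_drop` points is limited to `max_drop`. Gains are never capped.
    """
    if club_elo > threshold and elo_change < -max_drop:
        return -max_drop
    return elo_change


def elo_impact(games):
    """Total Elo Impact over a sequence of (club_elo_before_game, elo_change) pairs."""
    return sum(capped_impact_change(change, club_elo) for club_elo, change in games)


# Hypothetical run of three games: a heavy loss while above 2000 (capped),
# the same loss once below 2000 (not capped), then a win.
example = [(2050.0, -25.0), (1990.0, -25.0), (1990.0, 8.0)]
total = elo_impact(example)
```

The cap only softens the penalty for defeats at the very top; it does nothing to make gains easier, so improving an elite side's score remains hard.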
eightyfivepoints.blogspot.com | Tue, 01 Nov 2016 12:21:00 +0000
Halloween may have passed but Arsenal's fans will remain fearful throughout November. This is the month where, historically, Wenger's team have tended to perform significantly below par. Since Wenger took charge in 1997, Arsenal have collected an average of 1.6 points per game in November, compared to a season average of 2 points per game.
In fact, as the figure below demonstrates, Arsenal don't really recover until mid-December. The thin blue line shows the average number of points that Wenger's Arsenal collect in each gameweek of the season; the dashed blue line shows a 3-game moving average. The Nov/Dec curse is clearly visible.
For comparison, I've also plotted the same results for Man United under Ferguson. For both teams, I used data from the seasons 97/98-12/13, the period in which the two managers overlap.
Average number of points collected by Arsenal (blue) and Man United (red) over the seasons 97/98-12/13. Solid lines show the average for each game week, dashed lines show a 3-match moving average.
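The smoothing used for the dashed lines is a plain moving average. A minimal sketch follows, assuming a window that slides over consecutive gameweeks and produces one value per full window. Whether the original plot centres or trails the window is not stated, and the sample numbers are invented.

```python
def moving_average(values, window=3):
    """Average of each run of `window` consecutive values.

    Returns len(values) - window + 1 numbers, one per full window.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Invented average points per gameweek, for illustration only.
weekly_points = [3, 0, 3, 1, 1]
smoothed = moving_average(weekly_points)
```

Averaging over three games damps the week-to-week noise, which is what makes a sustained dip such as the Nov/Dec one stand out.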
It's interesting to compare the seasonal performance of the two managers. In the first and final thirds of the season, Wenger's points-per-game closely matches Ferguson's. However, while Ferguson's teams would step up their performance in December (perhaps after the group stage of the Champions League finished), Wenger's seem to struggle in early winter before improving in February.
I have no idea what causes Arsenal's end-of-year blips: injuries, Champions League involvement, fear of the English winter, or excessive bad luck? Whatever it is, we'll all be watching with interest to see if they can overcome it this year.
-------  And significant, in the statistical sense.
eightyfivepoints.blogspot.com | Fri, 28 Oct 2016 08:17:00 +0000
We’re nearly a quarter of the way through the EPL season and the league already has a familiar feel to it. Manchester City are top, Arsenal are above Spurs, and Sunderland anchor the table having failed to win a single game so far. There is clearly a lot of football still to be played, but does the table already resemble how it’ll look come the end of May?
Conventional wisdom tells us that the turn of the year is a crucial period. By the beginning of January we are supposed to have a good idea of how things are shaping up. In 9 of the last 20 EPL seasons, the team that was top at the start of January went on to win the league. 56% of teams in the bottom three on New Year’s Day went on to be relegated. However, you get pretty much the same results if you measure these stats at the beginning of December or the beginning of February, so perhaps we don’t learn that much over the Christmas period after all.
In this post I’m going to look back over the last 20 seasons to investigate how the league table actually evolves over a season and, in particular, when in the season we start to have a reasonable picture of where teams might finish.
A good starting point is to measure the correlation between the final league positions and those at some earlier point in the season. Essentially you’re measuring the degree to which the orderings of the teams are the same. If the team rankings were identical, we’d measure a correlation of 1; if they were completely different we’d expect the correlation to be close to zero.
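Since the quantity being compared is an ordering of teams, a rank correlation is a natural choice. The sketch below uses Spearman's formula for two rankings without ties. The post does not say which correlation coefficient was used, so treat this as one reasonable option. The function name and the four-team example are made up.

```python
def spearman_rank_correlation(rank_a, rank_b):
    """Spearman correlation between two rankings of the same teams (no ties).

    Each argument maps team name -> league position (1 = top).
    """
    if set(rank_a) != set(rank_b) or len(rank_a) < 2:
        raise ValueError("both rankings must cover the same two or more teams")
    n = len(rank_a)
    d_squared = sum((rank_a[team] - rank_b[team]) ** 2 for team in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy four-team league: table after some gameweek vs final table.
early = {"A": 1, "B": 2, "C": 3, "D": 4}
final_same = {"A": 1, "B": 2, "C": 3, "D": 4}
final_reversed = {"A": 4, "B": 3, "C": 2, "D": 1}

same = spearman_rank_correlation(early, final_same)
reversed_order = spearman_rank_correlation(early, final_reversed)
```

Identical tables give 1 and a completely reversed table gives -1, while an early table that tells you nothing about the final one would sit near 0.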
Figure 1 shows the correlations between the league rankings after each game week and the rankings at the end of the season, for the last 20 EPL seasons. The grey lines show the correlations for the individual seasons; the red line shows the average.
Figure 1: The correlation between the league rankings after each gameweek and the final rankings at the end of the season. Grey lines show results for each of the last 20 EPL seasons, the red line shows the average correlation for each gameweek.
The most striking thing about this plot is that the correlation rises so quickly at the beginning of the season. You get to an average correlation of 0.8 - which is very high - by the 12th round of games. There’s some variation from season-to-season, of course, but the general picture is always the same: we learn rapidly in the first 12 or so games, and then at a slower, even pace over the rest of the season.
This implies that we know quite a lot about how the final league rankings will look after just a third of the season. But there’s no mantra that states ‘top at Halloween, champions in May’, so why is the correlation so high so soon, and what does it actually mean?
Leagues in leagues
I think that the explanation is provided by what is sometimes referred to as the ‘mini-leagues’. The idea is that the EPL can be broken down into three sub-leagues: those teams competing to finish in the top four (the Champions League places), those struggling at the bottom, and those left in the middle fighting for neither the riches of the Champions League nor for their survival.
Figure 2 demonstrates that these mini-leagues are already established early in the season. It shows the probability of each team finishing in the top 4 (red line) or bottom 3 (blue line), based on their ranking after their 12th game. The results were calculated from the last 20 EPL seasons.
Figure 2: The probability of finishing in the top four (red line) or bottom three (blue line) based on league position after 12 games. The red, white and blue shaded regions indicate the three ‘mini-leagues’ within the EPL.
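Curves like these are just empirical conditional frequencies: for each league position after 12 games, count how often teams in that position ended up in the top four or the bottom three. A minimal sketch, with a hypothetical function name and a handful of invented records standing in for the 20 seasons of real data:

```python
def finish_probability(records, best, worst):
    """Empirical P(best <= final position <= worst | position after 12 games).

    `records` is a list of (position_after_12_games, final_position) pairs,
    one per team per season. Returns a dict keyed by early position.
    """
    counts, hits = {}, {}
    for early, final in records:
        counts[early] = counts.get(early, 0) + 1
        if best <= final <= worst:
            hits[early] = hits.get(early, 0) + 1
    return {pos: hits.get(pos, 0) / counts[pos] for pos in sorted(counts)}


# Invented records: (position after 12 games, final position).
records = [(1, 1), (1, 3), (1, 6), (18, 19), (18, 15)]

top_four = finish_probability(records, 1, 4)
bottom_three = finish_probability(records, 18, 20)
```

With only 20 seasons there are just 20 observations per starting position, so the real curves will be fairly noisy from one position to the next.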
The red-shaded region shows the ‘top’ mini-league: the teams with a high chance of finishing in the Champions League places. Teams below 7th place are unlikely to break into this elite group. Similarly, teams placed 14th or above are probably not going to be relegated; therefore, those between 7th and 14th position are in the middle ‘mini-league’. Teams in the last third seem doomed to be fighting relegation at the end of the season: they make up the final mini-league.
The high correlation we observed after twelve games in Figure 1 is a consequence of the mini-leagues. It’s entirely what you’d expect to measure from a table that is already clustered into three groups – top, middle and bottom – but where the final ordering within each group is still to be determined.
I’m not suggesting that membership of the mini-leagues is set in stone – there’s clearly promotion and relegation between them throughout the season (and yo-yo teams) – but by November there is a hierarchy in place. Even at this relatively early stage of the season, most teams will have a reasonable idea of which third of the table they are likely to end up in.
Finally, awareness of this may also explain the recent increase in the number of managers getting sacked early in the season. Last year three EPL managers lost their jobs before the end of October and we've already lost one this season. If Mourinho doesn't find himself in the top eight after the next few games, the pressure may ramp up several notches higher.