Simulating skill in the Premier League: part 1

I love sports. Every week I watch Tottenham play, and just as regularly I go through the emotional roller-coaster that entails. As a sports fan I use the first person plural to describe 'my' team, and I am convinced we need to keep Gareth Bale, just as I am convinced that Tottenham will bottle it come May. At the same time, I get frustrated when pundits bring out stats along the lines of 'Wigan haven't lost in their last 5 trips to X' (OK, that would be quite a good stat), 'Liverpool haven't won without Suarez since February', or 'player X has scored in five games in a row, wow is he on fire'. I am convinced that sports pundits highlight 'trends' that are almost all random. Of course skill plays a big part in sport, but when everyone has skill, how important is chance? Below I show that doubling a team's skill level does not guarantee victory: it results in only a 50% increase in points gained.

Prompted by a conversation about how much marginal gains from statistical analysis might matter in sport, I decided to actually check.

I wrote a basic football simulator to see how much of the variation in results is due to chance, and how much to skill, data analysis, tactics, having the best players, and so on. It plays every game of a season and stores the results. The only rules are that a team can score at most once in each 5-minute period, and that it has a fixed chance of doing so in each period. Here is one example of a match between Tottenham and Arsenal, with Tottenham having a 5% chance of scoring in each period and Arsenal a 4% chance (more precisely, a 95% and 96% chance of not scoring).
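A minimal sketch of the match model as described above, not the author's own code. It assumes a 90-minute match split into 18 five-minute intervals, and that in each interval each team independently scores with its fixed probability (so at most one goal per team per interval). The names `simulate_match` and `INTERVALS` are hypothetical.

```python
import random

# Assumption: 90 minutes split into 18 five-minute intervals,
# with at most one goal per team in each interval.
INTERVALS = 90 // 5


def simulate_match(p_home, p_away, rng=random):
    """Return (home_goals, away_goals) for one simulated match.

    p_home, p_away: fixed chance that each team scores in any one interval.
    rng: anything with a .random() method (the random module or a Random instance).
    """
    home = sum(rng.random() < p_home for _ in range(INTERVALS))
    away = sum(rng.random() < p_away for _ in range(INTERVALS))
    return home, away


if __name__ == "__main__":
    rng = random.Random(2013)  # seeded so a run can be repeated
    spurs, arsenal = simulate_match(0.05, 0.04, rng)
    print(f"Tottenham {spurs} - {arsenal} Arsenal")
```

Changing the seed gives a different scoreline each time, which is exactly the point: the same two probabilities can produce very different results.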

I wrote the code in Python and am happy to share it, but so far I've struggled to make IPython play nicely with Blogger, so I'm pasting images of my code instead.

In this case, as sadly so often in real life, Arsenal won. And not just by a small margin, but 3-0. Pundits would have had a field day identifying why Arsenal were so much better, which tactical decisions were crucial, and so on. Except that here, any pattern they identified would be nothing but coincidence. The result even went against the odds: Tottenham had a 20% skill advantage but still failed to capitalize. The reason is that in football the chance of scoring is small, so a lot of results are decided by luck. However, you might say, 'I bet this guy just ran the simulator loads of times until he got the result he wanted. These things even out over the course of a season.' You'd be right about the former, but not the latter. So to show that my point holds, I wrote a mini league with just Tottenham and Arsenal, in which each season consists of only two games. I ran this simulation 100 times to see how big an effect the 20% advantage would have. What do you think I found?
Virtually no effect. Naively we might expect an evenly matched team to average 3 points over two games (half of the 6 available), but that ignores draws, which give each side only 1 point instead of sharing out 3. Over 10,000 trials of a fair simulation (both teams with a 4% chance of scoring), the average points total was 2.66. The first time I ran the 100-season simulation with Tottenham's advantage, they picked up 2.7 points on average; that is, despite a 20% performance bonus they got only a 2% increase in results. But 100 is a small sample in statistical terms (though much larger than the number of local derbies in any footballer's career!), so I re-ran the simulation 10,000 times to get the actual points benefit, and plotted a histogram of the points won per season.
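The two-game mini league can be sketched along the same lines. Again this is an illustration under assumptions (18 five-minute intervals, 3 points for a win, 1 for a draw), not the original code, and `mini_league`, `goals` and `points` are hypothetical names. It returns Tottenham's points for every season, so the mean gives the average points figure and a `collections.Counter` of the values gives a crude text version of the histogram.

```python
import random
from collections import Counter

INTERVALS = 18  # assumption: 90 minutes split into 5-minute scoring windows


def goals(p, rng):
    """Goals in one match: at most one per interval, each with chance p."""
    return sum(rng.random() < p for _ in range(INTERVALS))


def points(scored, conceded):
    """Standard league points: 3 for a win, 1 for a draw, 0 for a loss."""
    if scored > conceded:
        return 3
    return 1 if scored == conceded else 0


def mini_league(p_spurs, p_arsenal, seasons, games=2, seed=None):
    """Tottenham's points in each of `seasons` seasons of `games` matches v Arsenal."""
    rng = random.Random(seed)
    return [
        sum(points(goals(p_spurs, rng), goals(p_arsenal, rng)) for _ in range(games))
        for _ in range(seasons)
    ]


if __name__ == "__main__":
    for label, p_spurs in (("4% v 4%", 0.04), ("5% v 4%", 0.05)):
        season_points = mini_league(p_spurs, 0.04, seasons=10_000, seed=1)
        mean = sum(season_points) / len(season_points)
        print(f"{label}: mean {mean:.2f} points per two-game season")
        # Crude text histogram: one '#' per 100 seasons ending on that total.
        counts = Counter(season_points)
        for pts in range(7):
            print(f"  {pts}: {'#' * (counts[pts] // 100)}")
```

Note that 5 points is impossible over two games (3 + 1 = 4, 3 + 3 = 6), so that row of the histogram is always empty.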

This gives a truer picture: Tottenham now won close to three points over two games on average, an 11.2% increase over the fair baseline. The crucial point is that the randomness built into football's rules, as a low-scoring game, outweighs any small performance bonus. Now, just to demonstrate that the code isn't broken, here's what happens when Tottenham have a 10% chance of scoring, 2.5 times Arsenal's.
Here the large skill difference translates into a noticeable, but not unbelievable, benefit over 10,000 games: on average Tottenham won 4.3 points out of 6, or 61% more than the 2.66 of a fair match. The same pattern holds as above: for any percentage gain in results, roughly twice that percentage increase in skill is needed. In the first simulation a 20% increase in skill gave a gain of about 10% in results, while in the second a 120% skill bonus gave a 61% improvement. Over this range the relationship is almost linear, but it cannot carry on indefinitely: as a team approaches a 5x higher skill level, victory becomes virtually guaranteed. In effect this model has a tipping point, where victory goes from being largely down to luck to being inevitable.
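Simulation isn't the only way to get these figures. Under the same assumptions (18 intervals, independent scoring chances), each team's goal count follows a binomial distribution, so win and draw probabilities, and hence expected points, can be computed exactly. This also explains the fair baseline: between evenly matched teams the win probability is (1 - d)/2, where d is the draw probability, so two games are worth 2(3(1 - d)/2 + d) = 3 - d points on average. The sketch below, with hypothetical names and not the author's code, sweeps Tottenham's skill from 1x to 5x Arsenal's 4% as one way to check how points scale with skill. Its numbers depend on the assumed 18 intervals, so they need not match the simulated figures exactly.

```python
from math import comb

N = 18            # assumed number of five-minute scoring windows per match
P_ARSENAL = 0.04  # Arsenal's per-interval scoring chance, held fixed


def goal_pmf(p):
    """P(exactly k goals), k = 0..N: binomial with N chances of probability p."""
    return [comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(N + 1)]


def win_draw(p_a, p_b):
    """Exact probabilities that team A beats, or draws with, team B."""
    a, b = goal_pmf(p_a), goal_pmf(p_b)
    win = sum(a[i] * b[j] for i in range(N + 1) for j in range(i))  # i > j
    draw = sum(x * y for x, y in zip(a, b))
    return win, draw


def expected_points(p_a, p_b, games=2):
    """Team A's expected points over `games` matches (3 for a win, 1 for a draw)."""
    win, draw = win_draw(p_a, p_b)
    return games * (3 * win + draw)


baseline = expected_points(P_ARSENAL, P_ARSENAL)  # equals 3 - P(draw)
for ratio in (1.0, 1.25, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0):
    pts = expected_points(ratio * P_ARSENAL, P_ARSENAL)
    gain = 100 * (pts / baseline - 1)
    print(f"x{ratio:<4} skill: {pts:.2f} of 6 points ({gain:+.0f}% vs fair)")
```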

What is going on here? Well, scoring a goal is improbable. If we changed the rules to allow tennis-style scores, the results would be much starker. The upshot is that to guarantee success in football you have to be several times better than the opposition, not just marginally better: doubling performance increases the points won by only about 50%. Intuitively this makes some sense: upsets are much more common in football (think of any World Cup) than in sports such as tennis, where the structure of the game means that marginal skill differences lead to predictable results. Because skill differences at the elite level are in reality small, luck will always play a considerable role. OK, let's not get carried away: this simulation is beyond simplistic. Arguably, scoring a goal depends on a chain of events all coming off, each of which requires skill. Countless other factors could also be added: home advantage, red cards, star players, and so on. The point I hope I have made, though, is that while people focus tremendously on those variables, chance tends to be ignored. I often hear commentators call near-misses 'unlucky'. Very rarely do they say the opposite about the screamer from 30 yards that finds the top corner.
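The claim that a higher-scoring game would make results starker can be checked with a small exact calculation. This is not a model of tennis, whose games-and-sets structure is quite different; it simply keeps the same 5% v 4% chance per opportunity and gives each side more opportunities per match. The 18-chance case stands in for the football model above (an assumption), and the larger values of `n` are a hypothetical higher-scoring game. `win_probability` is a hypothetical helper, not the author's code.

```python
from math import comb


def win_probability(p_a, p_b, n):
    """Exact P(team A outscores team B) when each side gets n independent chances."""
    a = [comb(n, k) * p_a**k * (1 - p_a) ** (n - k) for k in range(n + 1)]
    b = [comb(n, k) * p_b**k * (1 - p_b) ** (n - k) for k in range(n + 1)]
    total, below = 0.0, 0.0  # below = P(B scores fewer than k goals)
    for a_k, b_k in zip(a, b):
        total += a_k * below
        below += b_k
    return total


# n = 18 is the assumed football match; larger n keeps the same per-chance
# probabilities but gives each side many more scoring opportunities.
for n in (18, 90, 360):
    better = win_probability(0.05, 0.04, n)
    worse = win_probability(0.04, 0.05, n)
    draw = 1 - better - worse
    print(f"n={n:>3}: better side wins {better:.2f}, worse side wins {worse:.2f}, draw {draw:.2f}")
```

With more chances per match, the gap between the two sides' expected goals grows faster than the spread of the scorelines, so the better side wins more often and draws become rarer.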

In part 2 I explore a somewhat more complex model, simulate a whole season with the right number of teams, and look at how much skill is needed to give a believable Premier League table, and how much better the best teams have to be to ensure a consistently high position.
