After a conversation about how much marginal gains from statistics might matter in sport, I decided to actually check.

@rolffredheim City seem to be at the forefront here, though Big Sam's relentlessly empiricist way of using stats turns a lot of people off.

— Trey Causey (@treycausey) April 30, 2013

I wrote a basic football simulator to see how much of the variation in results is due to chance, and how much is due to skill, data analysis, tactics, having the best players, and so on. The simulator plays every game in a season and stores the results. There are only two rules: each team can score at most once in any 5-minute interval, and each team has a fixed chance of scoring in each interval. Here is one example of a match, between Tottenham and Arsenal, with Tottenham having a 5% chance of scoring per interval and Arsenal a 4% chance (more accurately: a 95% and a 96% chance of not scoring).

I wrote the code in Python and am happy to share it, but for now I've struggled to make IPython play nice with Blogger, so I'm pasting images of my code instead:
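For anyone who wants to experiment, a minimal sketch of a simulator that follows the two rules described above could look like this. It is not the code from the images: the function name `play_match` and the figure of 18 intervals per match (90 minutes divided into 5-minute slots) are assumptions made for illustration.

```python
import random

INTERVALS = 18  # assumption: a 90-minute match split into 5-minute intervals


def play_match(p_home, p_away, intervals=INTERVALS, rng=random):
    """Return (home_goals, away_goals) for one simulated match.

    In each interval, each team scores at most once, with a fixed
    probability that does not change during the match.
    """
    home = sum(rng.random() < p_home for _ in range(intervals))
    away = sum(rng.random() < p_away for _ in range(intervals))
    return home, away


if __name__ == "__main__":
    # Hypothetical fixture using the probabilities from the text.
    spurs, arsenal = play_match(0.05, 0.04)
    print(f"Tottenham {spurs} - {arsenal} Arsenal")
```

Running it a handful of times gives a feel for how often the nominally weaker side wins a single match.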

In this case, as sadly so often in real life, Arsenal won. And not just by a small margin, but 3-0. Pundits would have had a field day identifying why Arsenal were so much better, which tactical decisions were crucial, and so on. Except that in this case, any patterns identified would be nothing but coincidence. The result even went against the odds: Tottenham had the higher chance of scoring (5% against Arsenal's 4%) but still failed to capitalize. The reason is that in football the odds of scoring are small, so a lot of results are determined by luck.

However, you might say, 'I bet this guy just ran the simulator loads of times until he got the result he wanted. These things even out over the course of the season.' And you'd be right about the former, but not the latter. So to prove that my point holds, I wrote a mini league with just Tottenham and Arsenal, where each season involves only two games. I ran this simulation 100 times to see how big an effect Tottenham's advantage would have. What do you think I found?

Virtually no effect. We might expect evenly matched teams to average 3 points each over two games, but that would ignore the option of a draw, which shares out only 2 points rather than 3. Over 10 000 trials in a fair simulation (both teams had a 4% chance of scoring), the average points total was 2.66. The first time I ran the simulation, Tottenham picked up 2.7 points on average; that is, despite their higher scoring chance they only got a 2% increase in results. But 100 is a small sample in statistical terms (though much larger than the number of local derbies in any footballer's career!), so I re-ran the simulation 10 000 times to get the actual points benefit, and included a histogram to show the variation in points won per season:
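The two-game mini league is easy to reproduce in outline. The sketch below is a reconstruction under stated assumptions, not the original code: it assumes 18 scoring intervals per match and the usual 3 points for a win and 1 for a draw, and the names `season_points`, `simulate` and `mean_points` are made up for illustration. It prints the average and a crude text histogram of points per season.

```python
import random
from collections import Counter

INTERVALS = 18  # assumption: a 90-minute match split into 5-minute intervals


def goals(p, rng):
    """Goals in one match: one scoring chance per interval, probability p each."""
    return sum(rng.random() < p for _ in range(INTERVALS))


def season_points(p_team, p_opponent, rng, games=2):
    """Points won by the first team over a season of `games` matches."""
    points = 0
    for _ in range(games):
        scored, conceded = goals(p_team, rng), goals(p_opponent, rng)
        if scored > conceded:
            points += 3  # win
        elif scored == conceded:
            points += 1  # draw
    return points


def simulate(p_team, p_opponent, seasons, seed=None):
    """Return a Counter mapping season points total -> number of seasons."""
    rng = random.Random(seed)
    return Counter(season_points(p_team, p_opponent, rng) for _ in range(seasons))


def mean_points(counts):
    """Average points per season from the Counter returned by simulate()."""
    return sum(pts * n for pts, n in counts.items()) / sum(counts.values())


if __name__ == "__main__":
    counts = simulate(0.05, 0.04, seasons=10_000)
    print(f"Average points per season: {mean_points(counts):.2f}")
    for pts in sorted(counts):
        print(f"{pts} pts | {'#' * (counts[pts] // 100)}")
```

Changing `seasons` from 10 000 to 100 and re-running a few times shows how much the average jumps around with a small sample.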

This gives a truer picture: now Tottenham won close to three points over two games on average, corresponding to an 11.2% increase over the fair baseline of 2.66. The crucial point here is that the randomness inherent in football as a low-scoring game outweighs any small performance bonus. Now, just to demonstrate that the code is not wrong, here's what happens when Tottenham have a 10% chance of scoring, 2.5 times Arsenal's:

So here the large skill difference translates into a noticeable, but not unbelievable, benefit over 10 000 simulated seasons: on average Tottenham won 4.3 points out of 6, or 61% more than the fair baseline of 2.66. The same pattern that we saw above holds: for any percentage gain in results, roughly twice that percentage increase in skill, or a little more, is needed. In the first simulation a 25% higher scoring chance (5% against 4%) produced an 11% gain in results, while in the second a 150% higher scoring chance (10% against 4%) produced a 61% improvement. The relationship is roughly linear over this range, but it cannot stay that way: results are capped at 6 points, and as a team approaches a 5x higher skill level, victory is virtually guaranteed. The model is effectively one with a tipping point, where victory goes from being in large part due to luck to becoming inevitable.
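The link between skill and results can also be worked out exactly instead of by simulation, because under these rules each team's goal count follows a binomial distribution. The sketch below does that for a range of Tottenham scoring chances against a fixed 4% for Arsenal. It assumes 18 intervals per match, and the names `goal_pmf` and `expected_points` are illustrative; since the figures depend on that assumption, they need not match the simulated numbers quoted above exactly.

```python
from math import comb

INTERVALS = 18  # assumption: a 90-minute match split into 5-minute intervals


def goal_pmf(p, n=INTERVALS):
    """Binomial probabilities of scoring 0..n goals with n chances of probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def expected_points(p_team, p_opponent, games=2):
    """Exact expected points for the first team over `games` matches."""
    team, opp = goal_pmf(p_team), goal_pmf(p_opponent)
    # Win: the team scores t goals and the opponent scores fewer than t.
    p_win = sum(team[t] * sum(opp[:t]) for t in range(len(team)))
    # Draw: both sides score the same number of goals.
    p_draw = sum(t * o for t, o in zip(team, opp))
    return games * (3 * p_win + p_draw)


if __name__ == "__main__":
    baseline = expected_points(0.04, 0.04)
    for p in (0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.20):
        pts = expected_points(p, 0.04)
        gain = 100 * (pts / baseline - 1)
        print(f"chance {p:.2f} (x{p / 0.04:.2f}): {pts:.2f} points, {gain:+.0f}% vs fair")
```

The printed table makes it easy to see where the gain in points starts to flatten out as the 6-point ceiling comes into play.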

In part 2 I explore a somewhat more complex model and simulate a whole season with the right number of teams, to see how much skill is needed to give a believable Premier League table, and how much better the best teams have to be to ensure a consistently high position.
