Thursday, February 23, 2012

40k Metrics: The Data We Want Versus the Data We Have

Warhammer 40,000 has a lot of aspects that can be reduced to numbers through basic probability and algebra.  Analysis of in-game unit or army statistics definitely has a place and plenty of good uses, and the end result is (hopefully) a more accurate prediction of army performance on the tabletop.

But why are we doing all this mathhammer?  To help us make better choices, in order to win more often.  Duh.

It's still all about the numbers.

With that said, doesn’t it strike you as hilarious that we don’t have even rudimentary statistics for some of the most basic things that other competitive sports or games take for granted?  We are woefully ignorant about decisions that other competitive activities have put to bed. 

Let’s talk about the most important one first, the giant elephant in the tournament hall: who should take the first turn?  In chess the player going first scores somewhere around 52 to 56% of the available points, counting a draw as half a win.  Why don’t we have this basic statistic for 40k?  It wouldn’t be hard to determine.  On your tournament score sheet, have the players check a box to record who went first.  Once we have the raw data for who went first, and who eventually won, the data mining potential is endless.
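To make the idea concrete, here is a minimal sketch of the tally in Python.  It assumes each results slip has been typed in as a record with two hypothetical fields, `went_first` and `winner` (both names are made up for illustration), and it simply skips draws.  The five games are invented, not real tournament data.

```python
# Hypothetical slips: each game records who went first and who won.
# Field names and results are invented for illustration only.
games = [
    {"went_first": "A", "winner": "A"},
    {"went_first": "B", "winner": "A"},
    {"went_first": "A", "winner": "A"},
    {"went_first": "B", "winner": "B"},
    {"went_first": "A", "winner": "B"},
]

def first_turn_win_rate(games):
    """Fraction of decided games won by the player who went first."""
    decided = [g for g in games if g["winner"] is not None]  # skip draws
    if not decided:
        return None
    wins = sum(1 for g in decided if g["winner"] == g["went_first"])
    return wins / len(decided)

rate = first_turn_win_rate(games)
print(f"First-turn win rate: {rate:.0%}")
```

With only five games the number means nothing, of course; the point is that the whole calculation is one check box and a few lines of arithmetic.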

Just knowing whether or not the player who went first won more often doesn’t actually help you make a choice.  Let’s hypothetically say that the player who goes first in the NOVA Open wins 55% of the time: great!  I choose to go first.

But wait!

Let’s say that an army like Eldar happens to win 60% of the time when going second, because each codex has different unit capabilities.  Knowing that the first turner won more often overall isn’t relevant to an Eldar player.  And by the same token, a Guard player might win 70% of the time when going first, making it even more of a no-brainer.  So clearly, once we have the raw data, the first order of business is to sort it by codex.  Having access to the results of taking the first turn on a codex by codex basis is data worth having.
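Sorting by codex is just a grouped version of the same tally.  A rough sketch, assuming one row per player per game in the hypothetical form `(codex, went_first, won)`; the rows below are invented to show the shape of the data and are not real results.

```python
from collections import defaultdict

# Hypothetical rows, one per player per game: (codex, went_first, won).
# The results below are invented purely to show the shape of the data.
rows = [
    ("Eldar", False, True),
    ("Eldar", False, True),
    ("Eldar", False, False),
    ("Eldar", True, False),
    ("Eldar", True, True),
    ("Guard", True, True),
    ("Guard", True, True),
    ("Guard", False, False),
]

def win_rates_by_codex(rows):
    """Return {codex: {"first": rate, "second": rate}}; rate is None with no games."""
    tally = defaultdict(lambda: {"first": [0, 0], "second": [0, 0]})  # [wins, games]
    for codex, went_first, won in rows:
        cell = tally[codex]["first" if went_first else "second"]
        cell[0] += int(won)
        cell[1] += 1
    return {
        codex: {turn: (w / n if n else None) for turn, (w, n) in cells.items()}
        for codex, cells in tally.items()
    }

for codex, rates in sorted(win_rates_by_codex(rows).items()):
    print(codex, rates)
```

Each codex gets its own first-turn and second-turn win rate, which is the number a player actually needs when the roll-off comes up.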

And that is just the surface.  The next level is to sort the data by player record.  How often did the top 10% of finishers play first or second?  Was their win rate when going first greater or less than the tournament or codex average?  Knowing this would allow us to see if the first (or second) turn matters more for highly skilled players.  If the tournament average says 55% of first turners playing Space Marines win, but at the top tables the average skews to 65%, it would suggest that the turn decision matters even more at the top.  If it turns out to be less, it would be relevant to know that among the top players, turn decision is less important.  Either way, the answer is worth knowing.
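One way to sketch that comparison, under some loud assumptions: rows are the hypothetical form `(player, went_first, won)`, `final_rank` maps each player to a finishing place (1 is best), and all names and results are invented.  Because top finishers win more games no matter who goes first, this sketch compares the gap between first-turn and second-turn win rates rather than the raw first-turn rate.

```python
def turn_gap(rows):
    """First-turn win rate minus second-turn win rate; None if either side is empty."""
    first = [won for _, went_first, won in rows if went_first]
    second = [won for _, went_first, won in rows if not went_first]
    if not first or not second:
        return None
    return sum(first) / len(first) - sum(second) / len(second)

def top_vs_field(rows, final_rank, top_fraction=0.10):
    """Return (gap for top finishers, gap for the whole field)."""
    cutoff = max(1, int(len(final_rank) * top_fraction))  # at least one player
    top_players = {p for p, rank in final_rank.items() if rank <= cutoff}
    top_rows = [r for r in rows if r[0] in top_players]
    return turn_gap(top_rows), turn_gap(rows)

# Invented example: four players, rows are (player, went_first, won).
final_rank = {"ann": 1, "bob": 2, "cat": 3, "dan": 4}
rows = [
    ("ann", True, True), ("ann", True, True),
    ("ann", False, True), ("ann", False, False),
    ("bob", True, True), ("bob", False, True),
    ("cat", True, False), ("cat", False, False),
    ("dan", True, True), ("dan", False, False),
]

top_gap, field_gap = top_vs_field(rows, final_rank, top_fraction=0.25)
print(f"top gap: {top_gap:+.0%}, field gap: {field_gap:+.0%}")
```

A bigger gap at the top tables than in the field would hint that the turn decision matters more for strong players; a smaller one would hint the opposite.  With real data you would also want enough games in the top group for the difference to mean anything.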

This wouldn’t be too difficult to do: add a single check box on a results slip, and a line item in the Excel spreadsheet or database program.  In a day and age where sabermetricians can tell you the precise amount that home field advantage benefits a high school basketball team, it is fairly egregious that 40k players (for whom nerdy things like statistics and spreadsheets ought to be second nature) don’t have a definitive answer for the simple question, “should you take the first or second turn?”

So who is going to be the TO who wants to give the community this huge gift?

Thoughts?  Comments?  Questions?


  1. This sort of data would indeed be neat to have. I'm not a big tourney player, but it would be interesting. That's a good start, plus being able to break down a codex's wins not just by win/loss but also by what book they played against. Seeing that (for example) GK beat SM 70%, Orks 55% and Tau 35% would be interesting to see what kinds of R/P/Scissors charts could be constructed!

  2. Yeah, it definitely doesn't take much data mining skill to get some really valuable information out of results data. I still think knowing your codex's first/second turn choice is a huge benefit.

    Once we get that, then you can refine it to first/second turn choice vs. each enemy codex. That would probably take a larger sample size than even a big tournament like a NOVA/Adepticon. But if we had 3 or 4 such tournaments we would probably be able to say, "I'm playing Marines vs. Tau, if I go first I am a 15% favorite, if I go second I'm an 8% underdog." That would be damn valuable information.

  3. I'm going through the data now Nike jeez. ^^

  4. I come from a fighting game background where we have exactly this type of stat. Results from big tournaments are collected and put into grids basically showing how likely any given character was to beat another character. Something like this:

    Over enough time and data, a fairly accurate tier list can be generated. New technology is discovered all the time, though, so you still have to take it with a grain of salt.

    Having something similar for 40k would be awesome, but I don't know if it would really be that useful. Where Chess and Street Fighter are static games, I think 40k has too many variables to consider. Showing that Orks beat Eldar 78% of the time doesn't really seem all that meaningful if we don't know what the list compositions were like.