Wednesday, June 15, 2011

40k Theory: It's All About The Numbers

What is ballistic skill? 

I mean, what does it really mean in terms of winning and losing on the table top? 

Absolutely nothing.  Quantity of fire in Warhammer 40k is usually far superior to quality.  At the very least it's equal.  So what am I getting at?  There is no correlation between winning and ballistic skill.  The army with the superior average ballistic skill shouldn’t (assuming the game is properly balanced) have any advantage over one with a lower average ballistic skill.

This is the inherent problem I've been wrestling with mentally.  When players compare units/armies/lists they are, generally speaking, comparing irrelevant metrics.  So what that Khorne Berserkers have 3 attacks each?  So do Orks, and you can get 4 Orks for the same price as one Berserker. 

Not a single stat on a unit's stat line gives you the faintest hint of whether using it will make your army win or lose more games.  Even when you factor in points, it's mostly irrelevant.  As I said in the moneyball article, none of the "stats" in a unit's stat line are correlated with wins on the tabletop.  So we really can't use those stats as a metric for measuring a unit's effectiveness.

Warning: numbers ahead

So what is correlated with wins on the table top?  Ability to kill infantry through shooting and close combat; and the ability to kill light mech and heavy mech through shooting.  Those are what matters.  Got a unit that can't do any of those things well? You have a useless unit that is losing you games, regardless of the points.  Got a unit that does all of them well?  Awesome!  How many points does it cost?  Too many probably.  The key is tempering the usefulness of the units with the cost.

OK, so how do we measure "effectiveness?"  Remember when all the rage was posting an army list and then totaling up all the heavy/special weapons it had to demonstrate its firepower?  That was a step in the proper direction.  But it was dumb.  OK, your list has 12 missiles and 6 lascannons.  What does that mean?  Is that better than 11 missiles and 7 lascannons?  Is it better than 36 heavy bolters and nothing else?  What I’m getting at is that those lists were meaningless without a benchmark.

My solution, and I stress again that this is crude and has plenty of room to be improved upon, is to break down four relevant statistics...

Dead MEQ from all out shooting (DMS):  Assuming that you are firing all your guns at optimal range with max firepower.  Rapid firing at 12" with heavy/special weapons getting to shoot.  How many MEQ does mathhammer say you kill on average per turn?  For flamer templates, assume 4 hits.  Small blasts, 3 hits.  This metric measures a unit's ability to kill infantry with shooting.  Are there other factors?  Sure.  Conditions won't always be optimal.  Some units do better in suboptimal positions than others.  Combi-weapons present a unique problem.  Assume that combi-weapons do not get to fire for these purposes.
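To make the DMS arithmetic concrete, here is a minimal sketch in Python.  It is not taken from the spreadsheet; the function name and the sample squad are made up for illustration.  The probabilities assume the usual 5th edition numbers: BS4 hits on 3+, S4 wounds T4 on 4+, S8 wounds T4 on 2+, and MEQ fail a 3+ save one time in three.  Template and blast hits use the 4 and 3 hits assumed above.

```python
def expected_dead_meq(attacks, p_hit, p_wound, p_failed_save):
    """Average MEQ killed: attacks x hit chance x wound chance x failed-save chance."""
    return attacks * p_hit * p_wound * p_failed_save

# Assumed 5th edition probabilities against a T4, 3+ save model.
bolter_rapid_fire = expected_dead_meq(2, 2/3, 1/2, 1/3)  # two S4 AP5 shots at 12"
flamer_template = expected_dead_meq(4, 1.0, 1/2, 1/3)    # 4 automatic hits, S4 AP5
frag_small_blast = expected_dead_meq(3, 1.0, 1/2, 1/3)   # 3 hits assumed, S4
krak_missile = expected_dead_meq(1, 2/3, 5/6, 1.0)       # S8 AP3 ignores the 3+ save

# Hypothetical squad: four bolters rapid firing plus one flamer.
squad_dms = 4 * bolter_rapid_fire + flamer_template
print(round(squad_dms, 2))  # 1.56
```

A unit's DMS is just the sum over every gun that gets to fire, so swapping a weapon means swapping one term.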

Dead MEQ on the charge (DMCC):  Assuming you get the charge off, how many MEQ does your unit kill per turn?  Pretty easy, and the best way to measure a unit's close combat prowess.  I know that it favors power weapons and makes certain units super strong vs MEQ that are bad against Orks.   I'll address the special issue of power weapons later.  But as a baseline statistic, this is the simplest way to create a relevant close combat measuring stick.
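The same kind of sketch works for DMCC.  The profile below is hypothetical (five WS4 S4 models with 2 attacks each on the charge), and the numbers assume WS4 hits WS4 on 4+, S4 wounds T4 on 4+, and that a power weapon ignores the 3+ save.

```python
def dead_meq_on_charge(models, attacks_each, p_hit, p_wound, p_failed_save):
    """Average MEQ killed in one round of combat on the charge."""
    return models * attacks_each * p_hit * p_wound * p_failed_save

# Hypothetical squad: 5 models, 2 attacks each once the charge bonus is included.
normal_weapons = dead_meq_on_charge(5, 2, 1/2, 1/2, 1/3)  # MEQ keep their 3+ save
power_weapons = dead_meq_on_charge(5, 2, 1/2, 1/2, 1.0)   # no armor save allowed

print(round(normal_weapons, 2))  # 0.83
print(round(power_weapons, 2))   # 2.5
```

The only difference between the two lines is the failed-save term, which is why power weapons triple the score against MEQ.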

Dead Rhinos Per Game (DRPG): Not strictly correct according to the name.  What we are measuring here, assuming the unit fires at non-covered Rhinos for 5 turns at optimal distance, is how many penetrating hits it will score against AV11 in a single game.  Remember, optimal distance can be 48" or it can be 6" depending on the unit.  

Normally, this is a shooting only category.  However, certain melee units are geared in such a way that they are incredibly dangerous to vehicles with AV10 rear armor.  A wolf lord with thunderhammer on a thunderwolf mount will kill vehicles.  A Carnifex with the right mods laughs at Land Raiders.  A unit of Nobs with Power Klaws kills vehicles short of Land Raiders with ease.  I would consider a Deff Rolla in this category, too.  If a melee unit is commonly used for anti-vehicle melee attacks, I include their assault potential in this section.  This is somewhat controversial, but some armies (Orks, Nids) count on their ability to kill vehicles in melee, and it would only be fair to include their stats.  However, to keep the numbers relevant, I've limited the CC score to 15 "dead rhinos per game," as they will only be able to affect at most 5 vehicles per game, and it takes 3 penetrating hits to wreck one.  So if your mega-nobz could do 85 penetrating hits per game against Rhinos, in reality you will kill 5 Rhinos at most, hence a score of 15.

Dead Land Raiders Per Game (DLRPG): Same as above, but for AV14.
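Here is a rough Python sketch of the two vehicle metrics.  The function names are invented for this example.  It assumes the 5th edition rule that a hit penetrates when Strength + D6 beats the armor value (melta adds a second D6 at half range), counts 5 turns, and applies the cap of 15 only to melee units.

```python
from itertools import product

def p_penetrate(strength, armour, melta=False):
    """Chance one hit penetrates: S + D6 (2D6 for melta at half range) must exceed AV."""
    dice = 2 if melta else 1
    rolls = list(product(range(1, 7), repeat=dice))
    wins = sum(1 for roll in rolls if strength + sum(roll) > armour)
    return wins / len(rolls)

def pens_per_game(attacks, p_hit, strength, armour,
                  melta=False, turns=5, melee=False, cap=15.0):
    """Expected penetrating hits over a game; melee scores are capped."""
    total = attacks * p_hit * p_penetrate(strength, armour, melta) * turns
    return min(total, cap) if melee else total

# Four S8 krak missiles against AV11 (DRPG) and AV14 (DLRPG), hitting 0.66 of the time.
print(round(pens_per_game(4, 0.66, 8, 11), 2))  # 6.6
print(round(pens_per_game(4, 0.66, 8, 14), 2))  # 0.0, S8 can only glance AV14

# Hypothetical melee unit: 20 S10 attacks per turn hitting on 4+, capped at 15.
print(pens_per_game(20, 0.5, 10, 11, melee=True))  # 15.0
```

Using 0.66 for the BS4 hit chance (the rounding mentioned in the comments below), four krak missiles come out at 6.6 and five at 8.25.  That happens to agree with the Devastator and Long Fang DRPG figures further down, though the spreadsheet may make other assumptions for other units.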

Let's take a common unit, 6x Long Fangs with 5x Missile Launchers.  Its stats would be...

DMS: 2.48
DMCC: 1.48
DRPG: 8.25

Compare it to a 5x Space Marine Devastator Squad with 4x Missile Launchers

DMS: 1.98
DMCC: 0.91
DRPG: 6.6

So the Long Fangs are significantly better at killing light mech, slightly better at shooting MEQ, and better on the charge.  And they are cheaper in points.  Clearly, this system does a decent job of showing which unit is superior in this case.

Let's compare two units that fulfill central roles for their armies but look quite different...

5x Grey Hunters with meltagun and WG with combi-melta

DMS: 1.65
DMCC: 1.42
DRPG: 3.73
DLRPG: 2.38

5x Fire Dragons

DMS: 2.8
DMCC: 0.54
DRPG: 15.51
DLRPG: 9.90

Is it surprising to anyone that every Eldar list runs 3 units of Fire Dragons now?  We understood before that FD are good.  Now you can see that they are very good, in black and white.  Grey Hunters are described as jack of all trades units that go anywhere, do anything.  These stats bear that out.  They have decent game in all 4 criteria we care about.  Not bad for a troop choice!  Let's see how they stack up to a "bad" troop choice, Dire Avengers.  We will assume the DA Bladestorm every turn they can, and thus only shoot 1/2 of the turns.

10x Dire Avengers with Bladestorm

DMS: 1.64
DMCC: 1.09

Yeah, DA suck.  We knew that.  These stats bear it out pretty well.  Makes Grey Hunters look pretty good, eh?  At this point I'm reasonably happy that we can at least crudely measure the effectiveness of a unit's offensive capability.  This is a good start. 

Potential criticisms/flaws and how I address them...

Defensive ability:  This system doesn't measure a unit's defensive abilities.  I don't care.  Defense doesn't win 40k, offense does.  Upon looking deeper into the game design mechanics, the points cost of units are very strongly correlated to defensive ability and not correlated very precisely to offensive capability.  English translation: we pay extra points for defensive ability, not so much for offense.  Therefore, we don't need to take defensive capability into account, because as you'll see later we will take points cost into account.  And since points cost is positively correlated with defensive ability, by taking points cost into account we are taking defensive ability into account.

Close combat metric favors power weapon units.  Yup.  A 5 man MEQ unit with power weapons will kill a couple MEQ on the charge.  They will also kill a couple of Orks.  Good against marines, not good against Orks.  As far as I'm concerned, the only proper way to address it would be to add a "dead Orks per turn" metric, which is ridiculous.  Beating up Orks in close combat says very little about an army's ability to win games.  Beating MEQ in close combat does. 

Lascannons are better than Missiles:  We know that on a 1:1 basis, Lascannons are superior.  20 Lascannons will have higher DRPG and DLRPG than 20 Missiles.  But when you factor in the points, as described below, you'll see how the stats change.  This system is adequate to explain the superiority of missiles.  Plus, you'll see how much more effective missiles are at shooting MEQ outside of their transports and begin to see that versatility matters, and is adequately accounted for in this system.

MSU are favored by this system:  No, MSU are favored by the 40k ruleset.  This system just proves it, and expresses why in hard numbers.  Who is surprised that 2 units of 5 with 2x meltaguns are better than 1 unit of 10 with 2x meltaguns?  Are you really going to argue that?

Mobility isn't factored in: Yes it is.  Mobility, like defensive ability, is strongly correlated with points.  Rhinos are a mobility provider, defensive provider that add almost no offense.  Their points cost is thus completely correlated with mobility and defensive ability.  Razorbacks/Wave Serpents are similar, except you can precisely see how much extra you're paying for offense.  The offensive ability of Jump Pack Assault Marines is the same as that of the same Assault Marines on foot without jump packs.  The price difference is what you pay for mobility.  So when we discuss points below, you'll see that mobility is accounted for in this system.

Points: Yes, Grey Hunters are better than DA.  But what about when you take points into consideration?  How does this system address points?  Easy.  Choose a point level.  Buy as many of those units as you can for that points level.  Compare the aggregate score.  It takes more than 3 Dire Avengers to kill as much in close combat as 1 Grey Hunter.  If you can't buy 3 DA for the same points as 1 GH, it's fair to say that point for point GH are better at close combat.  Shooting is a bit closer, but the advantage is still squarely with Grey Hunters.  This is, incidentally, where we see whether horde armies are or are not balanced against MSU/elite armies.  It will also show why 30 Orks will kill a lot more than 5 GH in this metric, but you will also note that the 30 Orks are twice as many points as 5 GH.
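The "buy as many as you can" comparison is simple to sketch in Python.  The unit scores are the DMS and DMCC figures quoted above for the Grey Hunter and Dire Avenger squads.  The points costs and the budget are PLACEHOLDERS chosen only so the division is easy to follow; they are not codex prices, so substitute real costs before drawing any conclusions.

```python
def aggregate_at_points(unit_scores, unit_cost, budget):
    """Buy as many whole copies of a unit as the budget allows and total each score."""
    copies = budget // unit_cost
    return {name: score * copies for name, score in unit_scores.items()}

# Scores from the post; costs below are hypothetical placeholders, not real prices.
grey_hunters = {"DMS": 1.65, "DMCC": 1.42}
dire_avengers = {"DMS": 1.64, "DMCC": 1.09}
GH_COST_PLACEHOLDER = 100
DA_COST_PLACEHOLDER = 150
BUDGET = 600

gh_total = aggregate_at_points(grey_hunters, GH_COST_PLACEHOLDER, BUDGET)
da_total = aggregate_at_points(dire_avengers, DA_COST_PLACEHOLDER, BUDGET)

for name in ("DMS", "DMCC"):
    print(name, round(gh_total[name], 2), round(da_total[name], 2))
```

Whole-unit division is used on purpose, since you cannot field a fraction of a squad.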

Foot lists: Yes, you can design foot lists that “beat” this system by having more heavy/special weapons due to saving points on transports.  However, until a foot list wins a competitive format GT there is no benchmark for what aggregate scores a competitive foot list has.  When a competitive foot list wins a GT, we’ll have a stick to measure all other foot lists against.  Suffice to say for now that a foot list has to exceed a mech list in all categories by a good amount in order to be competitive.  If you have a foot list that has lower aggregate scores than a GT winning mech list, you can bet that your foot list won’t be winning any GTs.  Please note, I don’t count tyranids as a foot list, because they were designed to be competitive with mech lists, and their aggregate scores should be similar.

Now let's talk whole armies, which is what this is all about.  You can add up the aggregate score for the four categories of the two armies and compare them, like I said.  This is useful.  But we need to set some benchmarks for what a “good score” is.  What I would like to do is take the top 4 armies at last year's NOVA and create the aggregate scores under this system, and see how they stack up.  And then we would have a benchmark of proven winners in an indisputably competitive GT setting that we can use to benchmark any 2k list against.  We will see if there are any lessons we can learn, by comparing what ratios favor winners, what ratio leads to losers.

We can use that info to do some data-mining that competitive 40k has never seen before.  We can for the first time see, in precise numbers, how much melta you really need to bring to kill enough Land Raiders to win.  How many Rhinos do you need to be able to penetrate per turn to win a GT?  Right now, players bring as much or as little as they feel comfortable with based on experience and "feel."  We'll know the precise answer to that, in numbers.  This level of precision, even under my crude measuring stick, is entirely new to 40k analysis.

My hypothesis is that winning armies will share common traits and be very balanced.  My other hypothesis is that losing lists will be very imbalanced and also share some common traits, or lack thereof.  Time will tell whether I'm right or wrong, but I am excited to have the answer.

Let’s break down 4 of the undefeated NOVA lists.  My Excel sheet is here if you want to see my raw data.  Like I said above, I had to make a lot of assumptions and your assumptions may differ slightly.  My mathhammer may have some errors, but they should all be wrong in the same direction, so if there are mistakes they will cancel out in the end and our conclusions can remain solid.  Let’s see the aggregate numbers for the armies…

Tony Kopach (Space Wolves)

DMS: 19.28
DMCC: 28.29
DRPG: 59.18
DLRPG: 17.04

Andrew “Stelek” Sutton (Space Wolves)

DMS: 25.05
DMCC: 24.82
DRPG: 64.80
DLRPG: 19.27

Justin “Dashofpepper” Hildebrandt (Orks)

DMS: 18.90
DMCC: 45.75
DRPG: 80.00
DLRPG: 31.04

Mark Ferrik (Blood Angels)

DMS: 22.21
DMCC: 15.15
DRPG: 82.35
DLRPG: 33.74

What do these numbers tell us, especially in light of knowing how the armies performed?  We can see how similar Tony and Stelek’s armies were in scores, as you’d expect.  Stelek’s list was very MSU based as he is wont to do.  As a result, his shooting scores are generally better than those of Tony, who adopted a hybrid approach between MSU and maxed out units.  Tony’s list has a better close combat score, but not hugely so.  Basically, Stelek traded away CC ability for more vehicles and better shooting.  Tony sacrificed shooting ability for close combat punch.  But the armies ended up with scores that were close enough to demonstrate that it came down to generalship and luck to determine the winner.

The mech Blood Angels list is interesting.  Because it spent a lot of its points on vehicles, it has low close combat ability compared to the two Wolf lists, despite Mephiston.  However, it sacrificed that for more anti-tank ability.  It has a great many melta weapons and its vehicles provide excellent anti-light mech capability.  Comparing the army’s scores to the others, we would predict that it would do well against mech heavy opponents, and perhaps struggle to kill large infantry units or deal with strong CC units.  The results bore out that hypothesis.  In his only loss of the tournament the BA list was “too aggressive” and “got too close” to the enemy and was beaten in close combat.  This is an inherent weakness in the list.  Its main anti-tank has a 6” effective range, but it doesn’t have exceptional close combat ability.  We can figure that the shorter the range of your anti-tank weaponry, the better at CC you should probably be in order to deal with that inevitability. 

Dash’s Orks present something much different than the rest of these lists.  While there are two small Loota units, almost all of the anti-mech ability comes in close combat in the form of Power Klaws, Burnas, and Deff Rollas.  This is obviously a liability against an opponent who has fast vehicles, but a skilled general can deploy and move in such a way as to keep that from being used against him.  Also, the key to his entire anti-tank strategy is the Battlewagons.  If you don't stop his Battlewagons by his second turn, he is going to destroy all your vehicles quite quickly.  You can also see that if the Battlewagons are gone before he takes his second turn, his anti-tank potential is gone, because not only does he lose the Deff Rollas, but his Nobs and Ghaz lose their ability to get to the tanks to do their damage.  So why do Dash's Orks win, despite Orks being "bad?"  His army can kill infantry in close combat by the handful and, assuming he gets the first turn, he has almost no problems killing vehicles.  The only real weaknesses are shooting MEQ and the dependency on his Battlewagons to 'turn on' this strategy.  All in all, his numbers aren't far outside of what we would consider normal, at least in this sample of clearly good lists.
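For anyone who wants to line their own list up against these four, here is one possible way to do it in Python.  The scores are the aggregate numbers listed above; averaging the four lists into a benchmark and expressing each list as a ratio to that average is my own simplification for illustration, not something the spreadsheet does.

```python
CATEGORIES = ("DMS", "DMCC", "DRPG", "DLRPG")

# Aggregate scores for the four NOVA lists, as listed above.
NOVA_LISTS = {
    "Kopach (Space Wolves)": {"DMS": 19.28, "DMCC": 28.29, "DRPG": 59.18, "DLRPG": 17.04},
    "Sutton (Space Wolves)": {"DMS": 25.05, "DMCC": 24.82, "DRPG": 64.80, "DLRPG": 19.27},
    "Hildebrandt (Orks)": {"DMS": 18.90, "DMCC": 45.75, "DRPG": 80.00, "DLRPG": 31.04},
    "Ferrik (Blood Angels)": {"DMS": 22.21, "DMCC": 15.15, "DRPG": 82.35, "DLRPG": 33.74},
}

def category_means(armies):
    """Average each category across the benchmark armies."""
    return {c: sum(a[c] for a in armies.values()) / len(armies) for c in CATEGORIES}

def relative_to_benchmark(scores, means):
    """Ratio of a list's score to the benchmark mean; 1.0 means exactly average."""
    return {c: scores[c] / means[c] for c in CATEGORIES}

means = category_means(NOVA_LISTS)
for name, scores in NOVA_LISTS.items():
    ratios = relative_to_benchmark(scores, means)
    print(name, {c: round(r, 2) for c, r in ratios.items()})
```

Read this way, the Blood Angels list sits at roughly half the average DMCC, and the Ork list is the only one well above average in close combat.  To score your own list, pass its four totals to `relative_to_benchmark` in place of one of the NOVA entries.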

What I'd like to do next is compare the scores in these lists to lists that went 0-4.  It would be very valuable to get that information for comparative purposes.  I believe MVB will be getting that to me, and I'll post it up.

So if you made it this far, congrats.  Comments?  I'm sure there will be some.

edit: army lists are found here.  Thanks to Danny Internets for publishing them.


  1. so how do I know which to use? play them all until I get the :stats: to compare?

  2. Uh, wow! This was incredibly insightful, thank you for writing it!

    I would love to see how the Grey Knights fit into the grand scheme, seeing as they lack AP1/2 saturation.

    Seriously, though, write more of these.

  3. I enjoyed that, but would like to see the lists (or at least links) to the four armies you are comparing at the end.

  4. This really appealed to me. Excellent idea. I have two questions.

    Should the DMCC figure make some allowance for initiative relative to MEQ (4)? A squad's CC offensive potential will be reduced due to losses if they have an Initiative lower than MEQ. Conversely, Initiative higher than MEQ is effectively a force multiplier.

    Second question, what does this mean for my 4000 points of Nids :) (I knew Nids have it tough, this just quantifies it - sorta).

    Well done, look forward to more analysis.

  5. @Loquacious You write your own 2,000 point list you're thinking about using at a tournament and then score it under this system. How it compares to these armies should tell you something.

    @Lycacian They are on bald and screaming, but I'll link to them later on.

  6. @40knidman There wasn't an unbeaten Nid list at the NOVA, or else I would have done it. My thoughts are that nids are designed to be competitive with modern mech lists so a top tier nid list will be very similar to these in aggregate scores.

    As far as initiative goes, that is a defensive tool, and I ignored it. Unit defense including initiative is already taken into account by the points cost of the unit, for the most part.

  7. Are you giving AP1 the appropriate bonus when counting penetrating hits? e.g. an AP1 pen is worth approximately 3/2 AP2+ pens.

  8. This is possibly one of your best posts yet - I don't play much 40k but I really, really enjoyed reading what you had to say :D

  9. This is an excellent idea and a great post. I've followed your blog for a long time and really enjoyed all your posts, especially the ones that bring game theory from other games into 40k.

    That being said, I have a concern with using point cost as a metric for durability/mobility. They are generally correlated with points, but not always. To use a straightforward example, a Grey Hunter costs about 3/5 of a Purifier, but they have about the same level of durability/mobility.

    Defense (including leadership) DOES win games in 40k, specifically by keeping your offense around long enough to affect the game in later turns, and by keeping troops alive to claim objectives.

    As an extreme, hypothetical example, let's say there was a unit that had the same offensive output as Grey Hunters, but was T3, L7, with no ATSKNF and a 5+ save, but cost 1 point less than a Grey Hunter. Under your metric, the new unit is better, as it has more offense per point, but obviously the Grey Hunter is superior.

    I don't want to take away from the groundbreaking work you're trying to do, and you admit in the post that these are brand new and very crude metrics, but I don't feel comfortable discounting defensive ability completely.

    Hope that makes sense and is constructive!

  10. Love love love the sabermetric approach to WH40k, please keep up the great work!

    (if I wasn't buried in term papers to grade, I'd have more insightful things to say)

  11. Pretty interesting for sure! I think this sort of analysis is a good thing- knowing your army's capabilities is paramount to winning games.

    One thing you might be missing however, is a unit's ability to 'get there'. A unit of 5 Chaos Chosen with 4 melta guns footslogging might have an amazing dead Land Raider per game ratio, but it's not really a viable unit just footslogging. The points they would pay for a transport would actually HURT their score, implying that the unit without the transport is actually BETTER. We all know that in most cases, that isn't so.

    I could write up a list with HUGE HUGE awesome numbers, that would be incredibly shitty, but the numbers would say it was pure win. I guess what I'm trying to say is that though a numbers system like this is good, we also need to take into account 'intangibles' such as this.

    As an aside, how would we measure a power that forced failed leaderships (like Fear of the Darkness or a Psyker Battle Squad)? These powers have the potential to win entire games without actually dealing damage- killing a couple Nob Bikers and then walking them off the board with a chimera or Vendetta is certainly a viable option on the Tabletop, though that power in and of itself kills ZERO models/turn.

    Heh, just things to think about, thanks for taking the time it took to put together an article that clearly took a lot of both thought AND time :)

  12. Wow. Excellent culmination of your past few articles. I think you've figured out, rather concisely, 40k's "Moneyball"

  13. @DrCrab The Grey Knight example is valid. Don't misinterpret what I said though. Defense and mobility ARE covered by points, but points doesn't ONLY cover defense and mobility. My main contention is that GW tends to price defense and mobility higher than they do offense. When I put my 2k Crowe list through this system, you'll see how GK stack up.

    The second example of a hypothetical unit is not valid. Such a unit doesn't exist so there is no point conjecturing it. When GW fails that bad at game design/balance, then we will have something to discuss, but until then it seems that most units are priced at least somewhat reasonably in 5th edition books.

    @Xaereth I did address the fact that foot lists will tend to have higher scores. Like I said, when a foot list wins a competitive GT then we will have a benchmark for what the scores of a "good" foot list should be. As of right now, we just need to accept that a foot list has potentially higher scores than a Rhino based list.

    I am also fully aware that you could write a list of horrible units that score highly in this system to try to break it. I guess that's fine. The goal isn't to create a system that can't be exploited, since if this system wasn't breakable it would be adopted at every comp scored tournament. The goal is to take your REALISTIC GT caliber lists and compare them to proven GT caliber lists. If you want to use this system or a tweaked version of this system for that, I think it would be beneficial to you. If a person wants to write lists to 'beat' the system, he is wasting his own time, not mine.

    I think the real value of this idea is still unproven, of course. It will come down to whether it has predictive power or not. If you roll up a mech Space Wolf list for a GT that scores similarly to these lists, it should perform decently, assuming you play well and have average luck. Whether that predictive power is true or not is what matters, and we will find out.

    If I had to guess, I would say it has a great chance to have predictive power. Can you imagine a list with scores a lot worse than these lists winning NOVA this year? I can't. That's a good sign that the scores from these lists DO represent a valid benchmark. And if so, we are onto something big.

  14. One thing you'll need to do for sure is look at the bad lists. See if you can find the winless lists from NOVA, and see how bad their scores are. If they end up with comparable scores to the undefeated lists, then you've got a problem. If they end up with terrible scores, then you've got confirmation.

  15. @Sam That's definitely the plan. When Mike gets back from Texas he is going to dig some up for me. But that does make the assumption that when someone goes 0-4 his list was the problem, not his generalship.

    Look at the people who scrub out at Magic The Gathering Pro Tours. Even the worst players there usually bring decent enough decks.

    But we'll definitely see.

  16. Heh, I'd for sure be interested to see what sort of lists lost all their games at the NOVA last year. I agree with you that most solid, competitive armies will have similar numbers, somewhat by necessity. Good generalship can only make up so much for a bad list.

    I guess then we have to face the question of how much does a list have to do with success as opposed to good generalship- the two are obviously linked to some extent, but to what extent? If you or I had gone to the NOVA last year with Stelek's list, in his place, would we have done as well, or maybe even better? If we would have done worse, then the numbers indicate that perhaps they should be higher, whereas if we'd have done better, the numbers would be somewhat of a benchmark.

    Meh, maybe I'm just rambling now. Anyways, for sure thought-provoking stuff. I'm sure you're planning on updating us on the crappy lists when you can :)

  17. @Nikephoros: OK fair enough, that unit doesn't exist and never will. I don't want to make a straw man argument, I just wanted to point out that units have different balances in their point costs of offense vs. defense/mobility.

    You might also be interested in checking out the new Moneyball trailer, with Brad Pitt and Jonah Hill, if you haven't seen it already.

  18. How would I score a unit of pathfinders in my Tau army? Or Piranha? Which I use largely as turboboosting roadblocks. In game, they don't actually contribute much in the way of shooting, but according to your article, the fusion blasters could prove quite lethal against vehicles.

    Also, I noticed that one in 3 penetrating hits, 'destroys' a vehicle. But what about railguns, who through AP1 increase those odds to 1 in 2 penetrating hits?

  19. I'd be interested in how exactly you made these calculations. Just from an initial glance the numbers seem off from straight mathhammer. What factors did you include?

    In addition I think excluding things like range is a mistake for these types of calculations, if a unit cannot engage things at optimal range until turn 2 or 3 it will do less damage than if it could get there turn 1. Take Ghazghkull for example, if I understand your math he is killing 5 Rhinos in a game. Assuming a 5 turn game, this is nearly impossible, and certainly impossible that he will auto hit said vehicle turn 1.

  20. @breng77 The more variables you try to control for the more you realize how many more variables are out there... and how pointless it is to control for them.

    As far as mathhammer goes, instead of multiplying by 2/3 I used .66. Makes the numbers slightly different.

    I didn't assume auto-hits on vehicles, I assumed 4+ to hit.

  21. I understand the number of variables; however, it is important to take things like range into account in the final analysis (even if you don't put it in the numbers). Also I get very different numbers than what you come up with for some units (not explainable by .66 vs 2/3).

    Take the Dire Avengers for example (if you can actually shoot every turn, Bladestorm is not worth giving up the shooting; you end up killing less overall). But let's assume a Bladestorm as you have. So for one round that is 30 shots from 10 DA; 2/3 hit (.66 from your math), so that is 19.8 hits (20 if you use 2/3); half wound, which is 9.9 wounds (10 if you use 2/3), which on average is 3.3 dead MEQ from shooting. If you then assume a 5 turn game, they shoot 3 times out of 5 turns, so they kill 9.9 marines during the game, or an average of 1.98 marines per turn, more than the 1.54 you have listed. So I am just interested in what process you used to determine your numbers, if for no other reason than that without that knowledge this is a tool that really cannot be employed by your readers.

  22. The statistics are misleading, especially since DMS/DMCC are for one turn while DRPG/DLRPG are for five turns and mix CC and shooting into one number.

    The CC and shooting parts need to be separated. What I see is three pairs, all counted over one turn:


    Five turns' numbers are an obvious fail as there is no way an army can survive untouched for five turns. Shooting and CC need to be split because although it is possible for a unit to shoot every turn, there are no units that can CC every turn.

  23. Just to make your life difficult - the maths on the Orks is wrong, at least the DLRPG - all normal Power Klaw wielding Nobs (and Big Meks) are S9 on the charge, so while unlikely, they can dent a Raider. Should only add ~1-2 to his total, but still :)

  24. Excellent. Headed out on a long road trip; this will give me something to play with.