The Boomers Were Right
Batting Average is REALLY Important
Whenever my father-in-law watches baseball with us, he invariably scoffs at the players with low batting averages, like a typical boomer. My son and I, born of a different generation, patiently explain that batting average isn’t important and that what matters most are on-base percentage and slugging. Len, if you’re reading this, this is the one and only time you’ll ever read or hear these words from me: I was wrong. Batting average is really important, arguably more important to run scoring than any other element in a batter’s traditional triple slash line. Put another way, batting average is more important than patience, and is (arguably) more important than power.
I was intrigued by Ben Clemens’ recent article, in which he took Voros McCracken’s suggestion and looked at team-level triple slash lines to see how they impacted run scoring. In it, he shows that when you have OBP and SLG as variables, batting average becomes irrelevant. In fact, he correctly demonstrates that, in the presence of those variables, a higher batting average leads to fewer runs; as he describes it, this is an unintuitive effect.
It got me thinking about why that would be the case. A higher batting average is good, a lower batting average is bad. Hitting above .300 is good, hitting below .200 is bad. Then it hit me: when we compare 3 variables, we must ensure that we are not measuring the same thing more than once. In the case of AVG/OBP/SLG, batting average is present (quite significantly) in all 3 variables. It makes up 100% of AVG, roughly 75% of OBP, and 60% of SLG. Saying that we don’t need batting average when we have OBP and SLG is like saying we don’t need pizza when we have pepperoni pizza, or asking for an ice cream sundae without the ice cream.
AVG = AVG
OBP = AVG + BB% (or would if they had the same treatment for sac bunts and flies)
SLG = AVG + ISO
OBP is going to be more predictive than AVG most of the time, since it’s using two (improperly) weighted variables and combining them, just as with slugging, where we (improperly) weight 4 variables.
Let’s ignore sac flies and bunts for a moment and create 2 theoretical batters, each with an OBP of .400: one with a .333 BA (210 hits + 70 walks over 700 PA) and one with a .250 BA (140 hits + 140 walks). Which batter is more valuable? If we look at linear weights, the batter with the .333 average is worth almost 13 runs more, even if all those hits were singles. That’s more than 1 win of value. Put another way, the larger the share of OBP that comes from batting average, the better off you are.
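As a rough check on the two theoretical batters above, here is a minimal sketch that recomputes their rates and shows how the run gap scales with the difference between the value of a single and the value of a walk. The function names and the two weights (`SINGLE_W`, `WALK_W`) are placeholders chosen for illustration, not the linear weights behind the figure quoted above, so the gap it produces will move with whatever weights you plug in.

```python
def rates(hits, walks, pa):
    """Return (batting average, on-base percentage), ignoring sacs and HBP."""
    at_bats = pa - walks              # with no sacs or HBP, AB = PA - BB
    return hits / at_bats, (hits + walks) / pa


def run_gap(extra_singles, single_weight, walk_weight):
    """Run difference from turning that many walks into singles."""
    return extra_singles * (single_weight - walk_weight)


high_avg = rates(hits=210, walks=70, pa=700)    # the .333 hitter
low_avg = rates(hits=140, walks=140, pa=700)    # the .250 hitter

# Hypothetical per-event run values (assumption, not the article's weights).
SINGLE_W, WALK_W = 0.47, 0.30
gap = run_gap(210 - 140, SINGLE_W, WALK_W)

print(f"high-AVG batter: AVG {high_avg[0]:.3f}, OBP {high_avg[1]:.3f}")
print(f"low-AVG batter:  AVG {low_avg[0]:.3f}, OBP {low_avg[1]:.3f}")
print(f"run gap with these placeholder weights: {gap:.1f}")
```

Both batters reach base 280 times in 700 PA; the only difference is that 70 of those times are singles for one and walks for the other, which is why the whole gap comes down to the single-minus-walk weight.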
If one were to argue that “we don’t need batting average when we have on base percentage”, they are inadvertently advocating for a flawed metric. The distinct components of on base percentage, specifically, batting average and walks (again, ignoring sacs) are important and should be kept separate and distinct.
Let me illustrate this concept by rephrasing the question Tom Tango asked of whether a .260/.365/.510 line is better than a .315/.365/.510 line. Let’s frame it as an AVG/BB%/ISO question. Which is the more productive batter (AVG/BB%/ISO): a .260/.105/.250 hitter or a .315/.050/.195 hitter?
In this instance, it’s clear that the batter is trading 55 points of batting average for 55 points of isolated patience (read: BB%) and 55 points of isolated slugging. As we’ll soon see, this is indeed a very good trade-off that works out to roughly +10 runs per 600 plate appearances. It should be noted that people I hold in very high esteem assumed these two batters were roughly equal because their OBP+SLG is equivalent, which illustrates how using a double slash (OBP/SLG) line can sometimes lead to (in my estimation) incorrect conclusions.
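The reframing above is just subtraction, using the article's simplification that OBP = AVG + BB% and SLG = AVG + ISO. A small sketch, with `to_bapp_line` as a made-up helper name:

```python
def to_bapp_line(avg, obp, slg):
    """Split AVG/OBP/SLG into AVG, isolated patience (OBP - AVG), ISO (SLG - AVG).

    Uses the simplified identities from the article, which ignore the
    different treatment of sac bunts and flies in AVG and OBP.
    """
    return avg, round(obp - avg, 3), round(slg - avg, 3)


line_a = to_bapp_line(0.260, 0.365, 0.510)
line_b = to_bapp_line(0.315, 0.365, 0.510)

for name, (avg, patience, iso) in (("A", line_a), ("B", line_b)):
    print(f"batter {name}: {avg:.3f}/{patience:.3f}/{iso:.3f}")

# Differences in points (thousandths), batter A minus batter B.
diff_pts = [round((a - b) * 1000) for a, b in zip(line_a, line_b)]
print("A minus B, in points:", diff_pts)
```

Seen this way, the two lines differ by the same 55 points in each slot, with batting average moving one way and patience and power moving the other.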
Where we go from here: BAPP not OPS
We still need a triple slash line, just not the traditional AVG/OBP/SLG. It should be AVG/BB%/ISO: batting average, patience, power (BAPP). Some have expressed frustration that we haven’t moved on from batting average after all this time. That’s because we can’t, nor should we. Batting average is really, really important. We can only ignore it if we bury it (improperly) in something else. We absolutely should be displaying it as a primary metric for evaluating a hitter.
If we looked at just batting average, walk percentage and isolated slugging, which variable would be the most important?
For this, I replicated Ben Clemens’ study, with a couple of tweaks:
1) Instead of looking at the traditional slash lines, I took AVG, BB% and ISO
2) I used Runs Per 600 Plate Appearances, compared to slash line points (essentially the percentage multiplied by 1000) so I could include 2020 data
Here’s what I got:
Every incremental point of batting average (i.e. going from .270 to .271) is worth 0.224 more runs per 600 plate appearances, followed by 0.2 runs per 600 PA for every additional point of ISO and 0.156 runs per 600 PA for every additional point of BB%. By this analysis, the best way to improve your run scoring is to boost your batting average. It is important to note that all 3 components had extremely low p-values, indicating they were all very statistically significant in predicting runs, as one would expect.
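For readers who want to try this kind of fit themselves, here is a minimal sketch of an ordinary least-squares regression of runs per 600 PA on AVG, BB% and ISO points, in plain Python. It is not the code or data behind the numbers above: the choice of ordinary least squares is my assumption, `fit_ols` is a made-up helper, and the six "team seasons" and the intercept of -30 are invented, built from the rounded coefficients quoted above purely so the fit can be checked.

```python
def fit_ols(rows, targets):
    """Least-squares fit with an intercept, solved via the normal equations.

    rows:    list of [avg_pts, bb_pts, iso_pts] per team season
    targets: runs per 600 PA for the same team seasons
    Returns (intercept, [slope_avg, slope_bb, slope_iso]).
    """
    n, k = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    y_mean = sum(targets) / n
    # Centering the data keeps the system well conditioned.
    X = [[r[j] - col_means[j] for j in range(k)] for r in rows]
    y = [t - y_mean for t in targets]
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    slopes = [A[i][k] / A[i][i] for i in range(k)]
    intercept = y_mean - sum(s * m for s, m in zip(slopes, col_means))
    return intercept, slopes


# Invented team seasons in points: [AVG, BB%, ISO]. Not real data.
rows = [[250, 80, 150], [260, 90, 170], [240, 75, 140],
        [270, 85, 160], [255, 95, 180], [245, 70, 165]]
# Targets generated from the article's rounded coefficients (no noise).
targets = [-30 + 0.224 * a + 0.156 * b + 0.200 * i for a, b, i in rows]

intercept, slopes = fit_ols(rows, targets)
print("intercept:", round(intercept, 3))
print("runs per point (AVG, BB%, ISO):", [round(s, 3) for s in slopes])
```

With real team seasons in place of the invented rows, the three slopes are the "runs per point" figures discussed above; a statistics package would additionally report the p-values.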
Looking at this another way, if you are trading batting average for ISO (a typical trade-off), you want to make sure you gain at least 10% more ISO points than you lose in AVG points. Hitting .300 with a .150 ISO is slightly better than hitting .250 with a .200 ISO, despite both of those being equivalent from a slugging standpoint.
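The trade-off arithmetic can be sketched directly from the three per-point values quoted above. The dictionary and function names here are made up for illustration, and the coefficients are the rounded ones from the text, so the results are approximate.

```python
# Runs per 600 PA for one additional point, as quoted in the article (rounded).
RUNS_PER_POINT = {"avg": 0.224, "bb": 0.156, "iso": 0.200}


def run_delta(d_avg_pts, d_bb_pts, d_iso_pts):
    """Estimated change in runs per 600 PA for a change in each component."""
    return (RUNS_PER_POINT["avg"] * d_avg_pts
            + RUNS_PER_POINT["bb"] * d_bb_pts
            + RUNS_PER_POINT["iso"] * d_iso_pts)


# .300 AVG / .150 ISO relative to .250 AVG / .200 ISO (same walk rate).
avg_for_iso = run_delta(+50, 0, -50)

# ISO points needed to offset one lost point of AVG.
break_even = RUNS_PER_POINT["avg"] / RUNS_PER_POINT["iso"]

# .260/.105/.250 relative to .315/.050/.195 (AVG/BB%/ISO).
tango_trade = run_delta(-55, +55, +55)

print(f"AVG-for-ISO swap: {avg_for_iso:+.2f} runs per 600 PA")
print(f"break-even ISO points per AVG point: {break_even:.2f}")
print(f"55-point trade: {tango_trade:+.2f} runs per 600 PA")
```

With these rounded values the break-even ratio is about 1.12, in line with the "at least 10% more ISO points" rule of thumb. The 55-point trade comes out around +7 runs per 600 PA, the same direction as the roughly +10 cited earlier but smaller; the text does not say which weights produced that figure.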
I want to stress this point some more. Batting average is REALLY IMPORTANT. Saying that we have on-base percentage and slugging, and therefore we don’t need batting average, ignores the fact that we are lumping batting average into their numbers (with a 50% weight for each). Naturally, we don’t need batting average when we have batting average + walk percentage. However, batting average, when separated out, is meaningful. It’s statistically significant. It matters. We absolutely should be showing it as a key stat when we look at a batter. The reason it boosted prediction slightly in Ben’s model is that the weights are off in OBP and slugging, so including it again allows the model to re-weight the average component.
A Brief Case for BAPP
The cool thing about BAPP? It has an average of roughly .500 (before 2022 ruined that and made it closer to .490), which means we can use easy rules of thumb such as “.500 is about an average hitter”, and every 50 points of BAPP means you’re about 10% better than league average. If a batter hits .300, we can say they are roughly 60% of the way to being an average hitter (this isn’t precisely correct, but it should be a fair approximation). If a player has a .200 average and a .100 BB%, they’d need roughly a .200 ISO to be an average-ish hitter.
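These rules of thumb are simple enough to write down as a sketch. The helper names are made up, and the .500 league baseline is the article's rough pre-2022 figure rather than a measured constant.

```python
LEAGUE_BAPP = 0.500   # rough league average cited in the article (pre-2022)


def bapp(avg, bb_rate, iso):
    """BAPP is the plain sum of batting average, walk rate and ISO."""
    return avg + bb_rate + iso


def pct_vs_league(value, league=LEAGUE_BAPP):
    """Rule of thumb: every 50 points of BAPP is about 10% vs. league average."""
    return (value - league) / 0.050 * 10.0


def iso_needed(avg, bb_rate, league=LEAGUE_BAPP):
    """ISO required for a hitter to reach the league-average BAPP."""
    return league - avg - bb_rate


print(f".300 AVG covers {0.300 / LEAGUE_BAPP:.0%} of an average BAPP")
print(f"ISO needed at .200 AVG / .100 BB%: {iso_needed(0.200, 0.100):.3f}")
print(f"a .550 BAPP hitter: {pct_vs_league(0.550):+.0f}% vs. league")
```

Passing a different `league` value (for example .490) adjusts all three rules for a lower-offense season.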
Let’s look at how the 3 components correlate to team Runs, starting with the aggregate BAPP:
BAPP to R/PA: R² = 0.86
BAPP has an R² of 0.86 against R/PA, compared to 0.83 for OPS, so it is also a slight improvement in accuracy.
Now for a slight twist, if we look at which number has the strongest correlation (on its own) to runs, it’s by far and away ISO:
ISO to R/PA: R² = 0.72
AVG to R/PA: R² = 0.26
BB% to R/PA: R² = 0.25
ISO has an R² of 0.72, compared to 0.26 for batting average and 0.25 for walk percentage. I would guess that this is due to ISO having the most variability, so it has the most predictive power, in, ahem, isolation. It’s also easier, I think, to raise your ISO 50 points than raise your batting average 50 points, so one could easily make the argument that optimizing ISO is more important, and I wouldn’t necessarily disagree.
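For a single stat against R/PA, the R² of a straight-line fit equals the squared Pearson correlation, so figures like the ones above can be computed with a few lines. This is a minimal sketch; `r_squared` is a made-up helper, and the team values below are invented for illustration, not taken from the data set used in this post.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Invented team-level values (not real data): ISO and runs per PA.
team_iso = [0.140, 0.155, 0.162, 0.171, 0.180, 0.195]
team_r_per_pa = [0.108, 0.112, 0.118, 0.115, 0.124, 0.127]

print(f"ISO to R/PA: R^2 = {r_squared(team_iso, team_r_per_pa):.2f}")
```

Running the same function on AVG, BB% and the summed BAPP against R/PA would reproduce the kind of comparison shown above.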
Dismissing batting average, in this author’s view, is just plain wrong. It is statistically significant in terms of predicting team runs, and on a per point basis, the most impactful component of BAPP, and indeed OPS. Perhaps we should be talking about BAPP replacing OPS, rather than talking about OPS replacing batting average. Batting average, based on this analysis, should absolutely be prominently displayed alongside every batter. A point of batting average is worth a lot more than a point of walk percentage. If we show a player’s on base percentage, we are actually adding together AVG and BB% improperly. Let’s stop showing AVG, OBP and SLG and instead move on to BAPP – AVG/BB%/ISO.
I love this. I've been playing around with BAPP on an individual level to see how it might shift our perception of players. One thing I'm noticing is that, at least relative to OPS rankings, the players hurt most by a shift to BAPP are high-BA players with moderate-to-low BB% and ISO.
You sort of addressed this at the top, but I'm wondering if you'd agree with this takeaway: BA matters, but in the effort to replace BA, some stats used as a replacement (mainly OPS) were accidentally overvaluing BA.
I'm a Cleveland fan so I've been playing around with their stats. An example that jumped out was 2014 Carlos Santana vs 1995 Carlos Baerga:
Based on OBP and OPS you'd say they're similar:
Baerga: .355 OBP, .807 OPS
Santana: .365 OBP, .792 OPS
But BAPP shows Santana as a much more valuable player:
Baerga: .314 BA / .058 BB% / .138 ISO - .510 BAPP
Santana: .231 BA / .171 BB% / .196 ISO - .598 BAPP
Thoughts on this comparison and takeaway?
Great article, Eli! Thanks for sharing! Essentially, each of the traditional triple slash line metrics needs the others to complete the story.
SLG doesn't tell us how often the player avoids outs.
OBP doesn't tell us what proportion of the non-outs come from the more valuable hits (and doesn't reward extra base hits).
BA doesn't tell us the quality of the hits or the other ways to get on base.
I really like the idea of BAPP.