As Analytics Game kicks off its coverage of Major League Baseball, and as I write my first article for the site, I’ve decided the best use of this new platform is to introduce readers to the basics of the statistical movement now taking place in baseball front offices. This movement has been gaining steam for a while now, and was popularized by the critically acclaimed Moneyball book and motion picture. Moneyball showed how a team could employ analytical concepts to maximize the efficiency of its baseball operations. At the time, this was a novel concept to many fans, and even to some teams. It gave hope to baseball-loving number crunchers everywhere and helped lay the groundwork for public stat-based sites like FanGraphs. This public database offers almost every statistic imaginable, making it possible for everyday baseball fans to analyze and formulate their own well-substantiated opinions and arguments about the players and teams they root for. In this article, I will use FanGraphs data to build an understanding of what constitutes and fuels a baseball offense, while hopefully providing insight into the thinking of baseball front offices.
Before offense can be analyzed, the very concept of it must be established, as it can easily get lost in the vast amount of numbers being thrown around (e.g., batting averages, on-base percentages, home runs). Fortunately, this simply requires returning to the fundamental concept of baseball itself: the team with the most runs wins the game. While runs are not the sole focus of baseball as a whole, scoring them should be the sole goal on the offensive side of the ball. Runs are what every team tries to create in each turn at bat. Therefore, runs are the benchmark of offensive prowess. With this seemingly obvious concept established, runs can be broken down into their component parts using regression analysis, a method used here to determine the strength of correlation between two datasets. An R-value of 1.0 is a perfect correlation, whereas an R-value of zero indicates that the two datasets are completely uncorrelated. While correlation does not necessarily imply causation, comparing each facet of hitting against a single variable (runs) shows which statistics rise with runs most consistently. Those statistics can then serve as good indicators of run production. To demonstrate this, I pulled custom team statistics for the last ten years from FanGraphs and compared them to each team’s run output over that time.
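As a minimal sketch of how an R-value like the ones in this article can be computed, the snippet below calculates a Pearson correlation coefficient between a team statistic and runs scored. The function name `pearson_r` and the sample totals are illustrative assumptions; they are not FanGraphs data, and this is not necessarily the exact procedure used for the figures quoted here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations, and each variable's squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up team totals for illustration only (not real FanGraphs numbers).
team_home_runs = [120, 150, 165, 180, 210]
team_runs = [640, 690, 700, 760, 800]

r = pearson_r(team_home_runs, team_runs)
print(f"R-value: {r:.2f}")
```

In general the coefficient ranges from -1 to 1, with the sign showing the direction of the relationship; the values discussed in this article are reported as magnitudes between 0 and 1.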
Before discussing what does relate to runs, factors that do not should be identified. For example, stolen bases do not even come close to correlating to runs, with a 0.1 R-value that indicates almost no correlation. This contradicts the common belief that speedsters manufacture runs by stealing bases. Baseball has a long history of overpaying for stolen bases, which players like Michael Bourn are currently benefiting from, and while speed may have a significant impact on the defensive side, it shows little promise as a means of fueling an offense through swiping bases. Fans like to see players who can steal bases and create some excitement, but front offices that have researched the subject will find that stolen bases have little effect on the offensive potency of a team. In fact, over the last ten years the Mariners scored the fewest runs in baseball, yet stole more bases than the Red Sox, who scored the second-most runs. The Mariners had around 100 more stolen bases but almost 2,000 fewer runs than the Red Sox, exemplifying this disconnect.
Another statistic that does not correlate strongly to runs scored is strikeout percentage. This too has been a major discussion point on baseball broadcasts everywhere, as former baseball stars reminisce about the terrors of every strikeout and how it extinguished any hope of scoring. Whether this is a misunderstanding or just misremembering, history shows only a weak correlation between strikeout percentage and runs, at about a 0.31 R-value. Baseball logic says that putting the ball in play can move runners over. A groundball instead of a strikeout can move the winning run into scoring position. The logic seems to make sense, but the statistical results simply do not follow. If results do not follow over a large sample size, it is nonsensical to keep such a firm belief in these concepts. In other words, time has shown that these ideas may be conceptually respectable, but in practice they do not have a drastic effect on the production of runs.
While unrelated statistics are important to note, it is appropriate and perhaps more interesting to move on to the facets of baseball that do relate to offensive success. First off, dollars spent do moderately correlate to runs scored, at a respectable 0.63 R-value. The recent sentiment that low-budget teams can compete on the same playing field as the free spenders, while appealing, is unfortunately not supported by the data. Historically, teams with higher payrolls have scored more on average than their lower-spending counterparts. This should make some degree of sense, as these teams are often spending money on big-name players who are in the primes of their careers. The Yankees have used this strategy in the past decade, shelling out mammoth contracts to sluggers like Alex Rodriguez in order to vault themselves into the category of top run-scorers over the past ten years. The Yankees then rode this offensive firepower to a World Series title in 2009. Teams that could not pay for Alex Rodriguez’s talents because of financial restrictions missed out on a player who provided 38.4% more runs than the average player each year during his Yankee tenure, according to FanGraphs’ Weighted Runs Created Plus (wRC+) statistic. Rodriguez is only one example of how a highly paid player can influence a team’s run total. A team of these players (a high-payroll team) will on average outscore teams with low payrolls, as long as the money is allocated to the right players.
The next wave of statistics I’ll discuss all feature R-values around 0.8, indicating a strong correlation to runs. These statistics are hits, home runs, and isolated power. It is important to compare and contrast these statistics and their components to understand why they are such strong indicators of runs scored. Home runs are the simplest of these figures to connect to runs. Each home run is worth at least one run, and is worth more whenever there are runners on base. This makes a home run the most valuable hit possible and an asset every team should look for if it wants to score runs and become the kind of offensive powerhouse we’re discussing today. The Yankees, a consistently excellent offensive team, also provide an example of this concept, as they have hit the most home runs and scored the most runs over the last ten years (perhaps aided by the famously short right field fence in their new home ballpark).
The next statistic with a decently strong correlation is isolated power (ISO), which measures how many extra bases a player averages per at bat, or how far past first base his hits take him. It is calculated as slugging percentage minus batting average, which takes singles out of the equation in an attempt to measure a player’s true power, or propensity to slug extra-base hits. A high isolated power correlates with more runs scored because the higher the ISO, the closer each batter ends up to home plate per hit. This consistently puts players in position to score more often than on singles-driven teams. While singles correlate only weakly with runs scored (0.34 R-value), extra-base hits clearly do, with the high 0.8 R-value mentioned earlier. To be fair, hits in general correlate to runs scored. Teams with more hits score more runs; it’s that simple. However, while this seems like the most obvious relationship, it is actually the shallowest in terms of full understanding. Singles correlate poorly with runs and doubles only moderately. Therefore, two teams with the same hit totals could have very different run totals if one team were filled with singles hitters and the other with home run boppers. Hits correlate to runs assuming that there is a good distribution of doubles and home runs along with the lowly singles. Hits include the home run and the components of ISO, so it is easy to see how these statistics are related, but also how the hits statistic can be superficial and misleading.
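To make the point about hit distribution concrete, here is a small sketch using the standard definitions of batting average (hits per at bat), slugging percentage (total bases per at bat), and ISO (slugging minus average). The helper name `batting_rates` and both stat lines are hypothetical, invented purely for illustration.

```python
def batting_rates(at_bats, singles, doubles, triples, home_runs):
    """Return (AVG, SLG, ISO) for a batting line.

    AVG = hits / at bats
    SLG = total bases / at bats
    ISO = SLG - AVG, i.e. extra bases per at bat
    """
    hits = singles + doubles + triples + home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    avg = hits / at_bats
    slg = total_bases / at_bats
    return avg, slg, slg - avg


# Two hypothetical hitters, each with 150 hits in 500 at bats.
singles_hitter = batting_rates(500, singles=130, doubles=15, triples=3, home_runs=2)
power_hitter = batting_rates(500, singles=80, doubles=35, triples=5, home_runs=30)

for label, (avg, slg, iso) in [("singles hitter", singles_hitter),
                               ("power hitter", power_hitter)]:
    print(f"{label}: AVG {avg:.3f}, SLG {slg:.3f}, ISO {iso:.3f}")
```

Both hitters share the same batting average, so the hits column alone cannot tell them apart; the difference only appears once the extra bases are counted.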
Last come the most important statistics: on-base percentage (OBP), slugging percentage (SLG), and on-base plus slugging (OPS). On-base percentage is undoubtedly the most talked about of these statistics. It was the main sticking point of the Moneyball movement and became highly publicized once fans realized its value. The value of OBP can be found in its 0.9 R-value in relation to runs scored. While this may seem like a statistic that should not be more valuable than home runs or hits, when broken down, striving for OBP is the ideal philosophy for scoring runs. OBP indicates how often a player avoids making an out per plate appearance. With only three outs in an inning, getting on base is crucial to scoring runs. This concept is exemplified by two 2013 MLB stars who hit just over .320, Chris Johnson (.358 OBP) and Mike Trout (.432 OBP). Mike Trout has a very high on-base percentage in relation to his average, while Chris Johnson’s is barely higher than his. If a team had three Chris Johnsons in a row in its lineup, the chances that all three reached base (and most likely scored runs or at least left the bases loaded) would be about 4.6%, or .358 cubed, treating each plate appearance as independent. If a team had three Mike Trouts in a row, that number would nearly double to about 8%, and both figures exceed the league-average figure of roughly 3%. In other words, high OBPs can nearly double a team’s run-scoring chances over the span of just three batters. While this may seem insignificant, it can make a big difference over the course of thousands of innings. Obviously, more factors need to be taken into account to truly show the significance of OBP: three isolated events don’t represent a full inning’s worth of action, and they don’t account for hits of different quality. This is simply one way of isolating the effect of reaching base. Nonetheless, it helps show the value of not making an out and therefore putting a runner on base.
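The three-batter comparison above can be sketched in a few lines. This assumes each plate appearance is independent and that all three hitters share the same OBP, which is a simplification; the function name `all_reach_probability` is just an illustrative choice.

```python
def all_reach_probability(obp, batters=3):
    """Chance that `batters` consecutive hitters all reach base.

    Assumes every hitter has the same OBP and that plate
    appearances are independent of one another.
    """
    if not 0.0 <= obp <= 1.0:
        raise ValueError("OBP must be between 0 and 1")
    return obp ** batters


# OBP figures quoted in the article for the 2013 season.
johnson = all_reach_probability(0.358)
trout = all_reach_probability(0.432)

print(f"Three Chris Johnsons all reach: {johnson:.1%}")
print(f"Three Mike Trouts all reach:    {trout:.1%}")
print(f"Ratio: {trout / johnson:.2f}x")
```

Because the probability is the OBP raised to the number of batters, a gap in OBP compounds quickly as the string of hitters gets longer.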
Scaling upward, slugging percentage (SLG) holds the next highest correlation to runs at an R-value of 0.93. This is an interesting finding, as many people have started to value ISO over SLG when comparing hitters. Slugging percentage is very similar to ISO, except that it includes singles. It is the average number of bases a hitter earns on hits per at bat (total bases divided by at bats). While OBP is widely regarded as more valuable than slugging percentage, my analysis showed that over the last ten years slugging percentage has correlated more strongly with runs scored. This could be due to a number of factors, such as the time frame (which includes the tail end of the steroid era and follows a long period in which walks were ignored), but the importance of slugging percentage cannot be ignored. In basic terms, SLG captures the distribution of hits, which is why it is a better indicator of runs than the hits statistic alone. It can help paint a picture of the hitting value brought by a certain player or team.
On-base plus slugging (OPS) has been the ultimate indicator of runs over the last ten years, with a 0.97 R-value, or a nearly direct relationship with runs scored. This statistic has been criticized because it weights on-base percentage and slugging percentage equally, when it has been shown that the two are not equally valuable. However, it is eye-opening when two values correlate this strongly. OPS encompasses two factors: a player’s hit distribution and his selectivity. These are two of the most important facets of offense, which may explain why this statistic holds so much weight. Players like Andrew McCutchen and Troy Tulowitzki, among others, shine in this area, which puts them in prime position as annual MVP candidates. Valuing this statistic also bodes well for soon-to-be free agent Nelson Cruz, who is 11th in the Major Leagues in OPS thanks to his keen eye and light-tower power.
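For readers who want to see how the pieces fit together, the sketch below computes OBP, SLG, and OPS from a batting line using their standard definitions. The function names and the season line are hypothetical and chosen only for illustration.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def on_base_plus_slugging(obp, slg):
    """OPS is the simple, unweighted sum of OBP and SLG."""
    return obp + slg


# Hypothetical season line: 550 AB, 165 hits (100 1B, 35 2B, 5 3B, 25 HR),
# 70 walks, 5 hit by pitch, 5 sacrifice flies.
obp = on_base_percentage(hits=165, walks=70, hit_by_pitch=5, at_bats=550, sac_flies=5)
slg = slugging_percentage(singles=100, doubles=35, triples=5, home_runs=25, at_bats=550)
ops = on_base_plus_slugging(obp, slg)

print(f"OBP {obp:.3f} + SLG {slg:.3f} = OPS {ops:.3f}")
```

Note that the two components are simply added even though they use different denominators, which is part of why OPS is criticized for treating them as equals.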
While imperfections can always be found in studies that rely mainly on correlational data, I have strong faith in this new statistical age of baseball and its potential to illuminate what drives a team’s offense. This was but an introductory article, explaining the basis behind many of the ideas being thrown around in baseball commentary today, and my hope is that it gives readers a solid grounding in statistical terms and their relevance to America’s Pastime. Even this modest base of knowledge can help the most casual fan more accurately judge a favorite team’s moves. It only scratches the surface of how deeply baseball can be analyzed in this day and age. While I addressed many different elements of offense, this new statistical wave opens up the possibility for future articles to address pitching statistics or even defense. The potential for statistical research in this field is limitless and can help everyone better understand the game we love.