In today’s world of advanced metrics, SportVU player tracking data, and the general opinions of many regarding the NBA, there are too many instances where players’ talents are evaluated, compared, and debated based on arbitrary claims and confusion (I’m looking at you, Skip Bayless and Stephen A. Smith). While I too am one of the many who do this on a regular basis, there is a growing need to measure and quantify these arguments accurately, if only to paint a clearer picture of what we say. Pundits on ESPN, and analysts themselves, construct arbitrary ways to measure a player’s value. Ratings like PER and phrases like the “clutch” factor or the “will to win” are designed to enhance these arguments, but they only seem to add to the confusion by relying on subjective methods to quantify value.
Naturally (and as you would expect given the tone of my preceding paragraph), I would like to propose a few metrics with which to better quantify the quality of a player. I will even go so far as to list some caveats pertaining to my results and how we can best interpret them. I do this to acknowledge that my analyses are also debatable, though I will argue that they are less so than what exists in today’s literature.
Before I begin, though, I would like to thank all those who contributed in one way or another, including those whom I have been lucky enough to call friends.
Without further ado, I propose the following methodology.
A logistic regression essentially estimates coefficients that relate a set of variables to the probability of a particular event, using standard econometric techniques. Having gathered historical box score data for all NBA games played in the past 7 years (playoffs included), I began the analysis by associating a team’s stats in each game with whether it won or lost (a binary variable, and the event in question). With the outcome defined, I needed the ingredients for the regression, namely the explanatory variables. I will not divulge the full variable list I settled on, but rest assured that none of my variables fall outside the scope of what you will find in a box score. Basic variables like points, rebounds, steals, and shooting efficiency statistics were used, in addition to intuitive interactions between them. I explain one example of an interaction term in greater detail below.
Upon first regressing the binary game outcomes on team statistics, I noticed that an immediate determinant of winning was a team’s field goal percentage. While this is definitely a heavy factor in predicting the outcome of a game, it says nothing about the number of points scored altogether, and the same limitation carries over to player contributions. For example, suppose Carmelo Anthony and Tyson Chandler both shoot 80% from the field, yet Carmelo scores 30 points while Chandler scores only 8. While both feats are impressive, one would generally associate a higher probability of the team winning with Anthony, based on these statistics alone. Similarly, in a team sense, a typically low-scoring team like the Chicago Bulls may shoot a high percentage, but if they score only 60 points in the entire game, the effect of that high field goal percentage on the outcome is drastically reduced. Thus, an interaction term between points scored and efficiency was required to distinguish the two feats.
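To make the interaction term concrete, here is a minimal sketch of a logistic regression with a field goal percentage times points feature. Everything in it is an assumption for illustration: the ten games are made up, the feature list is far smaller than the undisclosed one used in the article, points are rescaled by 100 for numerical convenience, and the fit uses plain gradient descent rather than whatever statistical package was actually used.

```python
import math

# Made-up team games: (field goal percentage, points scored, won?).
# These numbers are illustrative only, not real box scores.
GAMES = [
    (0.52, 110, 1), (0.48, 104, 1), (0.50, 98, 1), (0.47, 101, 1), (0.45, 90, 1),
    (0.55, 60, 0), (0.41, 88, 0), (0.44, 95, 0), (0.39, 92, 0), (0.43, 99, 0),
]


def features(fg_pct, points):
    """Build [intercept, FG%, points/100, FG% x points/100]."""
    scaled_points = points / 100.0  # rescaled so gradient descent stays stable
    return [1.0, fg_pct, scaled_points, fg_pct * scaled_points]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(betas, row):
    """Predicted win probability for one feature row."""
    return sigmoid(sum(b * x for b, x in zip(betas, row)))


def log_loss(betas, rows, labels):
    """Average negative log-likelihood of the observed outcomes."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = predict(betas, row)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit betas by batch gradient descent on the average log loss."""
    betas = [0.0] * len(rows[0])
    for _ in range(epochs):
        gradient = [0.0] * len(betas)
        for row, y in zip(rows, labels):
            error = predict(betas, row) - y
            for j, x in enumerate(row):
                gradient[j] += error * x
        betas = [b - learning_rate * g / len(rows) for b, g in zip(betas, gradient)]
    return betas


ROWS = [features(fg, pts) for fg, pts, _ in GAMES]
LABELS = [won for _, _, won in GAMES]
BETAS = fit_logistic(ROWS, LABELS)

# Same 80% shooting, very different scoring volume.
print(predict(BETAS, features(0.80, 110)), predict(BETAS, features(0.80, 60)))
```

The last line prints the fitted win probabilities for two games with identical shooting efficiency but different point totals. With so few invented games the numbers themselves mean nothing; the point is only that the fourth feature lets the model treat efficient high-volume scoring differently from efficient low-volume scoring.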
Using my box score logistic results (the betas of the regression), I then calculated each individual player’s predicted values from their statistics in each game they played across these seasons. The results were averaged for each player within each season and sorted to determine who contributed most, on average, to a team winning. Logistic regression predicted values are easy to interpret: here they are read as the player’s individual contribution to the probability that the team wins (example: Dwight Howard contributed 20% to the team winning today). I took not only averages, however, but also different types of standard deviations for each player (explained shortly) in an attempt to measure the consistency of their performances.
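One possible reading of this step, sketched in code: apply the fitted betas to each player's individual game line, then average the resulting predicted values by player and season and sort. The betas, the intercept, the three-variable stat line, and the two placeholder players below are all hypothetical, since the article publishes neither its coefficients nor its variable list.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical betas; the article does not publish the fitted coefficients.
BETAS = {"points": 0.08, "rebounds": 0.10, "turnovers": -0.15}
INTERCEPT = -4.0

# Made-up game logs for two placeholder players.
GAME_LOGS = [
    {"player": "Player A", "season": "2012-13",
     "stats": {"points": 30, "rebounds": 10, "turnovers": 3}},
    {"player": "Player A", "season": "2012-13",
     "stats": {"points": 24, "rebounds": 12, "turnovers": 2}},
    {"player": "Player B", "season": "2012-13",
     "stats": {"points": 8, "rebounds": 4, "turnovers": 1}},
    {"player": "Player B", "season": "2012-13",
     "stats": {"points": 12, "rebounds": 3, "turnovers": 2}},
]


def predicted_value(stats):
    """Logistic predicted value for a single player-game stat line."""
    z = INTERCEPT + sum(BETAS[name] * value for name, value in stats.items())
    return 1.0 / (1.0 + math.exp(-z))


def season_averages(game_logs):
    """Average predicted value per (player, season), highest first."""
    per_player_season = defaultdict(list)
    for log in game_logs:
        key = (log["player"], log["season"])
        per_player_season[key].append(predicted_value(log["stats"]))
    averages = {key: mean(values) for key, values in per_player_season.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


RANKING = season_averages(GAME_LOGS)
for (player, season), average in RANKING:
    print(f"{player} {season}: {average:.3f}")
```

Keeping the per-game predicted values in a list for each player and season, rather than only the running average, is what makes the standard deviation calculations possible afterwards.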
Armed with a player’s average and standard deviations in probabilistic contributions to winning, I then applied two ideas from finance known as the Sharpe Ratio and the Sortino Ratio.
The Sharpe Ratio essentially divides average returns (in finance, returns in excess of a risk-free rate) by their standard deviation, while the Sortino Ratio does the same but divides by what is called “downside” risk, a distributional measure of below-average performances. Because I felt a player should not be “punished,” so to speak, for performing above his average, I favored the Sortino Ratio in my analysis. To put the Sortino Ratio in perspective, it is commonly used to judge the favorability of holding stocks, since you would like to invest in those that have a high average relative to their downside movements. The top-50 results of evaluating players this way are depicted below, sorted by Sortino Ratio.
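Before turning to the results, the two ratios as described above can be sketched in a few lines. This follows the simplified definitions in the text, which are assumptions relative to the textbook finance versions: no risk-free rate is subtracted, and the downside deviation is measured around the player's own average rather than an external target. The per-game values are made up.

```python
import math
from statistics import mean, pstdev


def sharpe_ratio(values):
    """Average divided by the (population) standard deviation."""
    return mean(values) / pstdev(values)


def downside_deviation(values):
    """Root mean square of shortfalls below the average; games above
    the average contribute zero."""
    average = mean(values)
    shortfalls = [min(0.0, value - average) for value in values]
    return math.sqrt(sum(s * s for s in shortfalls) / len(values))


def sortino_ratio(values):
    """Average divided by the downside deviation."""
    downside = downside_deviation(values)
    if downside == 0.0:
        return math.inf  # the player never fell below his average
    return mean(values) / downside


# Made-up per-game probability contributions for one player-season,
# including one unusually strong game.
contributions = [0.15, 0.15, 0.15, 0.55]

print(sharpe_ratio(contributions))
print(sortino_ratio(contributions))
```

Because the one big game inflates the ordinary standard deviation but not the downside deviation, the Sortino Ratio here comes out higher than the Sharpe Ratio, which is the sense in which a player is not "punished" for performing above his average.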
(Chris’ graph had to be put into separate screenshots, instead of one chart, due to formatting issues. So, I apologize for any inconvenience. Also, Chris added in 50 more players, just for fun, if it seems longer. – Evan)
As you can see, there are some intriguing results (and, quite frankly, they are the reason I decided to incorporate standard deviations). It appears that both centers and forwards have much higher average returns than wing players. At first, I questioned this result and sought to incorporate other factors that could perhaps reward guards a bit more favorably, but I was ultimately unable to do so without biasing the results. As it turns out, the arguments in favor of this result are intuitive (or at least I think so). The first is that forwards and centers tend to shoot much higher field goal percentages than their guard counterparts. Because field goal percentages are so instrumental in predicting wins (according to the logistic regression), this partially explains the bias. Next, the logistic regression also seems to weight rebounds and turnovers much more heavily than assists in predicting outcomes. Consider this thought experiment if my arguments are not convincing enough.
Suppose you open up your SportsCenter or Yahoo Sports app to check the day’s box scores. Without looking at who won the game (essentially, without looking at each team’s points), look instead at all the other statistics provided, like rebounds, shooting percentages, assists, steals, turnovers, and so on. You may find that your best guesses about who won the game rest mostly on shooting percentage, rebounds, turnovers, and possibly interactions between intuitive variables, like field goals attempted times shooting efficiency, which equals field goals made (the logistic regression supports these claims). Because turnovers were also very important in predicting wins, it follows that guards were penalized more than forwards and centers, due to their typically higher turnover averages, which is another reason for the discrepancy. Perhaps this is my way of justifying my results, but it is no secret that basketball is a game where height advantages are key, and it is thus plausible that bigs are more valuable, especially those with a propensity to score while maintaining a high efficiency (Dwight Howard in Orlando had the four highest average returns but also a very high downside deviation).
While many financial analysts use these ratios to decide how much to invest, we cannot really do the same, since currency in this study would parallel playing time.
I have yet to account for this, but an idea of mine is to look at players who typically play very few minutes and examine their Sortino Ratios to find “diamonds in the rough,” so to speak. Gerald Green and Miles Plumlee in Indiana were two players for whom I recall seeing high Sortino Ratios, despite their limited playing time (both are now key players on a resurgent Phoenix team). It is also wise to note that even when Sortino Ratios between players are comparable, average returns are still needed to gauge who is the better player overall. A Sortino Ratio only expresses a player’s average relative to how far he tends to fall below it; it says nothing about the level of the average itself. One would obviously favor a player with a similar Sortino Ratio but a higher average return. Also, as mentioned before, comparing players by position is wise, since the probability outcomes associated with a team winning partially favor big men.
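A small illustration of why the average is still needed alongside the ratio: doubling every one of a player's per-game contributions doubles both his average and his downside deviation, so the Sortino Ratio does not move. The sketch below assumes the downside deviation is measured around the player's own average, and both players' numbers are made up.

```python
import math
from statistics import mean


def sortino_ratio(values):
    """Average divided by downside deviation measured around that average."""
    average = mean(values)
    shortfalls = [min(0.0, value - average) for value in values]
    downside = math.sqrt(sum(s * s for s in shortfalls) / len(values))
    return average / downside


# Made-up per-game contributions: same pattern, twice the level.
role_player = [0.04, 0.06, 0.08, 0.06]
star_player = [2 * value for value in role_player]

print(sortino_ratio(role_player), sortino_ratio(star_player))
print(mean(role_player), mean(star_player))
```

The two ratios printed on the first line agree, while the averages on the second line differ by a factor of two, so ranking by Sortino Ratio alone cannot separate these two players.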
Because this logistic regression incorporates many box score variables, those who tend to “stuff” the stat sheet seem to always have higher probability contributions relative to others. I find this a desirable quality in gauging player value, as underappreciated, all-around players like Nicolas Batum are favored over more one-dimensionally dominant players like Rudy Gay in average return contributions. Unfortunately, box scores do not typically contain all the individual information relevant to a game’s outcome, such as more robust defensive statistics (for example, shots contested), so the only way that defense has been captured is through defensive rebounds, blocks, and steals.
Because players are evaluated here in the same way that stocks are analyzed, I hope to continue this research into team building (or portfolio optimization), and also to correct many of the issues with the current method.
This is my first attempt at blogging and sharing my thoughts with the public, but I hope this has been (somewhat) insightful. I would like to thank all the great people at analyticsgame.com for allowing me to contribute, and any comments or suggestions from you all would be greatly appreciated!