# Stat Exploration: Modeling Field Goal Percentage

With the release of player shotlogs, providing a variety of information on every single shot during the season, there’s a lot of potential for studying field goals. One area of interest is how the proximity of the closest defender affects the conversion rate of the shot. Then, building on some insights there, how can we model field goal percentage if we know things like shot distance and the number of dribbles taken?

Inside shots (within four feet)

Based on my previous research, there are four basic areas of the court. The inside region, which is basically the restricted area, has a flat field-goal percentage that doesn’t change with distance. Then there’s the weird “in-between” region from 4 to 10 feet where the conversion rate drops considerably and there’s a large diversity of shot types like hooks and floaters. Then there’s the midrange region from 10 feet to the three-point line that is almost entirely jump shots and where the field-goal percentage is pretty steady but still decreases. And, of course, there are three-pointers, but I only focus on ones under 30 feet because beyond that the percentages get sketchy for everyone, save for Gilbert Arenas.
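Here is a minimal sketch of how a shot could be bucketed into the four regions. The function name and the `is_three` flag are hypothetical. Assigning each boundary distance to the farther zone is my assumption, because the text only gives rough cutoffs.

```python
from typing import Optional


def shot_zone(shot_dist_ft: float, is_three: bool) -> Optional[str]:
    """Bucket a shot into one of the four court regions (hypothetical helper)."""
    if is_three:
        # Threes from 30 feet and beyond are left out of the analysis.
        return "three" if shot_dist_ft < 30 else None
    if shot_dist_ft < 4:
        return "inside"      # roughly the restricted area
    if shot_dist_ft < 10:
        return "in-between"  # hooks, floaters, short post shots
    return "midrange"        # 10 feet out to the three-point line


print(shot_zone(2.5, False), shot_zone(7, False), shot_zone(17, False), shot_zone(25, True))
```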

Inside shots, of course, are often closely guarded. The most common distance to the closest defender is between 1 and 2 feet. It’s also fairly rare for an inside shot to occur without a defender within six feet. The easiest way to get an uncontested shot at the rim is to beat everyone in transition.

The graph below has a few useful features, and it’s a near-perfect illustration of an asymptote. The dots are real data, and the line comes from a regression model of how field-goal percentage changes with defender distance. The shading indicates the number of attempts at each distance, so don’t focus too much on the stray white dots far from the line; they represent very few shots. I fit the line with a logistic function, which is a useful curve that levels off at a limit. The limit here, estimated with nonlinear regression in R, was 97.1%. Based on this data, that means an NBA player who is completely open within four feet will miss only about three times out of a hundred. (This is a simple exercise, however, and it’s not adjusted for who’s taking the shot or whether it was a layup at full speed or a standstill dunk.)

One key implication is how important contesting a shot is. Obviously, an open shot at the rim is the worst shot you can give up. But when the defender is already fairly close, even one foot makes a huge difference, and it’s not just the model suggesting that; it’s the real data. When a shot is contested near the rim, being one foot closer is worth nearly 10 percentage points. That’s the kind of difference you see between the best defense and the worst defense, or between an electric dunker like Griffin and an average player.
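Here is a minimal sketch of the kind of curve described in this section: a logistic function in defender distance whose upper asymptote is the reported 97.1% limit. The fitted slope and intercept aren’t reported, so `INTERCEPT` and `SLOPE` below are hypothetical placeholders chosen only for illustration.

```python
import math

LIMIT = 97.1      # fitted upper limit for inside shots (from the text)
INTERCEPT = -0.5  # hypothetical placeholder, not a fitted value
SLOPE = 0.8       # hypothetical placeholder, not a fitted value


def inside_fg_pct(def_dist_ft: float) -> float:
    """Logistic curve in closest-defender distance that levels off at LIMIT."""
    return LIMIT / (1.0 + math.exp(-(INTERCEPT + SLOPE * def_dist_ft)))


for d in (0, 1, 2, 4, 8, 16):
    print(f"{d:>2} ft: {inside_fg_pct(d):5.1f}%")

# Misses per 100 wide-open attempts implied by the limit.
print(round(100 - LIMIT, 1))
```

Whatever slope you pick, the curve can approach 97.1% but never exceed it. That is why the limit can be read as the completely open conversion rate.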

In-between shots (four to ten feet)

In that awkward zone just outside the restricted area, populated by post players and slashers shooting over rim protectors, defenders are generally a bit farther away. That trend continues in every zone as shots move away from the basket. The vast majority of these shots happen when a defender is an arm’s length or two away, and it’s quite rare for a player to take an open shot in this range. If you’re that open, why not get even closer?

The in-between shots follow a different pattern than the inside shots. There’s no steep, exponential climb, and tightly contested shots are inefficient. There was a problem, however, in establishing the limit: the model suggested 100% for completely open shots, which is obviously unrealistic. The fit may struggle here because lumping all these different shot types together is difficult. The only very open shots in the dataset are probably layups at 4 or 5 feet, so the regression assumes that being open is even better than it really is. This is why we need to control for other factors.

Midrange shots (ten feet to three-point line)

Defender distance changes completely once you get to midrange shots. The average increases to over four feet, and it’s rare for a defender to be within a foot. There are also a handful of shots taken with ten feet or more of space, which is like having the entire wing to yourself.

In the plot below, the curve is visible even in the raw data points. The limit here is 46.4%, which is basically the percentage an NBA player is expected to hit when completely open. Because the curve is nonlinear, how much an extra foot of space helps depends on how far away the defender already is. There’s roughly a 5-percentage-point change per foot when the defender is very close, but an extra foot makes only a tiny difference when the shooter is already wide open.
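You can see the diminishing returns by differencing the fitted curve one foot at a time. The midrange-only slope isn’t reported, so this minimal sketch borrows the combined 10-to-30-foot coefficients from the field-goal percentage model section. Its numbers therefore won’t match the 5-point figure exactly. The 18-foot shot is an arbitrary example.

```python
import math

# Combined 10-30 ft coefficients reported in the field-goal percentage model section.
A1, A2, B1, B4 = 65.7, -0.953, 0.324, 0.349


def fg_pct(shot_dist, def_dist, catch_and_shoot=0):
    ceiling = A1 + A2 * shot_dist  # open-shot FG% at this distance
    return ceiling / (1.0 + math.exp(-(B1 * def_dist + B4 * catch_and_shoot)))


def gain_per_foot(shot_dist, def_dist):
    """Percentage points gained when the defender is one foot farther away."""
    return fg_pct(shot_dist, def_dist + 1) - fg_pct(shot_dist, def_dist)


for d in (0, 1, 2, 5, 10):
    print(f"{d} -> {d + 1} ft: +{gain_per_foot(18, d):.2f} points")
```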

Three-point shots (under 30 feet)

The histogram for three-pointers is similar to the one for midrange shots, but the left side is steeper and the distribution is shifted a bit toward larger defender distances. The average defender distance is about five feet, and this type of shot is much more likely to be wide open than the others. I had to expand the histogram’s range to capture everything significant, and it still probably wasn’t enough.

As seen below, three-pointers are extremely sensitive to tight defense. These long-range shots are known as efficient options that teams should use more, but that’s not true if the defender is right in the shooter’s face. To illustrate the trade-off, the break-even point between an open midrange two-pointer and a three comes when the three-point shooter’s defender is about three feet away. In other words, an open midrange shot is about as efficient as a three-pointer with the defender three feet away. This could speak to the limits of a Morey-ball offense built on an absurdly high number of three-pointers. The limit here, by the way, is 41.9%.
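The break-even claim can be checked against the combined 10-to-30-foot model from the field-goal percentage model section. This minimal sketch makes two assumptions of my own. First, “open” means the model’s ceiling, with the defender infinitely far away. Second, the 18-foot two and 24-foot three are illustrative distances. It solves for the defender distance at which both shots yield the same points per attempt.

```python
import math
from typing import Optional

# Combined 10-30 ft model coefficients from the field-goal percentage model section.
A1, A2, B1, B4 = 65.7, -0.953, 0.324, 0.349


def ceiling(shot_dist: float) -> float:
    """Numerator of the model: FG% with no defender nearby (assumed to mean 'open')."""
    return A1 + A2 * shot_dist


def fg_pct(shot_dist: float, def_dist: float, catch_and_shoot: int = 0) -> float:
    return ceiling(shot_dist) / (1.0 + math.exp(-(B1 * def_dist + B4 * catch_and_shoot)))


def break_even_def_dist(mid_dist: float, three_dist: float,
                        catch_and_shoot: int = 0) -> Optional[float]:
    """Defender distance at which a three equals an open midrange two in points per shot.

    Solves 3 * fg_pct(three_dist, d) == 2 * ceiling(mid_dist) for d in closed form.
    Returns None if even a wide-open three can't match the open two.
    """
    ratio = 3 * ceiling(three_dist) / (2 * ceiling(mid_dist))
    if ratio <= 1:
        return None
    d = (-math.log(ratio - 1) - B4 * catch_and_shoot) / B1
    return max(0.0, d)  # 0 means the three wins even when fully contested


print(break_even_def_dist(18, 24))                     # off the dribble
print(break_even_def_dist(18, 24, catch_and_shoot=1))  # catch-and-shoot
```

Under these assumptions, the break-even comes out a little under 3.5 feet off the dribble and about 2.4 feet on a catch-and-shoot. Both figures are in line with the roughly three-foot figure.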

Field-goal percentage model

If you were wondering how I created the lines in the previous graphs, I used a logistic function with a couple of tweaks, fitted with the nonlinear regression function in R. For those unfamiliar, it’s an S-shaped curve that behaves like a straight line in the middle and then flattens toward horizontal lines at the extremes. This is ideal for modeling a percentage.

For the best possible results, I’ve pooled every shot from 10 to 30 feet. I filtered out attempts taken with less than two seconds on the shot clock and any miscoded shots, such as a two-pointer beyond 24 feet or a three inside 22 feet. The basic structure is shown below. The key takeaway is that the numerator sets the upper limit, which is basically what you’d shoot from that distance when completely open. The denominator adjusts for a couple of other factors: how close the nearest defender is and whether or not it’s a catch-and-shoot shot without a dribble.

$$\text{FG\%} = \frac{\text{OpenFG\%} + A_2 \cdot \text{ShotDist}}{1 + \exp\left(-\left(B_1 \cdot \text{DefDist} + B_4 \cdot \text{C\&S}\right)\right)}$$
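The data-cleaning rules described before the equation can be written as a simple predicate, applied before fitting. In this minimal sketch, the `Shot` record and its field names are hypothetical stand-ins for the shot-log columns. Keeping shots with a missing shot clock is my assumption, since the text doesn’t say how those were handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    """Hypothetical shot-log record; field names are illustrative."""
    shot_dist: float             # feet from the basket
    shot_clock: Optional[float]  # seconds remaining, None if not recorded
    pts_type: int                # 2 or 3, as coded in the log


def keep_for_model(shot: Shot) -> bool:
    """Apply the article's filters for the 10-30 ft jump-shot model."""
    if not 10 <= shot.shot_dist < 30:
        return False
    # Drop attempts with under two seconds on the clock; missing clocks are kept (assumption).
    if shot.shot_clock is not None and shot.shot_clock < 2:
        return False
    # Likely miscoded: a two from beyond 24 ft or a three from inside 22 ft.
    if shot.pts_type == 2 and shot.shot_dist > 24:
        return False
    if shot.pts_type == 3 and shot.shot_dist < 22:
        return False
    return True


shots = [Shot(18.0, 12.0, 2), Shot(26.0, 9.0, 2), Shot(20.0, 15.0, 3), Shot(24.0, 1.2, 3)]
print([keep_for_model(s) for s in shots])
```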

Other variables, like the number of dribbles and how long the shooter had the ball before shooting, were statistically significant but not overwhelmingly so, and their estimated effects didn’t make intuitive sense. With just these two variables in the denominator, the residual error was almost exactly the same as with the extra ones, so I went with the simpler, more stable model. I also suspect the factors that really matter are how fast the shooter was moving before stopping to shoot or how off balance he was; that’s the problem with a lot of pull-up shots. The results are below, and you can plug in whatever values you want to experiment with it. Just note that this includes every single player and should only be used for shots from 10 to 30 feet. (One missing feature is that every player has a range limit beyond which FG% drops off quickly. This might be a future piece.)

$$\text{FG\%} = \frac{65.7 - 0.953 \cdot \text{ShotDist}}{1 + \exp\left(-\left(0.324 \cdot \text{DefDist} + 0.349 \cdot \text{C\&S}\right)\right)}$$
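To plug in your own values, you can wrap the fitted equation in a small calculator. In this minimal sketch, the function and constant names are mine. The range check simply enforces the 10-to-30-foot caveat.

```python
import math

# Fitted coefficients from the 10-30 ft model (all players).
OPEN_FG_INTERCEPT = 65.7  # percentage points
SHOT_DIST_SLOPE = -0.953  # percentage points per foot of shot distance
DEF_DIST_COEF = 0.324     # per foot of space to the closest defender
CATCH_SHOOT_COEF = 0.349  # applied when the shot is catch-and-shoot (no dribble)


def predicted_fg_pct(shot_dist: float, def_dist: float, catch_and_shoot: bool) -> float:
    """Expected FG% (0-100) for a jump shot between 10 and 30 feet."""
    if not 10 <= shot_dist <= 30:
        raise ValueError("the model was fit only on shots from 10 to 30 feet")
    ceiling = OPEN_FG_INTERCEPT + SHOT_DIST_SLOPE * shot_dist
    contest = DEF_DIST_COEF * def_dist + CATCH_SHOOT_COEF * int(catch_and_shoot)
    return ceiling / (1.0 + math.exp(-contest))


# A 23-foot catch-and-shoot three with the defender four feet away.
print(f"{predicted_fg_pct(23, 4, True):.1f}%")
```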

One interesting finding is that whether or not a shot comes off the dribble has only a small effect on its percentage. When the shot is tightly contested, the difference is about 3 percentage points, depending on the other factors. When the shooter is open, it drops to about half a percentage point. Also, when you control for these other factors, it makes no difference whether a shot is a three-pointer. Even today, players aren’t “better” at these shots than at midrange jumpers; they’re usually just more open.
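The shrinking catch-and-shoot effect falls out of the model’s shape. The same 0.349 bump inside the exponent moves the curve a lot on its steep part and barely at all near the ceiling. This minimal sketch uses the fitted coefficients, with a 20-foot jumper as an arbitrary example.

```python
import math

# Fitted coefficients from the 10-30 ft model (all players).
A1, A2, B1, B4 = 65.7, -0.953, 0.324, 0.349


def fg_pct(shot_dist, def_dist, catch_and_shoot):
    ceiling = A1 + A2 * shot_dist
    return ceiling / (1.0 + math.exp(-(B1 * def_dist + B4 * catch_and_shoot)))


def catch_and_shoot_gap(shot_dist, def_dist):
    """Percentage points gained by a catch-and-shoot vs. an off-the-dribble shot."""
    return fg_pct(shot_dist, def_dist, 1) - fg_pct(shot_dist, def_dist, 0)


for d in (1, 4, 10):
    print(f"defender {d} ft away: +{catch_and_shoot_gap(20, d):.2f} points")
```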

If you want to see how well the model fits the real-world results, I’ve provided a graph below with the same shot data. Just note that the points farthest from the fitted line represent the fewest attempts.

But there’s a problem with this method. Aren’t the best shooters more tightly contested than everyone else? To adjust for this, I added a set of variables in the numerator next to the “open shot FG%” for the 53 players with at least 500 attempts. One fun side effect is that this is a new way to rank jump shooters because important factors like distance and defense are corrected for.

The coefficients, oddly enough, hardly changed (there’s a higher open FG% but the shot distance penalty coefficient is bigger too):

$$\text{FG\%} = \frac{67.6 - 1.05 \cdot \text{ShotDist}}{1 + \exp\left(-\left(0.273 \cdot \text{DefDist} + 0.349 \cdot \text{C\&S}\right)\right)}$$
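Here is a minimal sketch of how the player-adjusted version could be evaluated and used for ranking. The coefficients are the adjusted ones. The player names and offsets are made-up placeholders, not the fitted player values. Giving unlisted players no offset is my assumption about the baseline.

```python
import math

# Adjusted coefficients from the player-controlled fit.
A1, A2, B1, B4 = 67.6, -1.05, 0.273, 0.349

# Hypothetical per-player offsets added to the numerator, in percentage points.
# These names and values are placeholders, NOT the article's fitted estimates.
PLAYER_EFFECT = {"shooter_a": 6.0, "shooter_b": 1.5, "shooter_c": -4.0}


def adjusted_fg_pct(player, shot_dist, def_dist, catch_and_shoot=0):
    """Expected FG% for a given player; unlisted players get no offset (assumption)."""
    ceiling = A1 + PLAYER_EFFECT.get(player, 0.0) + A2 * shot_dist
    return ceiling / (1.0 + math.exp(-(B1 * def_dist + B4 * catch_and_shoot)))


def rank_shooters(effects):
    """Order players by their numerator offset, best shooter first."""
    return sorted(effects, key=effects.get, reverse=True)


print(rank_shooters(PLAYER_EFFECT))
print(round(adjusted_fg_pct("shooter_a", 20, 4), 1), round(adjusted_fg_pct("league", 20, 4), 1))
```

The offset raises or lowers a player’s ceiling at every distance, and the denominator then scales it by how contested the shot is. Ranking by the offset therefore compares shooters with distance and defense already accounted for.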

And who’s the highest-ranked shooter after adjusting for these factors, among players with at least 500 attempts?

Kyle Korver outranks even Stephen Curry with a FG% that’s 15% higher than the league average, accounting for the aforementioned factors. Dirk, Calderon, and Durant are known for their shooting, but Afflalo and Carmelo are probably surprising. Carmelo subsists on a lot of tough contested midrange shots that are typically harder to convert, and he shot 40% from behind the line. Afflalo had a good shooting year, and he also relied on a lot of contested midrange shots. Middleton is probably the most surprising, but his jump shot numbers are outstanding; if you don’t believe me, check out his page on the reliable Basketball-Reference.

Near the bottom of the list, Aldridge is probably one of the first names people think of for shooters who look better when you account for how contested their shots were. It’s a bit disappointing how low he’s ranked, but his shooting percentages weren’t great and he still rated as 3.6% better than the league average. LeBron’s in the same boat, and like a lot of the athletic players at the bottom his best skill is probably getting open, not pure shooting. Then there’s poor Josh Smith who once again ranks as the worst shooter.

Once I have more data to work with from this season, I can look at other factors like the number of dribbles with more confidence and add more players to the adjusted FG% list. (Since I’m using nonlinear regression, it takes a while to run, so I was reluctant to add any more players right now.) But these are some hard, usable results from real-world data on how field-goal percentage changes and what changes it. We have a better understanding of what goes into a made shot and how important a contest is. For example, teams today are concerned about defending three-point shots, but based on the data, part of the problem is how open these outside shots are. You can neutralize, or at least blunt, those attacks from behind the arc with the right defense. With an appropriate set of tools, we can explore deep databases like SportVU’s and find key insights. Just remember that this public data is only the tip of the iceberg, and there’s so much more possible.