Predicting All-NBA Teams

The NBA season is winding down, which means it is award season. Bloggers, podcasters, writers, and television personalities are all releasing who they think should make the First-, Second-, and Third-Team All-NBA squads. What I will do here is try to predict who will make the All-NBA squads. I won't try to predict each particular team, but only if they made one of the three squads or not.

Technical details

I first downloaded all of the individual statistics I could from Basketball-Reference.com for all players in all years from 1989 to 2016. I chose 1989 as the starting point, because it was the first year that the NBA voted on three All-NBA squads instead of two. These included all numbers from the Totals, Per Game, Per 36 Minutes, Per 100 Possessions, and Advanced tables on Basketball-Reference.com.

I trimmed the sample down a little bit by (a) cutting out anyone who was missing data (i.e., many big men did not have three-point percentages, because they did not take a three-point shot all season) and (b) removing noisy data by dropping anyone who played less than 500 minutes in the entire year. This is a cutoff many sites use (e.g., ESPN.com) for leaderboards, and it's a cutoff that I've used in research on the NBA before. This cut out 9 people of the 420 that made the All-NBA teams in my training data (years 1989 through 2013)—not a troubling amount of players. And given that the NBA has shot more and more three-point shots in recent years, I'm not worried about cutting a few big men from the training data who didn't attempt one in an entire season.

Which leads me to the particulars of training the model. I split up the data into four parts. First, the training data (seasons 1989 to 2013). I then tested the model on the three subsequent years separately: 2014, 2015, and 2016. These three subsets made up my testing data. The model classified players as either a “Yes” or “No” for making an All-NBA team. However, the model didn't know that I needed 15 players per year, consisting of 6 guards, 6 forwards, and 3 centers. So to assess model performance, I sorted the predicted probability, based on the model, that a player made the All-NBA team. I took the 6 guards, 6 forwards, and 3 centers with the highest probabilities of making the team and counted them as my predictions for the year.

I tried a number of algorithms: k-NN, C5.0, random forest, and neural networks. I played around with each of these models, changing the number of neighbors or hidden nodes or trees, etc. The random forest and neural networks yielded the same accuracy; I find the random forest to be more interpretable, so I went with that as my model. The default 500 trees and √ p  variables, where p is the number of variables we are using to predict if the player made the All-NBA team (in our case, 10 variables per tree), were just as accurate as any other alternative I tried, so I stuck with the defaults. So let's now turn to how accurate the model was at predicting seasons 2014-2016.

Is the model accurate at predicting years 2014-2016?

Using the criteria above, I predicted 10 of the 15 All-NBA players in 2014, 13 in 2015, and 12 in 2016 correctly for an overall accuracy of about 78%. A lot of people in the NBA world have maligned the fact that the All-NBA teams must include 2 guards, 2 forwards, and 1 center per squad. What if these restrictions were lifted? Which players did the model predict, based solely on their performance, to make the All-NBA team? Let's look year-by-year.

Model-predicted All-NBA players for 2014

Player Position Made Team?
Kevin Durant SF Yes
LeBron James PF Yes
Blake Griffin PF Yes
Kevin Love PF Yes
Carmelo Anthony PF No
James Harden SG Yes
Stephen Curry PG Yes
DeMarcus Cousins C No
Chris Paul PG Yes
Anthony Davis C No
Dirk Nowitzki PF No
Paul George SF Yes
Dwight Howard C Yes
Joakim Noah C Yes
Goran Dragic SG Yes

Model-predicted All-NBA players for 2015

Player Position Made Team?
Russell Westbrook PG Yes
LeBron James SF Yes
James Harden SG Yes
Anthony Davis PF Yes
Chris Paul PG Yes
Stephen Curry PG Yes
DeMarcus Cousins C Yes
Blake Griffin PF Yes
Pau Gasol PF Yes
Damian Lillard PG No
DeAndre Jordan C Yes
LaMarcus Aldridge PF Yes
Kyrie Irving PG Yes
Marc Gasol C Yes
Jimmy Butler SG No

Model-predicted All-NBA players for 2016

Player Position Made Team?
LeBron James SF Yes
Kevin Durant SF Yes
Russell Westbrook PG Yes
Chris Paul PG Yes
James Harden SG No
Stephen Curry PG Yes
Kawhi Leonard SF Yes
DeMarcus Cousins C Yes
Kyle Lowry PG Yes
Damian Lillard PG Yes
DeAndre Jordan C Yes
Isaiah Thomas PG No
Anthony Davis C No
Paul George SF Yes
DeMar DeRozan SG No

Most notably here is James Harden had the 5th highest probability of making the All-NBA team last year—the model gave him a probability of .82. This supports the obvious point that he was one of the biggest snubs for the All-NBA team in the history of the league. The model also tended to classify some players who weren't on very good teams. I didn't include their team's win-loss record in the input variables for the model, because I was not sure how to go about calculating this win-loss percentage for players who were traded mid-season. However, the win-loss record was indirectly included in the win shares for a player.

What were the most important statistics in predicting if people made the All-NBA team or not? We can see this by looking at the Gini index. I looked at the distribution of the Gini index for each predictor variable, and two were much higher than the rest: Player Efficiency Rating (PER) and win shares. These two metrics are popular choices as indicators of overall performance, and it seems like they at least have some predictive validity when it comes to which players people will select for the All-NBA squads.

Predicting this season

Top 25 probabilities for making the All-NBA team

Player Position Probability
LeBron James SF 0.966
Kevin Durant SF 0.928
Kawhi Leonard SF 0.918
Giannis Antetokounmpo SF 0.854
Stephen Curry PG 0.800
Karl-Anthony Towns C 0.778
Jimmy Butler SF 0.762
Russell Westbrook PG 0.748
DeMarcus Cousins C 0.746
Anthony Davis C 0.742
John Wall PG 0.742
Gordon Hayward SF 0.740
DeAndre Jordan C 0.736
DeMar DeRozan SG 0.732
Marc Gasol C 0.716
Rudy Gobert C 0.688
Kyrie Irving PG 0.678
Dwight Howard C 0.662
Blake Griffin PF 0.642
Kyle Lowry PG 0.638
Isaiah Thomas PG 0.632
Andre Drummond C 0.618
James Harden PG 0.604
Kemba Walker PG 0.598
Chris Paul PG 0.574

The top 25 probabilities for making the team this season are listed above, as well as the players' positions. Using the position rules that the NBA requires for the All-NBA teams (discussed above), my predictions for the All-NBA teams, with just under ¾ of the NBA season complete, are:

Forwards

  • LeBron James
  • Kevin Durant
  • Kawhi Leonard
  • Giannis Antetokounmpo
  • Jimmy Butler
  • Gordon Hayward

Guards

  • Stephen Curry
  • Russell Westbrook
  • John Wall
  • DeMar DeRozan
  • Kyrie Irving
  • Kyle Lowry

Centers

  • Karl-Anthony Towns
  • DeMarcus Cousins
  • Anthony Davis