I have been meaning to learn more about time-series and Bayesian methods; I'm pumped for a Bayesian class that I'll be in this coming semester. RStudio blogged about the CausalImpact package back in April (a Bayesian time-series package from folks at Google), and I've been meaning to play around with it ever since. There's a great talk posted on YouTube that gives a very intuitive description of thinking about causal impact in terms of counterfactuals, and of the CausalImpact package itself. I decided I would use it to put some common wisdom to the test: Do NBA teams get better after getting rid of Rudy Gay? I remember a lot of chatter on podcasts and on NBA Twitter after he was traded by the Grizzlies and then again by the Raptors.
I went back to the well and scraped Basketball-Reference using the rvest package. Looking at the teams that traded Gay mid-season, I fetched all the data from the “Schedule & Results” page and calculated a point differential for every game: Positive numbers mean the team in question (the Grizzlies or the Raptors) won the game by that many points, while negative numbers mean they lost by that many points. I ran the CausalImpact model with no covariates or anything: I just looked at point differential over time. I did this separately for the Grizzlies' 2012-2013 season and the Raptors' 2013-2014 season (both teams traded Rudy mid-season). The pre-treatment period is before the team traded Gay; the post-treatment period is after the team traded Gay.
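Here is a minimal sketch of that setup, not the exact code I ran. The scores are simulated stand-ins for the scraped table, and the column names (team_pts, opp_pts) and the trade_after_game value are hypothetical. The CausalImpact call only runs if the package is installed.

```r
# Simulated stand-in for one team's scraped "Schedule & Results" table.
# Column names and the trade game number are hypothetical.
set.seed(2013)
n_games <- 82
trade_after_game <- 44  # hypothetical: last game played before the trade

schedule <- data.frame(
  game     = seq_len(n_games),
  team_pts = round(rnorm(n_games, mean = 98, sd = 10)),
  opp_pts  = round(rnorm(n_games, mean = 95, sd = 10))
)

# Point differential: positive = won by that margin, negative = lost by it.
schedule$point_diff <- schedule$team_pts - schedule$opp_pts

# Pre-treatment = games before the trade; post-treatment = games after it.
pre_period  <- c(1, trade_after_game)
post_period <- c(trade_after_game + 1, n_games)

# Fit with no covariates: the response series is the only input.
if (requireNamespace("CausalImpact", quietly = TRUE)) {
  impact <- CausalImpact::CausalImpact(schedule$point_diff,
                                       pre_period, post_period)
  summary(impact)            # table of average and cumulative effects
  summary(impact, "report")  # the little written report
}
```

Indexing the periods by game number rather than by date keeps the series evenly spaced, which seemed like the simplest choice for a sketch.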
The package is pretty nice. The output is easy to read and interpret, and they even include little write-ups for you if you specify summary(model, "report"), where model is the name of the model you fit with the CausalImpact function. Let's take a look at the Grizzlies first.
| Actual | Predicted | Difference | 95% LB | 95% UB |
The table shows the average and cumulative point differentials. On average, the Grizzlies outscored their opponents by 4.4 points per game after Rudy Gay was traded. Based on what the model learned from when Gay was on the team, we would have predicted this to be 3.6. Their total point differential was 167 after Rudy Gay was traded, when we would have expected about 136. The table also shows the differences: 0.82 and 31.22 points for average and cumulative, respectively. The lower and upper bounds of the 95% interval fell on far opposite sides of zero, so the data are consistent with no difference at all. The posterior probability of a causal effect here (which I read as the probability that this increase was due to Gay leaving the team) is 61%, not a very compelling number. The report generated by the package reads as rather frequentist: it uses classical null hypothesis significance testing language, saying the effect “would generally not be considered statistically significant” with a p-value of 0.387. Interesting.
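As a quick sanity check, the numbers quoted above hang together arithmetically. This sketch uses only the rounded values from the text, so small mismatches are expected. The variable names are mine, and the reading of the 61% as one minus the reported p-value is my understanding of how the package summarizes things rather than something I have verified in its source.

```r
# Rounded values quoted in the text for the Grizzlies.
avg_actual <- 4.4
avg_pred   <- 3.6
cum_actual <- 167
cum_pred   <- 136
avg_diff   <- 0.82
cum_diff   <- 31.22
p_value    <- 0.387

# Differences recomputed from the rounded actual and predicted values.
avg_actual - avg_pred  # close to the reported 0.82
cum_actual - cum_pred  # close to the reported 31.22

# The posterior probability of a causal effect lines up with 1 - p.
posterior_prob <- 1 - p_value

# Cumulative effect = average effect x number of post-trade games,
# so the ratio implies roughly how many games are in the post period.
implied_games <- cum_diff / avg_diff

round(posterior_prob, 2)
round(implied_games)
```

So the 61% and the 0.387 look like two views of the same quantity, which makes the frequentist-sounding wording in the report a bit less surprising.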
What I really dig about this package are the plots it gives you. This package is based on the idea that it models a counterfactual: What would the team have done had Rudy Gay not been traded? It then compares this predicted counterfactual to what actually happened. Let's look at the plots:
The top figure shows a horizontal dotted line, which is what is predicted given what we know about the team before Gay was traded. I haven't specified any seasonal trends or other predictors, so this line is flat. The black line is what actually happened. The vertical dotted line is where Rudy Gay was traded. The middle figure shows the difference between predicted and observed. We can see that there is no reliable difference between the two after the Gay trade. Lastly, the bottom figure shows the cumulative difference (that is, adding up all of the differences between observed and predicted over time). Again, this is hovering around zero, showing us that there was really no difference between the Grizzlies' point differential that actually occurred and what we predicted would have happened had Gay not been traded (i.e., the counterfactual). What about the Raptors?
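As an aside on the middle and bottom panels: the quantities they show can be sketched by hand. This is a deliberate simplification, assuming the flat counterfactual is just the pre-trade mean, which is not how the package actually builds its prediction. The data are simulated and every name here is hypothetical.

```r
# Simulated point differentials: 44 pre-trade games, 38 post-trade games.
set.seed(1)
n_pre <- 44
point_diff <- c(rnorm(n_pre, mean = 3.5, sd = 12),
                rnorm(38,    mean = 4.5, sd = 12))
post_idx <- (n_pre + 1):length(point_diff)

# Simplified flat counterfactual: the pre-trade average, held constant.
predicted <- mean(point_diff[seq_len(n_pre)])

# Middle panel: observed minus predicted, game by game after the trade.
pointwise <- point_diff[post_idx] - predicted

# Bottom panel: running total of those differences.
cumulative <- cumsum(pointwise)

# The average effect times the number of games equals the final total.
avg_effect <- mean(pointwise)
```

With noise this large relative to the shift in the mean, the running total wanders quite a bit, which is roughly the picture the Grizzlies plot gives.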
The Raptors unloaded Gay to the Kings the very next season. Let's take a look at the same table and plot for the Raptors and trading Rudy:
| Actual | Predicted | Difference | 95% LB | 95% UB |
The posterior probability of a causal effect here was 95.33%, much higher than in the Grizzlies example. The effect was more than five times bigger than it was for Memphis: There was a difference of 4.8 points per game (or 302 cumulatively) between what we observed and what we would have expected had the Raptors never traded Gay. That this effect came from one (at the time, above average) player leaving a team is pretty interesting. I'm sure any team would be happy with getting almost 5 whole points better per game after getting rid of a big salary.
It looks like trading Rudy Gay likely had no effect on the Grizzlies, but it does seem that getting rid of him had a positive effect on the Raptors. The CausalImpact package is very user-friendly, and there are many good materials out there for understanding and interpreting the model and what's going on underneath the hood. Most of the examples I have seen use simulated data or data that are easily interpretable, so it was good practice seeing what a real, noisy dataset actually looks like.