This is a follow-up to the previous piece here.
Approximately three weeks after the issue was initially discovered, the shot location data in the NHL’s play-by-play listings have been corrected. Shortly after the discovery was made by Josh and Luke Younggren of Evolving-Hockey.com, the NHL issued a statement that the transition to new analytics tools this season was likely the source of the issue, as also reported in this edition of 31 Thoughts by Elliotte Friedman.
Last night, the Younggrens announced that the majority of the first 91 games had been fixed:
Announcement!!! It appears that we now have, for the most part, completely updated event coordinates from the NHL! We've been tracking this since we found out about it, and as of 10:30pm CST, it looks like all but games 41-52 and possibly game 17 have been updated!— EvolvingWild (@EvolvingWild) November 7, 2019
I'm not entirely sure why these were not updated, but they haven't changed. However, if we remove these games, the difference becomes clear. Using the same histogram from several weeks ago, we can see how much the distances have changed: pic.twitter.com/oviLCIU3PJ— EvolvingWild (@EvolvingWild) November 7, 2019
Alright, I'm not deleting this, but there are some differences that initially made both of us take pause. I'm by no means a heatmap expert (I assume the expert will be looking into this tomorrow). The first plot is the old coordinates, the second plot shows the new coordinates. pic.twitter.com/4mqpWfM6KP— EvolvingWild (@EvolvingWild) November 7, 2019
Early this afternoon, Micah Blake McCurdy confirmed that the last remaining games were corrected as well.
So, as of about an hour ago, the shot locations for this year are finally fixed. It took the league a little more than three weeks from when they were first notified about the problem, which was discovered by outsiders.— Micah Blake McCurdy (@IneffectiveMath) November 7, 2019
(Note: I love Dr. McCurdy’s use of the phrase “discovered by outsiders”—it really gives the whole thing a post-apocalyptic vibe, which is fitting, since for us stats types it might as well have been the end of the world.)
While most statistics, such as shot attempts and goals, were unaffected, these issues had major effects on metrics such as Expected Goals (the probability of an unblocked shot becoming a goal, given factors such as location) and High-Danger Chances (another measure of goal probability, which sorts shots into categories based on location). I have updated the graphic from my last piece to show the difference these fixes make to these metrics.
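For the curious, here is a rough sketch of why location errors hit these metrics so hard. It is a toy, not any real expected goals model: the net position at (89, 0) in feet, the function names `shot_distance` and `toy_xg`, and the coefficients are all my own assumptions for illustration, and nothing here is fitted to actual NHL data.

```python
import math

# Assumption: coordinates are in feet with center ice at (0, 0),
# and the net sits on the goal line at x = 89, y = 0.
GOAL_X = 89.0


def shot_distance(x, y):
    """Straight-line distance in feet from a shot location to the assumed net.

    abs(x) folds both ends of the rink onto the same attacking net.
    """
    return math.hypot(GOAL_X - abs(x), y)


def toy_xg(x, y, intercept=-1.0, slope=-0.05):
    """Illustrative goal probability that shrinks as distance grows.

    The intercept and slope are made up for this example only.
    """
    z = intercept + slope * shot_distance(x, y)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # The same shot, recorded correctly and recorded 20 feet too far out.
    for label, (x, y) in {"near": (80, 0), "far": (60, 0)}.items():
        print(label, round(shot_distance(x, y), 1), round(toy_xg(x, y), 3))
```

Even in this simplified version, moving a recorded shot farther from the net lowers its estimated goal probability, so a systematic shift in the coordinates drags every location-based metric along with it.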
Keep in mind that the sample size—about 5-7 games per team—is small, and the overall effect may not be immediately apparent. However, for expected goals, one can usually expect a pretty close correlation between expected and actual goals scored over the course of a season, as seen in the top row of the graphic.
Interestingly, even with the fixes, nearly all models seem to show expected goals lagging behind actual goals this season, as noted by Brad Timmins, who runs Natural Stat Trick. As we continue through the season, it will be interesting to see whether this trend holds. Still, the gap between the incorrect data (0.53 goals over expected on average) and the now-corrected data (0.29 goals over expected on average) is substantial.
For those curious, my model is somewhat conservative, but it's lagging actual goals more this season than in previous years (even when just looking at the start of seasons).— Natural Stat Trick (@NatStatTrick) November 7, 2019
Be sure to follow along here at On The Forecheck, as I’ll be reviewing the analytics behind the Predators (and the rest of the NHL) now that enough games have been played and the data are accurate.
Because, even though the people who discovered this issue are actual professionals, this tweet does expose my state of being when I wrote the previous article:
so if i have this right, the people the NHL pays to "watch the game" were making mistakes that were discovered by bloggers wearing underpants in the basement https://t.co/TjBWJ7VQj8— nobody (@petbugs13) November 7, 2019
I’ll have you know it’s my own house, and we don’t have a basement, thank you.