Update, 10:18 PM CT: Greg Wyshynski of ESPN provided this update at 8:15 PM:
Some news regarding that hockey analytics problem discovered by @EvolvingWild and @IneffectiveMath and others. https://t.co/OmnkwgzdiO— Greg Wyshynski (@wyshynski) October 16, 2019
That’s right, folks: less than 24 hours after Josh and Luke first discussed what they found, the NHL is actually looking into it! We’ll keep you updated if the league provides more information.
With the 2019-2020 NHL season nearly three weeks old, amateur and professional statisticians, armchair analysts, and visualization creators have been hard at work providing deeper insight into the hockey games we watch each night. The vast majority of publicly available data and statistics are derived from the NHL’s own Application Programming Interface (API). Before you yawn and dismiss it as techno-babble, know that the NHL API also feeds websites like NHL.com, ESPN, TSN, and most fantasy hockey sites. It contains a massive amount of information that, once sorted through, can give you anything from a player’s hit total to the framework for advanced statistics like Goals Above Replacement (GAR).
Two of the most widely known and respected people in the analytics community are Josh and Luke Younggren, who are on Twitter as @EvolvingWild and known for their website Evolving Hockey. The site hosts a treasure trove of stats and is one of the few publicly available sources for statistics like Expected Goals (xG), a measure of the probability that an unblocked shot attempt (a missed shot, shot on goal, or goal) results in a goal. Calculating xG takes many, many factors into account (those interested can read more here), but one of the most important is shot location. The NHL provides this information in its play-by-play data as X and Y coordinates, which makes it simple to calculate shot distance and angle, among other things.
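To make the distance and angle calculation concrete, here is a minimal Python sketch. It assumes NHL API-style coordinates measured in feet, with center ice at (0, 0) and the goal lines at x = ±89; the constant `GOAL_LINE_X` and the helpers `shot_distance` and `shot_angle` are hypothetical names for illustration, not part of any published xG model. It also assumes the shooter is attacking the net on the same side of center ice as the shot, a common simplification.

```python
import math

# Assumption: NHL API-style rink coordinates in feet, center ice at (0, 0),
# x along the length of the rink, goal lines (and nets) at x = +/-89.
GOAL_LINE_X = 89.0


def shot_distance(x: float, y: float) -> float:
    """Straight-line distance in feet from (x, y) to the center of the nearer net."""
    # Simplification: the shooter attacks the net on the same side of center ice.
    dx = GOAL_LINE_X - abs(x)
    return math.hypot(dx, y)


def shot_angle(x: float, y: float) -> float:
    """Angle in degrees off the net's center line (0 = straight on, 90 = from the goal line)."""
    dx = GOAL_LINE_X - abs(x)
    return math.degrees(math.atan2(abs(y), dx))
```

For example, a shot from (80, 0) is 9 feet straight out at an angle of 0 degrees, while a shot from (79, 10) sits about 14.1 feet away at a 45-degree angle. Because both values come straight from the recorded coordinates, any systematic shift in where events are logged flows directly into every model built on them.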
On Monday evening, however, Josh and Luke posted a peculiar finding to Twitter:
Yeah, so it looks like there is something very different about how the NHL is recording event location coordinates this season... and, umm, it's not great. Thread incoming— EvolvingWild (@EvolvingWild) October 15, 2019
They went on to explain that this season’s shot location data differed from what we had seen in years past. In some cases, the recorded shot locations didn’t line up with what we saw on video, and there was no obvious reason why.
This Johansen goal was recorded 9 ft from the net, 35.5% xG. These were the two highest xG events this season. We updated our model, which meant much lower xG values overall, but we were expecting a ceiling of ~70%, not 40%. pic.twitter.com/7uGx6CRd1N— EvolvingWild (@EvolvingWild) October 15, 2019
As Nashville fans are likely aware, Ryan Johansen started last week’s furious rally against Washington by poking in a loose puck that was sitting almost directly underneath Braden Holtby’s pads, yet the NHL recorded it as a shot from nine feet out. Shots like that one and the one above would usually carry an expected goal probability of 60-70%, but these registered only in the high 30s.
By plotting shots on a rink map and looking at the distribution of shot distances, Josh and Luke discovered some oddities:
And if we look at the actual locations of Fenwick shots on the rink within 30 feet, it's clear they've moved further away as well: pic.twitter.com/hFa71IvKPj— EvolvingWild (@EvolvingWild) October 15, 2019
It is almost as if a force field surrounded the goalie crease this season (only one shot appears fully inside the crease), which, as Johansen’s goal showed, is simply not true.
Hockey Twitter (well, stats Hockey Twitter) was abuzz, confused about what was going on but realizing why some of our tried-and-true analyses were producing baffling results. Most publicly available expected goals models are updated each year, so as a rule of thumb, the total xG compiled by a team (the sum of all its shot probabilities) should be pretty close to the number of goals it actually scored. We’ll refer to the difference between Goals For (GF) and Expected Goals For (xGF) as Goal Differential, and as you can see below, things have drastically changed:
The average difference jumped from about 0.04 goals above expected per 60 minutes last season to 0.55 this season (since far fewer games have been played so far, I used rates per 60 to even things out). That is a major warning sign.
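The Goal Differential rate is simple arithmetic: subtract a team’s xGF from its GF, then scale by ice time to a per-60-minute rate. The sketch below assumes each team is supplied as hypothetical (GF, xGF, minutes played) totals and that the league figure is a simple unweighted mean across teams; the function names are illustrative, and since the exact weighting behind the numbers above isn’t specified, treat this as one plausible reading rather than the method used.

```python
def per_60(value: float, toi_minutes: float) -> float:
    """Scale a total accumulated over toi_minutes to a rate per 60 minutes."""
    return value * 60.0 / toi_minutes


def goal_differential_per_60(gf: float, xgf: float, toi_minutes: float) -> float:
    """Goals For minus Expected Goals For, expressed per 60 minutes of play."""
    return per_60(gf - xgf, toi_minutes)


def league_average_gd60(teams: list[tuple[float, float, float]]) -> float:
    """Unweighted mean of each team's (GF - xGF) per 60.

    Each tuple is (GF, xGF, time on ice in minutes). Weighting every team
    equally, rather than by ice time, is an assumption.
    """
    rates = [goal_differential_per_60(gf, xgf, toi) for gf, xgf, toi in teams]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical totals for illustration only; not real NHL data.
    sample = [(10, 8.0, 120.0), (5, 5.0, 60.0)]
    print(league_average_gd60(sample))  # prints 0.5
```

A league-wide value near zero means the model’s totals track actual scoring; a sudden jump across every team points to something in the inputs having changed rather than every shooter suddenly getting better.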
I ran a quick experiment of my own, hand-tracking all the shots shown in the highlight video from Nashville’s season-opening win against Minnesota:
So just as a quick exercise after seeing @EvolvingWild's thread, i wanted to check something quick and easy. I pulled up https://t.co/aZPx0CgVkG this highlight video, and manually put a shot location for every fenwick event in the video. Here's what I saw: pic.twitter.com/TRYG8xWrCa— Bryan (@projpatsummitt) October 15, 2019
On average, shots were recorded almost two and a half feet farther from the net than where they were actually taken. Ryan Ellis, for example, took a shot on goal with his back basically ON the wall, yet the NHL data placed him more than 10 feet away from it.
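That two-and-a-half-foot figure is an average over paired observations: for each shot, the distance implied by the NHL’s coordinates minus the distance implied by the hand-tracked location. A minimal sketch of that comparison follows, again assuming NHL API-style coordinates with nets at x = ±89 and a one-to-one pairing of recorded and hand-tracked shots. The names `distance_to_net` and `mean_distance_offset` are hypothetical, and this is not necessarily how the figure above was computed.

```python
import math
from statistics import mean

# Assumption: NHL API-style coordinates in feet, goal lines at x = +/-89.
GOAL_LINE_X = 89.0

Point = tuple[float, float]


def distance_to_net(point: Point) -> float:
    """Distance from a shot location to the nearer net (same-side assumption)."""
    x, y = point
    return math.hypot(GOAL_LINE_X - abs(x), y)


def mean_distance_offset(recorded: list[Point], tracked: list[Point]) -> float:
    """Mean of (recorded distance - hand-tracked distance) over paired shots.

    Positive values mean the official data places shots farther from the net
    than where they were actually taken.
    """
    if not recorded or len(recorded) != len(tracked):
        raise ValueError("need one hand-tracked location per recorded shot")
    diffs = [distance_to_net(r) - distance_to_net(t) for r, t in zip(recorded, tracked)]
    return mean(diffs)


if __name__ == "__main__":
    # Hypothetical locations for illustration only.
    recorded = [(80.0, 0.0), (70.0, 0.0)]
    tracked = [(82.0, 0.0), (75.0, 0.0)]
    print(mean_distance_offset(recorded, tracked))  # prints 3.5
```

Comparing distances rather than raw coordinates keeps the result in the unit that matters for xG, though it can hide errors that move a shot sideways without changing its distance to the net.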
Others have noticed oddities over the past few weeks too. Matt Donders (@mattdonders on Twitter) runs @NJDevilsGameBot and @shotmaps, the latter an account that automatically produces shot maps for every NHL game at each intermission. Several people had pointed out that some fairly obvious goals were nowhere near the right spot on those shot maps.
Update: Micah Blake McCurdy of HockeyViz.com confirmed suspicions using his MAGNUS 2 shot mapping model this morning:
Just to corroborate what @EvolvingWild found in that thread, here's the change from 2018-2019 to 2019-2020, as a rate. Taken as a whole, the league is recording shot locations differently this season. pic.twitter.com/NkDMq3iJQc— Micah Blake McCurdy (@IneffectiveMath) October 15, 2019
You might ask, why does this matter? Well, pretty much every statistic available to the public beyond shots, goals, and assists relies on shot location data: shot maps, xG, high-danger chances for, and fancier visualizations illustrating offensive efficiency, among others. Evolving-Hockey, HockeyViz, NaturalStatTrick, MoneyPuck, and even the game charts posted by yours truly all use this data, and there’s reason to believe more than a handful of NHL teams use public data as well. It explains why so many goalies this season look horrid in games where they gave up four goals but had expected goals against of only 1.3. Did they really give up nearly three more goals than they should have? (This was a topic I was going to write about, discussing the early struggles of Nashville’s goaltending.) We could tell they had struggled, but the numbers suggested that goalies couldn’t save low-percentage shots and that skaters weren’t generating quality with their shots, all because shot locations were, for some reason, being recorded differently.
As of this morning, the reason for this change hasn’t been fully explained (although some have their suspicions), and until either the NHL addresses it or all the smart people make adjustments, all we can do is wait. And even if you don’t care about analytics or advanced statistics, I imagine most people would like to have accurate shot maps.
(Thanks to Josh and Luke Younggren for discovering this issue and looking into it, as well as various other members of the community providing insight. We hockey nerds have to stick together.)