Data Quality Alert! Data Quality Alert!

There's a new piece over at Hockey Analytics which I heartily recommend to those interested in furthering the use of statistics related to NHL hockey. Alan Ryder pioneered the investigation of Shot Quality, which attempts to measure the characteristics of shots (distance, type, situation) to provide a more finely detailed view of offensive and defensive performance. I use a slightly simplified version of Alan's SQ techniques in my analysis here quite often, so when the article entitled "Product Recall Notice for Shot Quality" was posted, it definitely caught my eye. While it is obvious to anyone who has read through the NHL's play-by-play files that data quality problems exist, the presumption has been that these errors are basically random and cancel each other out over the course of 70,000+ shots in an NHL season.

By looking at arena-by-arena details, however, Alan has raised some pretty serious issues with the data, basically demonstrating that scorers in different venues seem to have systematic biases in how shots are recorded. Games played at Madison Square Garden, for example, consistenly have the most dangerous shots recorded in the logs, whereas scorers in Buffalo and Tampa tend towards the opposite view. The implications are that first of all, we always need to keep in mind the limitations of the data that the NHL presents to us, and secondly, look into possible means of correcting for such biases (by using something like the "park effect" that baseball stats junkies use). I guess I've got one more thing added to my summer to-do list...

Back in March I did something similar along the lines of the Giveaway/Takeaway stats, as well as how frequenty different scorers record Missed Shots vs. Saves. In my Give/Take and Missed Shot pieces, for example, I looked at how teams performed at home, how they performed on the road, and how visitors performed in their building, in order to isolate the effect of the official scorer. It was interesting to see that games in Chicago feature an absurdly low number of Giveaways and Takeaways by either team, while in Montreal or Edmonton the per-game figures are five times higher or more!

The potential for statistical analysis to extend our understanding of professional hockey remains largely untapped, but the quality of the data being recorded is a critical obstacle that needs to be overcome if we're to make the best progress we can. I'm not quite sure how best to pursue this issue with the NHL, but I'm open to suggestions.

X
Log In Sign Up

forgot?
Log In Sign Up

Forgot password?

We'll email you a reset link.

If you signed up using a 3rd party account like Facebook or Twitter, please login with it instead.

Forgot password?

Try another email?

Almost done,

Join On the Forecheck

You must be a member of On the Forecheck to participate.

We have our own Community Guidelines at On the Forecheck. You should read them.

Join On the Forecheck

You must be a member of On the Forecheck to participate.

We have our own Community Guidelines at On the Forecheck. You should read them.

Spinner

Authenticating

Great!

Choose an available username to complete sign up.

In order to provide our users with a better overall experience, we ask for more information from Facebook when using it to login so that we can learn more about our audience and provide you with the best possible experience. We do not store specific user data and the sharing of it is not required to login with Facebook.

tracking_pixel_9355_tracker