Taking a Shot: Is it Normal?

Is there anything a coach or a player can do to improve the chance of a goal, or is it just luck from the hockey gods?

This is the million dollar question, but let’s try to answer it. My ultimate goal is to build a model that predicts hockey futures (4-year Cup run for the Preds?), but you can’t build a model unless you understand the data you’re modelling. You’ve got to see it, spin it around, and check behind its ears to see if it washed. To do this, I’m going to explore shot data with you in public. As always in this series, the focus is on shot attempts, including blocks, misses, and stops.

What does shot attempt data look like for players?

Figure 1 is a plot of every player’s Corsi per 60 (if they shot at least 50 times, only played on one team, and had at least 500 minutes of time on ice). The X-axis is every player numbered from best to worst, 1-499. The Y-axis is their C60. The blue line is the real data.

Figure 1. C60 across the league sorted by C60. Blue solid line is real data.
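If you want to play along at home, here is a minimal sketch of how a ranking like Figure 1 could be put together. It assumes C60 means shot attempts divided by minutes on ice, times 60, and it applies the three filters described above. The player records, field names, and numbers are all made up for illustration; this is not the actual dataset or the code behind the figure.

```python
# Hypothetical player records: names, fields, and numbers are invented.
players = [
    {"name": "A", "attempts": 210, "toi_minutes": 1100.0, "teams": 1},
    {"name": "B", "attempts": 45, "toi_minutes": 900.0, "teams": 1},    # too few shots
    {"name": "C", "attempts": 120, "toi_minutes": 450.0, "teams": 1},   # too little ice time
    {"name": "D", "attempts": 150, "toi_minutes": 1000.0, "teams": 2},  # played for two teams
    {"name": "E", "attempts": 130, "toi_minutes": 1200.0, "teams": 1},
]


def corsi_per_60(attempts, toi_minutes):
    """Shot attempts per 60 minutes of ice time (assumed definition of C60)."""
    return attempts / toi_minutes * 60.0


def qualifies(player):
    """Filters from the text: 50+ attempts, one team, 500+ minutes."""
    return (
        player["attempts"] >= 50
        and player["teams"] == 1
        and player["toi_minutes"] >= 500
    )


# Keep qualifying players and sort from most to fewest attempts per 60.
ranked = sorted(
    (
        {"name": p["name"], "c60": corsi_per_60(p["attempts"], p["toi_minutes"])}
        for p in players
        if qualifies(p)
    ),
    key=lambda row: row["c60"],
    reverse=True,
)

for rank, row in enumerate(ranked, start=1):
    print(rank, row["name"], round(row["c60"], 2))
```

Only players A and E survive the filters in this toy example. Plotting rank on the X-axis against `c60` on the Y-axis gives the same kind of curve as the blue line.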

Pretty, but so what?

To appreciate this graph and what it says, we need to imagine what it could say instead. If every player in the league shot the exact same number of shots, then the graph would be a straight line from left to right (Figure 1; black dotted line). Figure 1’s blue line doesn’t look like this, which means that some players shoot more than others.

“Brilliant. Never knew that, Paca.”

Glad I could bring this insight.

Time to imagine more, though. If some players shoot more than others, then it might be that each player is a little better than another player. That would look like the orange dotted line.

This also doesn’t happen, but it almost does. Most of the curve in Figure 1 is close to a straight line. Around the edges, though, that’s no longer the case. In particular, at the top end, the line slopes up faster than it does almost anywhere else. Figure 2 has the real data again plus a line approximation (orange dotted line). The line approximation works pretty well, but starts being wrong on the left-hand side.

Figure 2. C60 real data and a line approximation.

Okay, they are different, Paca, but they don’t look super different, really, do they?

They do if you zoom in more. Figure 3 shows the pattern for the top 100 players. Figure 4 does the same for the middle 100 players. Figure 3 has a curve; Figure 4 is straight.

Figure 3. C60 for the top 100 players across the league

Figure 4. C60 for the middle 100 players across the league

What does this mean? It means that your most prolific shooters are in a different league, shot-wise, than your average player. We can build an argument off of this fact. But before doing that, let’s just look at this a bit more. Does the pattern we just identified always happen?

The next set of plots looks at Block Proportion, Miss Proportion, and Stop Proportion (the proportion of shot attempts which meet each fate), again sorted best to worst. Since you don’t want your shot blocked, etc., low numbers are good.
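For the record, here is a rough sketch of how those proportions could be computed for one player. It assumes every shot attempt ends in exactly one of four fates (blocked, missed, stopped, or goal) and uses total attempts as the denominator, as the sentence above describes. The function name and the example counts are hypothetical.

```python
def fate_proportions(blocked, missed, stopped, goals):
    """Share of a player's shot attempts that met each fate.

    Assumes each attempt ends as exactly one of the four outcomes.
    """
    attempts = blocked + missed + stopped + goals
    if attempts == 0:
        raise ValueError("player has no shot attempts")
    return {
        "block": blocked / attempts,
        "miss": missed / attempts,
        "stop": stopped / attempts,
        "goal": goals / attempts,
    }


# Made-up counts for a single player.
example = fate_proportions(blocked=50, missed=40, stopped=100, goals=10)
print(example)
```

With these invented counts, a quarter of the attempts are blocked and half are stopped. Sorting every player by one of these values, lowest first, gives the best-to-worst ordering used in the next three figures.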

Figure 5. Block proportion across the league

Figure 6. Miss proportion across the league

Figure 7. Stop proportion across the league

Graphs! Graphs! Everywhere!

The point of tossing all of these plots at you is to notice that they all look kind of the same. The blocks, misses, and stops are flipped upside down compared to Corsi Per 60, but they all have a main line with these little curves at the end. (Except for the bad side of stops. There are some players where all of their shots on goal were stopped. This is because there’s a hard ceiling for stops that doesn’t exist for the other filters.)

We’re taking a look at this data, because we’re looking for something. We’re looking for the hockey gods. What does a hockey god look like?

Okay, okay, we know the answer already.

Seriously, though, y’all, if you want a break from numbers to read a hilarious appreciation of our man Josi, check out this write-up from “what’s up ya sieve” and Foxy Friday.

Other than Roman Josi, what does a hockey god look like? I’m talking about luck. It looks like this.

Figure 8. Example normal distribution from math stack exchange.

Yes, a hockey god is a bell curve, also called a normal distribution. When you count up the chance of things happening and it takes the shape of a bell, it is “normal”. When something is normal statistically, the most likely option is the top of the curve. That’s your average (mean, or expected value). The other parameter is how wide the bell is: that’s your variance or standard deviation.

Luck, then, would look like a normal distribution around some average and with some variation. It’s balanced equally on both sides. The average C60 is about 11, so if a demon ties you to the rack and says, “guess the 5v5 C60 of this unnamed hockey player,” (which happens to me a LOT), choose 11. Any other choice is less likely than 11.

Put differently, if the variation among hockey players looks like a normal distribution, then you have no evidence the variation is anything other than luck.

That last is the money sentence.

Now, we can start addressing our question about hockey gods. The question, “Is this shot number just luck?” becomes “Does the probability distribution for this shot number look normal (like a bell curve)?”

In fact, we can guess that the luck part of shots in hockey should look normal without even checking the data. This is because taking a shot is like flipping a coin. It’s either blocked or not, and it misses or not. It’s not a fair coin, because it’s not 50/50, but it’s yes or no. And when you count up large numbers of coin flips, the totals pile up into a normal distribution. (This rule is what’s known as the Central Limit Theorem.) So, for this data, the luck part should look like a normal distribution. Deviations from normality are… not luck.
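You don’t have to take the coin-flip claim on faith. Here is a small simulation sketch: every simulated player gets the same unfair coin, and we count how many of their attempts come up “blocked.” The block chance, the attempts per player, and the number of players are made-up values chosen for illustration, not numbers from the real data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

P_BLOCK = 0.25   # assumed chance that any one attempt is blocked (made up)
ATTEMPTS = 200   # attempts per simulated player (made up)
PLAYERS = 2000   # number of simulated players (made up)


def blocked_count(n_attempts, p_block):
    """Flip an unfair coin n_attempts times and count the 'blocked' results."""
    return sum(1 for _ in range(n_attempts) if random.random() < p_block)


counts = [blocked_count(ATTEMPTS, P_BLOCK) for _ in range(PLAYERS)]

mean = statistics.fmean(counts)
sd = statistics.pstdev(counts)

# For a bell curve, roughly two thirds of values sit within one standard
# deviation of the mean.
within_one_sd = sum(1 for c in counts if abs(c - mean) <= sd) / PLAYERS

print(round(mean, 2), round(sd, 2), round(within_one_sd, 3))
```

Every simulated player here has identical “skill,” so all of the spread in `counts` is luck. A histogram of `counts` should come out bell-shaped, centered near `ATTEMPTS * P_BLOCK`.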

All of my plots above are sorted from best player to worst player. If I just grab random numbers out of a normal distribution, what does it look like when you sort it best to worst (orange line, Figure 9)?

Figure 9. A normal distribution sorted from best to worst.
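A curve like Figure 9 is easy to reproduce. The sketch below draws 499 random values from a normal distribution, sorts them best to worst, and compares how steep the sorted line is at the ends versus the middle. The mean of 11 is the rough league average quoted earlier; the standard deviation of 3 is a made-up value for illustration only.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

MEAN_C60 = 11.0   # rough league average quoted in the text
SD_C60 = 3.0      # made-up spread, for illustration only
N_PLAYERS = 499

# Pure luck: every "player" is a random draw from the same bell curve.
draws = sorted(
    (random.gauss(MEAN_C60, SD_C60) for _ in range(N_PLAYERS)),
    reverse=True,
)


def avg_step(values):
    """Average drop between neighbouring ranks in a sorted slice."""
    return (values[0] - values[-1]) / (len(values) - 1)


top = avg_step(draws[:25])         # best 25
middle = avg_step(draws[237:262])  # 25 around the median
bottom = avg_step(draws[-25:])     # worst 25

print(round(top, 3), round(middle, 3), round(bottom, 3))
```

The steps at both ends come out much larger than the steps in the middle, which is the same “straight line with little tails” shape as the sorted plots above, produced here with no skill involved at all.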

Well. THAT looks a lot like our plots so far. So we can suspect that all the shooting data is from a normal distribution. However, the devil is in the details. If the data really is normal, then the little tails up and down at the ends can be entirely explained by luck, not skill or system. In a bell curve, there are going to be exceptional items on the sides. But the real curves aren’t quite as perfect as Figure 9.

Let’s stop beating around the bush. If we really want to see if something is a bell curve, we can plot it like a bell curve and take a look (Figure 10; this is known as a probability density function).

Figure 10. C60 Bell Curve (Probability Density Function).

Corsi per 60 does look kinda bell curvy, i.e. kinda normal, but the right side is wonky. That’s the side of the best players. I ran a test of normality on the data (Shapiro-Wilk), and it’s not normal. This means there’s something going on that cannot be explained by randomly sampling around the mean.
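For anyone curious what a normality test does under the hood, here is a sketch. To keep it dependency-free it uses the Jarque-Bera test, which is a different and simpler test than the Shapiro-Wilk test I actually ran: it checks whether the skew and the peakedness of the data match a bell curve. If you have SciPy installed, `scipy.stats.shapiro(values)` is the closer match to what was done here. The two datasets below are simulated with made-up parameters; neither is the real C60 data.

```python
import math
import random


def jarque_bera(values):
    """Return (statistic, p_value) for the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality the statistic is approximately chi-square with 2
    # degrees of freedom, whose upper-tail probability is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


random.seed(1)  # fixed seed so the run is repeatable

# Simulated "pure luck" league: one bell curve (made-up mean and spread).
bell = [random.gauss(11.0, 3.0) for _ in range(499)]

# Same league, but 49 players swapped for a made-up group of heavy shooters
# stretched out along the high side.
lopsided = bell[:450] + [random.uniform(17.0, 30.0) for _ in range(49)]

stat_bell, p_bell = jarque_bera(bell)
stat_lop, p_lop = jarque_bera(lopsided)

print("bell:", round(stat_bell, 2), p_bell)
print("lopsided:", round(stat_lop, 2), p_lop)
```

A small p-value means “this is unlikely to be a bell curve.” The lopsided data gets a tiny p-value because of its stretched-out high side, which is the same kind of wonkiness as the right side of Figure 10.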

How about the block proportion?

Figure 11. Block proportion probability density function

Again, the left-hand side is beautiful, but that right-hand side! There’s a whole second lump coming out. If I had to guess, I’d say we are seeing two overlapping curves, one for offense and one for defense, because the blueliners shoot from… wait for it… the blue line. But I don’t really know yet. Whatever it is, it’s not normal.

Figure 12. Miss proportion probability density function

Figure 12… looks pretty normal. It’s not perfect, but there are no completely obvious lumps or skews, and indeed the normality test says it’s normal. (Technically, it says it’s not clear that it isn’t normal.)

This suggests that we could explain all of the variation we see among players for misses just by luck. The data is consistent with an entirely hockey god explanation. If this holds up, then it also suggests that we should see regression to the mean on misses pretty commonly. If someone is scoring a lot of goals because they aren’t missing, then they’ll probably miss more next year. If they are among the worst at missing, then they’ll likely do better next year. (I can hear Craig Smith cheering from here.) This may not be the case, but the shape of the data requires no more explanation.
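Here is what regression to the mean looks like if misses really are all luck. In this sketch every simulated player has the exact same true miss rate, we play two seasons, and we check how the year-one “best” players do in year two. The miss rate, attempts per season, and group size are made-up numbers for illustration, not estimates from the real data.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the run is repeatable

TRUE_MISS_RATE = 0.25  # every simulated player shares this rate (made up)
ATTEMPTS = 150         # attempts per player per season (made up)
PLAYERS = 499


def season_miss_proportion():
    """Simulate one season and return the share of attempts that missed."""
    misses = sum(1 for _ in range(ATTEMPTS) if random.random() < TRUE_MISS_RATE)
    return misses / ATTEMPTS


year1 = [season_miss_proportion() for _ in range(PLAYERS)]
year2 = [season_miss_proportion() for _ in range(PLAYERS)]

# Indices of the 25 players who missed least in year one.
best = sorted(range(PLAYERS), key=lambda i: year1[i])[:25]

best_year1 = statistics.fmean(year1[i] for i in best)
best_year2 = statistics.fmean(year2[i] for i in best)

print(round(best_year1, 3), round(best_year2, 3))
```

The year-one leaders look great in year one because that is how they were picked. In year two, the same players land back near the shared true rate, since nothing but luck separated them in the first place.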

Next up, the chance of the goalie stopping a shot.

Figure 13. Stop Proportion density function.

This time, the right-hand side is nice and smooth, looking normal, but the left-hand side is tilted. In formal testing, it’s also not normal. We can’t explain variation in players’ Shot % entirely by random luck.

Indeed, when you look at the players with terrific Stop Proportions, it’s often players who are known as goal scorers. It’s T.J. Oshie, Jake Guentzel, Rickard Rakell, Artem Anisimov, Evgeni Malkin, Patrik Laine, etc. When you look at the players with terrific Miss Proportions, you see some good players. Patric Hornqvist was 18th in the league in C60 and 5th in Miss Proportion. There are also tons of people ranking in the 200s, 300s, and 400s in C60 right at the top of Miss Proportion. People who only scored 3, 6, 11 goals. Sidney Crosby is right next to Wideman in Miss Proportion, where Crosby put in 26 Goals at 5v5, while Wideman only put in 2.

To summarize, what we see is:

  1. Corsi per 60 and Stops are non-normal for the good players.
  2. Blocks are non-normal on the bad side (this is probably the defense shooting).
  3. Misses so far can be explained entirely through luck.

The data for Corsi, Blocks, and Stops requires us to tell more of a story than “hockey gods.” Misses may not be all luck, but we don’t need any story other than luck to explain the data we have so far. If so, don’t lean on a player’s misses, or despair about them, when predicting how that player will do in the future. Chances are they will turn around.

To finish this up, we want to know if any of it relates to goals. If you know a player’s C60, can you predict they will score more goals? How about their block proportion? Here’s the answer:

Corsi: Yes

Blocks: No

Misses: Yes

Stops: Yes

But for the evidence… next post.