Guest Post: NHL Player Clustering and Regression – An Algorithm to Value Players

For those interested in the application of advanced statistics to sports, the annual MIT Sloan Sports Analytics Conference is as big as it gets. In four years it’s grown from 175 attendees in 2007 to 1500 attending the 2011 edition, with a further 300 on a waitlist. Former MIT professor and current Houston Rockets GM Daryl Morey is a conference chair, and in its short existence Mark Cuban, Chris Nowinski, Peter Chiarelli, Jonah Keri, Nate Silver, and Bill James have appeared to speak, among many others. In part, this event has solidified Boston’s reputation as the “Silicon Valley of sports analytics”.

Long story short, last March I took notice of a post on the Sens forum at HockeysFuture by an Industrial Engineering student who attended the 2011 conference. It’s an interesting little write-up recounting meeting various hockey people (Stan Bowman, Brian Burke) and the general consensus on where hockey analytics is and where it’s going. While working towards his degree at U of T, he mentioned wanting to get into the field someday, having done some analytical work himself. A few weeks ago I asked him if he’d be interested in posting any of his work here sometime this season, be it Sens-related or league-wide. After getting the go-ahead from his co-authors, he agreed.

David’s paper is below. It’s a slightly different way of valuing players than a system such as goals versus threshold, but it has similar characteristics to point shares as discussed here. It helped me understand the value of each cluster to think of it as follows: “if the average top line/2nd line/top D/etc. played every minute of all 82 games, they’d be worth X points in the standings”. I’m sure David will be around if you have any questions; just leave them in the comments.

Thanks to David and his co-authors for allowing us to publish their work.


NHL Player Clustering and Regression: An Algorithm to Value Players

By: David Novati

The work discussed in this article has been accepted for publication in a future issue of the INFORMS
journal Interfaces and is the original work of Dr. Timothy C.Y. Chan, David Novati, and Justin Cho.

Determining the value of a player is a complicated task in hockey due to the interactions between
players and the pace of the game. Additionally, the limited data available makes evaluating a player
more of an art than a science. In this method, a clustering algorithm is applied to basic NHL statistics
to determine each player's type. This is followed by a regression that values each player type in terms
of ice time and NHL standings points.

Players are classified by different statistics based on position, including:

Forwards: Goals, Assists, +/-, Hits, Blocked Shots, PIMs
Defensemen: Points, +/-, Hits, Blocked Shots, PIMs
Goalies: Wins/Game Started, SO/Game Started, GAA, Sv%

For forwards and defensemen, the data are divided by total ice time, and players who did not play much
are removed so that, for example, 1 goal in 2 minutes of ice time does not distort the analysis. All the
data are also normalized to prevent scale dominance, meaning statistics like Hits, which run into the
hundreds, will not skew the algorithm used.
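As a rough illustration of this preparation step, here is a minimal Python sketch. The article does not say which ice-time cutoff or which normalization was used, so the 200-minute cutoff (`MIN_MINUTES`) and the z-score scaling are assumptions, and the three players and their numbers are made up.

```python
from statistics import mean, pstdev

MIN_MINUTES = 200.0  # hypothetical cutoff; the article does not state the real one


def per_minute_rates(players, stat_names, min_minutes=MIN_MINUTES):
    """Drop low-ice-time players and divide each counting stat by ice time."""
    rates = {}
    for name, row in players.items():
        if row["toi"] < min_minutes:
            continue  # e.g. 1 goal in 2 minutes would otherwise look elite
        rates[name] = [row[s] / row["toi"] for s in stat_names]
    return rates


def standardize(rates):
    """Rescale every stat column to mean 0 and standard deviation 1 (z-scores)."""
    names = list(rates)
    columns = list(zip(*(rates[n] for n in names)))
    scaled = {n: [] for n in names}
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        for n, value in zip(names, col):
            scaled[n].append((value - mu) / sigma if sigma else 0.0)
    return scaled


# Made-up players: season totals plus total ice time in minutes.
players = {
    "A": {"toi": 1500.0, "goals": 40, "hits": 60},
    "B": {"toi": 1000.0, "goals": 10, "hits": 200},
    "C": {"toi": 2.0, "goals": 1, "hits": 0},
}
rates = per_minute_rates(players, ["goals", "hits"])
scaled = standardize(rates)
```

After scaling, goals and hits sit on the same footing, so neither dominates a distance-based clustering just because its raw numbers are larger.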

After classifying the players, we get the following types, along with their specialities and example players
from 2009-2010 (as this research was conducted during last season and completed prior to the end of

Forwards:

1. Top Line: Excel in Goals, Assists, and +/- (Ovechkin, Crosby, Spezza)
2. 2nd Line: Do not excel in anything specifically but are the second best offensive type (Vermette, Foligno, Lupul)
3. Defensive: Excel in Blocked Shots and have higher Hits (Pahlsson, Abdelkader, Fisher)
4. Physical: Defined almost solely by Hits and PIMs (Neil, Konopka, Parros)

Defensemen:

1. Offensive: Excel in Points (Keith, Doughty, Karlsson)
2. Defensive: Excel in Blocked Shots and +/- (Zanon, Volchenkov, Gill)
3. Average: Do not excel in anything specifically (Beauchemin, Campoli, Smid)
4. Physical: Defined almost solely by Hits and PIMs (Sarich, Schenn, Carkner)

Goalies:

1. Elite: Superior in all metrics (Vokoun, Thomas, Rinne)
2. Average: Second in all metrics (Backstrom, Lehtonen, Ward)
3. Bottom: Worst in all metrics (Khabibulin, Leclaire, Toskala)

As can be seen, the types are fairly logical and distinct, with the players you would expect in each type.
One interesting find is that Logan Couture and Michael Grabner played enough to qualify for this
analysis and both are classified as Top Line forwards in 2009-2010, so perhaps it was not surprising
that they did so well in their actual rookie seasons.
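For readers curious what a clustering step like this can look like, below is a minimal k-means sketch in plain Python. The article only says "a clustering algorithm", so k-means is an assumption here, not necessarily the method used in the paper, and the four two-stat feature vectors are made up.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means on a list of equal-length feature vectors."""
    centroids = [list(p) for p in points[:k]]  # deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each player joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up, already-normalized (scoring, hitting) features for four forwards.
features = [[2.0, -1.0], [-1.0, 2.0], [1.8, -0.9], [-1.2, 2.1]]
labels, centroids = kmeans(features, k=2)
```

Here the two scoring-heavy vectors land in one cluster and the two hitting-heavy vectors in the other; naming a cluster "Top Line" or "Physical" is then a matter of looking at which stats its centroid is high in.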

The next step is to attach a value to each cluster type. To do so, the sum of ice time played by each
cluster type on each team, factoring in trades, is calculated and normalized by the amount of ice time
available to each position to get the “average” lineup on the ice for a team over the whole year. For
example, in 2009-2010 Washington had 2.13 Top Line Forwards (out of 3) on the ice at any given time,
while Ottawa had only 0.51.
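One way to read this calculation is sketched below in Python. Dividing each team's summed cluster minutes by the team's total minutes for the season (about 4967, the figure quoted later in the article) gives the average number of players of that type on the ice; that interpretation, the `average_lineup` helper, and the sample rows for a fictional team "AAA" are all assumptions for illustration.

```python
from collections import defaultdict

SEASON_MINUTES = 4967.0  # approximate team minutes per season, as quoted in the article


def average_lineup(stints, season_minutes=SEASON_MINUTES):
    """Average number of players of each cluster on the ice, per team.

    `stints` holds (team, cluster, minutes) rows; a traded player simply
    contributes one row for each team he played for.
    """
    totals = defaultdict(float)
    for team, cluster, minutes in stints:
        totals[(team, cluster)] += minutes
    return {key: minutes / season_minutes for key, minutes in totals.items()}


# Made-up rows for a fictional team.
stints = [
    ("AAA", "Top Line", 3000.0),
    ("AAA", "Top Line", 1967.0),
    ("AAA", "2nd Line", 2483.5),
]
lineup = average_lineup(stints)
```

With these numbers the fictional team averages 1.0 Top Line forwards and 0.5 2nd Line forwards on the ice, the same kind of figure as Washington's 2.13.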

Once the sums have been calculated for each type, a multiple regression of team standings points on
all of the sums is run using all the data from 2005-2006 to 2009-2010. The results show that a player
of each type who plays 100% of the time during a season can be expected to add the following points
(at 95% confidence):

Forwards:

1. Top Line: 28.8
2. 2nd Line: 19.9
3. Defensive: 18.7
4. Physical: Not statistically significant

Defensemen:

1. Offensive: 9.3
2. Defensive: 3.7
3. Average: Not statistically significant
4. Physical: Not statistically significant

Goalies:

1. Elite: 32.1
2. Average: 21.2
3. Bottom: Not statistically significant
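To give a feel for the regression step, here is a small ordinary least squares sketch in plain Python. It is only an illustration: the team-season rows are synthetic (generated from the Top Line and 2nd Line coefficients above so the fit can be checked), the no-intercept form is an assumption, and the significance testing behind the "not statistically significant" entries is left out.

```python
def least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic team-seasons: average (Top Line, 2nd Line) forwards on the ice.
X = [[2.0, 0.5], [1.0, 1.5], [0.5, 1.0], [1.5, 1.0]]
# Synthetic standings points built from the published coefficients, noise-free.
y = [28.8 * top + 19.9 * second for top, second in X]

coefficients = least_squares(X, y)
```

Because the synthetic points contain no noise, the fit recovers 28.8 and 19.9 up to rounding error; real team data would of course be noisier and would include every cluster as a column.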

To evaluate a player, simply take their average time on ice * games played / total time on ice in a
season (~4967 minutes with OT) and multiply by the coefficient above. So for Daniel Alfredsson as a
Top Line Forward in 2009-2010, we take 19.65 * 70 / 4967 * 28.8 = 7.98 points. At 2 points per win,
that means Alfredsson himself contributed about 4 wins to the 2009-2010 Senators.
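The formula is simple enough to put in a few lines of Python. The function and parameter names below are just illustrative; the numbers are the ones from the Alfredsson example.

```python
SEASON_MINUTES = 4967.0  # approximate team minutes per season (with OT), per the article


def standings_points(avg_toi, games_played, coefficient, season_minutes=SEASON_MINUTES):
    """Share of the season a player was on the ice, times his cluster's coefficient."""
    share = avg_toi * games_played / season_minutes
    return share * coefficient


# Daniel Alfredsson, Top Line forward (coefficient 28.8), 2009-2010.
alfredsson = standings_points(avg_toi=19.65, games_played=70, coefficient=28.8)
print(round(alfredsson, 2))  # 7.98
```

Swapping in another player's ice time, games played, and cluster coefficient gives his estimated contribution the same way.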

One application of this method is to evaluate trades or trade options. Take, for example, the infamous
Heatley trade, comparing the San Jose package with the well-known alternative from Edmonton. Going
beyond simply Michalek and Cheechoo versus Penner, Cogliano, and Smid, the analysis also includes the
signing of Cammalleri instead of Kovalev, under the assumption that had Heatley’s fate been determined
prior to July 1 (when the Edmonton deal was offered), the extra cap space would have gone to Cammalleri.

Since the deal happened in the off-season, the analysis uses the player types from 2008-2009 but Time
On Ice and Games Played from 2009-2010, assuming the health and role of each player were
predetermined. Additionally, the Time On Ice figures were adjusted to keep the totals constant, since
subtracting Heatley’s ~20 minutes a game and replacing them with 15 minutes a game would be inaccurate.
After all this is taken into account, the comparison is:

San Jose Package:

Player               Cluster     Points
Alex Kovalev         2nd Line    5.60
Jonathan Cheechoo    2nd Line    2.92
Milan Michalek       Top Line    6.98

Edmonton Package:

Player               Cluster     Points
Michael Cammalleri   Top Line    8.24
Dustin Penner        2nd Line    6.01
Andrew Cogliano      2nd Line    5.93

*Note that Ladislav Smid was a Physical Defenseman, a type whose value was not statistically significant, and therefore could not be evaluated.

Taking the Edmonton package instead would have resulted in a savings of ~$900,000 in cap space and an
increase of 1.27 points in the standings. That might explain why the Edmonton deal was the one Bryan
Murray considered before the San Jose deal. This comparison is also favourable to the San Jose package
in that it does not account for the decline in Cheechoo’s play and the slight drop-off from Michalek
since his San Jose years.

Turning to more recent moves, this method can also be used to evaluate the signing of Craig Anderson
to replace Brian Elliott. Last year, since Elliott could not be evaluated, he essentially contributed 0
points to Ottawa’s success. Simply replacing his 2293 minutes with an Average goalie, worth 9.78 points
over that same time, bumps Ottawa up to 11th in the East, one point behind Toronto. If Anderson
were Elite over that time, that’s 14.8 points, putting Ottawa at 89, 4 out of a playoff spot. However,
given the talent traded out of Ottawa last season and the improvements made by a lot of teams in the
East, the youth will have to play extremely well and Anderson will need to continue his dominant play
for Ottawa to have a chance.
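For anyone who wants to check the goalie arithmetic, here is a short Python sketch. It assumes the same ~4967-minute season total used for skaters, and the helper name `points_over_minutes` is just for illustration.

```python
SEASON_MINUTES = 4967.0  # approximate team minutes per season, per the article


def points_over_minutes(minutes, coefficient, season_minutes=SEASON_MINUTES):
    """Standings points for a player type that fills the given number of minutes."""
    return minutes / season_minutes * coefficient


ELLIOTT_MINUTES = 2293  # minutes being replaced, from the article

average_goalie = points_over_minutes(ELLIOTT_MINUTES, 21.2)  # Average goalie coefficient
elite_goalie = points_over_minutes(ELLIOTT_MINUTES, 32.1)    # Elite goalie coefficient
print(round(average_goalie, 1), round(elite_goalie, 1))  # 9.8 14.8
```

The gap between the two, roughly 5 points, is what separates an Average Anderson from an Elite one over those minutes.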

Chan, T., Cho, J., & Novati, D. (2011). Quantifying the contribution of NHL player types to. INFORMS:
Interfaces, 15