Late November in North Carolina is a very special time of year, as it marks the beginning of a period of deep religious reflection. That’s right, it’s college basketball season. For those of you who are fans and/or stats junkies, I wanted to point you towards a great website – StatSheet.com. StatSheet is the product of (the amazing) Robbie Allen, a friend and fellow collaborator on BarCampRDU and other RDU-area projects.
What is StatSheet? It’s the FiveThirtyEight.com of College Basketball (as well as the NFL, NBA, and High School Basketball). The site boasts a terrific, clean interface, with a focus on stats and graphs – for wonks, Bill James-heads, and fantasy fans. I particularly like the embeddable graphs – check out the GameFlow graph from last night’s UNC-Oregon game:
Robbie’s blogged about his statistical formula for calling a game over at the StatSheet Changelog. And for the die-hards, StatSheet also lists information about referees. As noted by Robbie in the disclaimer, “Boxscores list three officials per game. I have no way of associating specific foul calls to specific refs. As a result, I associate the number of fouls called in a game with each ref.” This presents an interesting statistical problem – could we devise a technique to break down this collective data and provide an estimated prediction of fouls/game?
Since the data is coupled, we could employ analysis of variance to analyze the groups, looking for referees that significantly vary. For example, if we have nine referees who rotate through crews of three, we would be able to use analysis of variance to target and identify a referee who consistently delivers more or fewer fouls than the typical range (i.e. look for the common outlier). But what if we wanted a predictive model? In that case, we might wish to apply a fixed-effects or hierarchical linear model. Looking at the pseudo-interactions between the groups of referees, we would be able to predict an estimate of fouls/game for the combination. This would be most interesting to explore from a historical perspective, to identify games with significantly more or fewer fouls. Potential covariates in that model would include TV broadcast, team rankings, and Duke status (if the team is Duke, the number of fouls called on Duke is generally two standard deviations below the ref’s mean).
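To make the fixed-effects idea a little more concrete, here is a minimal Python sketch. It assumes the simplest possible model: each game's foul total is the sum of one effect per official on the crew, fitted by ordinary least squares with NumPy. The function names (`estimate_ref_effects`, `predict_fouls`) and the foul totals are made up for illustration; none of this is StatSheet data or Robbie's method. It also only works when referees mix across crews. If the same three always work together, their individual effects cannot be separated.

```python
from itertools import combinations

import numpy as np


def estimate_ref_effects(games, n_refs):
    """Fit one additive effect per referee from crew-level foul totals.

    games: list of (crew, total_fouls), where crew is a tuple of referee
    indices in range(n_refs). Returns (effects, rank of the design matrix).
    """
    X = np.zeros((len(games), n_refs))
    y = np.zeros(len(games))
    for i, (crew, fouls) in enumerate(games):
        X[i, list(crew)] = 1.0  # indicator: this referee worked this game
        y[i] = fouls
    # No intercept: every row has exactly three 1s, so an intercept column
    # would be collinear with the referee indicators.
    effects, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    return effects, rank


def predict_fouls(effects, crew):
    """Predicted fouls/game for a crew: the sum of its referees' effects."""
    return float(effects[list(crew)].sum())


# Hypothetical data: five referees, every possible crew of three works one
# game, and the foul totals are invented purely for this example.
made_up_totals = [35, 39, 41, 40, 43, 44, 42, 46, 47, 49]
games = list(zip(combinations(range(5), 3), made_up_totals))

effects, rank = estimate_ref_effects(games, n_refs=5)
if rank < 5:
    print("Crews do not mix enough to separate individual referees.")
print("Per-referee effects:", np.round(effects, 2))
print("Predicted fouls for crew (0, 2, 4):", round(predict_fouls(effects, (0, 2, 4)), 1))
```

This leaves out the covariates mentioned above (TV broadcast, team rankings) and does no shrinkage. A hierarchical model would pull referees with only a few games toward the overall mean, which is probably what you would want with real boxscore data.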
My disclaimer is that I don’t know the first thing about sports stats, so pay no attention to me. However, I’m loving StatSheet. It has become my go-to stats site (edging out both ESPN and Yahoo Sports), and I thought I’d pass it on to you.







