David Sparks


David Sparks is the Arbitrarian. His groundbreaking statistics column ran weekly here at HP. But now he’s moving on to working for an NBA team, while I’m stuck making Crazy Pills jokes with the rest of the animals. Sigh. Bastard. In all seriousness, David has been a joy to work with, and a great member of HP. We hope you’ve enjoyed his work as much as we have, and we wish him nothing but the best of luck in his future endeavors. He’s got a bright future in a league that discovers more and more every day the value of metrics. This is his farewell post. Do us a solid, and let David know how much you liked his stuff in the comments. Cheers, David. Once we’ve stopped crying over David and playing Peter Gabriel, we’ll work on getting a new stats columnist. Keep your eyes open.

At least for the time being, the Arbitrarian is going on hiatus. I have been honored with an offer to intern for a very successful professional basketball franchise, and so my statistical work will be for that team, no longer pro bono publico.

It has been an honor and a privilege to work with this group of excellent writers at HP, and it has been endlessly entertaining to engage with you readers in the process. I am sure that I have learned more than I have taught in the interaction–thank you for your patience and willingness to share your insight–I appreciated every comment, response, and e-mail I have received.

Special thanks go to my co-bloggers here, who have been nothing but supportive; the fine thinkers at APBRmetrics (to whom I owe much); and most of all my wife, who has encouraged me in all my endeavors, even this one.

I hope you will join me in continuing to follow Hardwood Paroxysm and its peerless coverage of our favorite league, and I truly hope that you think about the world with a different, more Arbitrarian, mindset.



David Sparks is the Arbitrarian. His stats column runs weekly here on Hardwood Paroxysm.

Which teams were the most interesting last year, and which might be the most interesting this coming year? Obviously, there are innumerable ways to conceptualize “interestingness” in basketball–amount of interpersonal drama, perhaps, or exciting style of play–but today I’m going to present a series of different takes on interestingness, and apply these measures to the NBA. There’s no way we can cover all possible understandings of what makes a basketball team interesting, but hopefully I can offer several reasonable-sounding operationalizations…

Predicting game outcomes

First, since several of the following definitions of interestingness rely on it, I want to define a very simplistic and abstracted way of predicting game outcomes. To begin, imagine that you have two coins, each of which has a 50% chance of landing on heads, and a 50% chance of landing on tails. What are the odds of the various possible outcomes from flipping both simultaneously?

This is pretty straightforward, where we have two coins, A and B, and each can land Heads up or Tails up:

p(A=H & B=H) = 0.5*0.5 = 0.25
p(A=H & B=T) = 0.5*0.5 = 0.25
p(A=T & B=H) = 0.5*0.5 = 0.25
p(A=T & B=T) = 0.5*0.5 = 0.25

So, each H/T combination of two-coin outcomes has an equal chance of occurring, 25%.

What if A has a 60% chance of landing on Heads, and B has a 40% chance? Probabilities are somewhat different:

p(H,H) = 0.6*0.4 = 0.24
p(H,T) = 0.6*0.6 = 0.36
p(T,H) = 0.4*0.4 = 0.16
p(T,T) = 0.4*0.6 = 0.24

Here the probabilities are not all the same. Now, imagine that we say that a coin landing on Heads is the winner and a coin landing on Tails is the loser, and if they both land with the same face up, the two coins tie. Using the 60/40 probabilities above, we know that there is a 36% chance that A wins, a 16% chance that B wins, and a 48% chance that they tie. (Make sure this makes sense to you.)

Now, imagine that we disregard the ties (i.e. if they tie, we make them re-flip until there is no tie). What is the probability that A wins? Just take the probability from above (36%) and divide it by the universe of allowed outcomes (p(A wins) + p(B wins) = (36% + 16%) = 52%). You find 0.36/0.52 = 0.6923077. In other words, if you pair a 60% winning coin against a 40% winning coin, there is about a 69% chance that A lands on heads and B lands on tails, if you re-flip any ties.

How does this apply to basketball? Well, take Denver and Chicago from the 07-08 season. Denver’s winning percentage was about 60% and Chicago’s was about 40%. Assume that their winning percent is analogous to the weights assigned the coins above, that is, it’s the odds that they’ll win on any given night. Just as above, if we know that on one particular night one of these two teams won and the other lost, the chance that it was Denver that did the winning is about 69%, and the odds that it was Chicago who won are about 31%.

Now let’s pretend that a head-to-head game is just like the coin tossing face-off above–that is, it doesn’t matter how well the two teams match up, or who’s injured, or who’s on a hot streak, or whose defense will smother the other team’s best scorer, or what have you. Imagine if, on game night, the owners of the two teams met at center court and flipped coins weighted according to their season-long winning percentage, and re-flipped ties… that’s how I would predict the chances of each team winning.

Now, in the Denver v. Chicago case, both teams have a chance of winning (69% and 31%, respectively), but Denver is more likely to win, because their odds are greater than 50%. So, if you had to predict a single game, you’d go with Denver. However, if the teams met 1,000 times, you would expect Denver to win roughly 690 times, and Chicago to win about 310 times.

And that’s how I’ll be constructing win probabilities for the remainder of the post. As long as neither team is undefeated or has literally no wins, there is always at least a slim chance that the underdog team can win. Applying this algorithm to a Boston v. Miami game in the 07-08 season, you’d get a 94.852% chance of Boston winning, meaning that it’s highly unlikely, but not impossible for Miami to have won that matchup.
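The re-flip-the-ties rule can be written as a small function. This is only a sketch of the arithmetic described above; the function name `win_probability` is an illustrative choice, and the 66-16 and 15-67 records used for Boston and Miami are supplied here as inputs rather than taken from the text.

```python
def win_probability(pct_a: float, pct_b: float) -> float:
    """Chance that team A beats team B, treating each team's season
    winning percentage as a weighted coin and re-flipping any ties."""
    a_wins = pct_a * (1.0 - pct_b)  # A lands heads, B lands tails
    b_wins = pct_b * (1.0 - pct_a)  # B lands heads, A lands tails
    decisive = a_wins + b_wins      # all outcomes that are not ties
    if decisive == 0:
        # Both teams undefeated or both winless: every flip is a tie.
        raise ValueError("win probability is undefined for these inputs")
    return a_wins / decisive


# The 60% coin against the 40% coin from the example above.
print(round(win_probability(0.60, 0.40), 4))        # about 0.692

# Boston against Miami, assuming records of 66-16 and 15-67.
print(round(win_probability(66 / 82, 15 / 82), 4))  # about 0.9485
```

Note that the two probabilities for any matchup always sum to one, so the underdog's chance is simply one minus the favorite's.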

Interestingness as unpredictability

If we know each team’s winning percentage, we can come up with predictions of a winner for each game in a season. If, for each game, we call the team with a greater than 50% chance of winning (this is, essentially, the team with the better win%) the predicted winner, when comparing these predictions to actual 2007-08 regular season outcomes, we find that we predict correctly 69.7% of the time, which is, at least, better than half, and probably better than many of ESPN’s experts.¹
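As a rough sketch of that tally: pick the team with the better winning percentage in each game and count how often the pick matches the actual winner. The four games below are made-up placeholders, not real 2007-08 results, and skipping games between teams with identical records is an assumption, since the text does not say how such games are handled.

```python
def prediction_accuracy(games, win_pct):
    """Share of games in which the team with the better record won.

    games   -- iterable of (team_a, team_b, actual_winner) tuples
    win_pct -- dict mapping team name to season winning percentage
    """
    correct = 0
    counted = 0
    for team_a, team_b, winner in games:
        if win_pct[team_a] == win_pct[team_b]:
            continue  # no favorite to pick; skipped (an assumption)
        favorite = team_a if win_pct[team_a] > win_pct[team_b] else team_b
        counted += 1
        if favorite == winner:
            correct += 1
    return correct / counted if counted else float("nan")


# Hypothetical inputs for illustration only.
win_pct = {"DEN": 0.610, "CHI": 0.402, "BOS": 0.805, "MIA": 0.183}
games = [
    ("DEN", "CHI", "DEN"),
    ("BOS", "MIA", "BOS"),
    ("DEN", "MIA", "MIA"),  # an upset: the favorite lost
    ("BOS", "CHI", "BOS"),
]
print(prediction_accuracy(games, win_pct))  # 0.75
```

Filtering `games` down to those involving a single team would give that team's own predictability figure.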

One possible definition of interestingness is unpredictability–that is, if we know the outcome of the game beforehand, the game itself is likely to be less interesting (this is why people who miss the live broadcast, but plan to watch it later on TiVo, don’t want to be told the score). So, which teams were the most and least predictable?

UTA2008 0.610
SAC2008 0.622
ATL2008 0.634
DAL2008 0.634
IND2008 0.634
NJN2008 0.634
PHI2008 0.646
WAS2008 0.659
CLE2008 0.671
POR2008 0.671
CHA2008 0.683
CHI2008 0.683
HOU2008 0.683
DEN2008 0.695
GSW2008 0.695
MIL2008 0.695
NYK2008 0.695
ORL2008 0.695
MIN2008 0.707
NOH2008 0.707
LAL2008 0.720
DET2008 0.732
LAC2008 0.732
SAS2008 0.732
TOR2008 0.732
PHO2008 0.744
SEA2008 0.756
MEM2008 0.780
BOS2008 0.805
MIA2008 0.817

Note that Miami, having the worst record in the league, would have been predicted to win none of its games, while Boston was predicted to win all of its games, because it always had the better record in its matchups. Miami game outcomes were correctly predicted 81.7% of the time, which is (1-win%), and Boston outcomes were correctly predicted 80.5% of the time, which was their winning percentage. In general, it’s easier to predict teams with more extreme winning percentages, because they are more (or less) likely to face teams with worse records. Close-to-.500 teams are the hardest to predict correctly (in general). Thus, it is instructive to contrast predictability with record, as I do in the graph below:

The huge outlier is Toronto, oddly enough. Despite their exactly 0.500 record, Toronto’s game outcomes were correctly predictable almost 3/4 of the time–surprisingly high. The biggest outlier at the other end of the spectrum is Utah, who despite having one of the best records in the league (which should lead to ease of prediction), were the least easy to correctly pick.

Interestingness as upsets

Unpredictability just means that a team lost games it “should have” won, and won games it shouldn’t have. However, for a fan, losing games that should be won adds more to angst than interest. Which teams did the best relative to their opposition–making their fans happy by winning games they were predicted to win, and upsetting opponents who should have beaten them?

To determine this, for each game played by each team, I estimate the team’s probability of winning using the above methodology. Then, depending on the actual outcome, I assign a binary 0/1 value for that game if they lost/won. To compute an “upset factor,” I subtract predicted probability of winning from the binary lost/won variable.

Thus, if a team has a 72% chance of winning (it is substantially better than its opponent), and wins, the upset factor for that game is (1-0.72) = 0.28. Had they lost, the upset factor would be (0 – 0.72) = -0.72. A team with very little chance of defeating its opponent, say 6% (like Miami’s odds against Boston), would get 0.94 if they won, but just -0.06 if they lost. Thus teams are rewarded for winning (and punished for losing), but proportionately to their projected odds of winning.
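The per-game calculation is small enough to sketch directly. The names `upset_factor` and `season_upset_sum` are illustrative, and the three-game "season" at the end is invented purely to show the summation.

```python
def upset_factor(p_win: float, won: bool) -> float:
    """Actual result (1 for a win, 0 for a loss) minus the
    pre-game probability of winning."""
    return (1.0 if won else 0.0) - p_win


def season_upset_sum(results):
    """Sum of per-game upset factors.

    results -- iterable of (p_win, won) pairs, one per game
    """
    return sum(upset_factor(p_win, won) for p_win, won in results)


print(round(upset_factor(0.72, True), 2))   # 0.28
print(round(upset_factor(0.72, False), 2))  # -0.72
print(round(upset_factor(0.06, True), 2))   # 0.94
print(round(upset_factor(0.06, False), 2))  # -0.06

# A made-up three-game stretch: two expected wins and one upset loss.
print(round(season_upset_sum([(0.72, True), (0.72, True), (0.72, False)]), 2))  # -0.16
```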

Over the course of the season, the best teams will beat most of their opponents, and so should generally have positive cumulative upset factor sums. The worst teams will lose more often, and so should generally have negative cumulative upset factors. However, some teams will defy their probabilities, and outperform (or underperform) expectations, and thus a bad team which manages to be an occasional “Giant Killer” may have a season-sum upset index that defies its record. How does this look for 07-08?

IND2008 -1.481
NYK2008 -1.389
MIL2008 -1.351
ATL2008 -1.343
MIA2008 -1.207
CHA2008 -1.116
NJN2008 -1.030
CHI2008 -0.897
MEM2008 -0.786
MIN2008 -0.734
TOR2008 -0.610
SEA2008 -0.554
WAS2008 -0.495
PHI2008 -0.429
LAC2008 -0.379
ORL2008 -0.213
BOS2008 -0.005
DET2008  0.000
CLE2008  0.035
SAC2008  0.410
POR2008  0.902
DEN2008  0.903
GSW2008  0.964
UTA2008  1.293
DAL2008  1.375
PHO2008  1.464
HOU2008  1.551
NOH2008  1.615
LAL2008  1.654
SAS2008  1.851

As you can see, the “most upsetting” teams are some of the league’s best, which in this context means that they beat teams they were expected to, and did not lose much to teams they should have beaten. In this respect, the Celtics are at a disadvantage, since based on their record, they should not have lost to anyone, and so every loss counts heavily against them.

One possible interpretation (and I stress “possible”) of these numbers is that the Spurs actually played 1.851 wins better than their record of 56 wins would indicate, given their opposition. The Celtics’ and Pistons’ actual records fairly accurately capture their ability given their opposition, and the Pacers’ 36 and 46 record is actually about a game-and-a-half too good, given how they lost to teams they should have defeated, and failed to upset many better teams.

Interestingness as potential

One final means of defining interest, as we head into the 08-09 season, is potential. Every player, at any given time, has a certain level of productivity, and this level of productivity varies in generally predictable ways: usually it takes several years in the league to climb to peak productivity, which is maintained for several more years, before a decline sets in. Typically, players are their most productive in the middle of their careers–rarely do they peak in their rookie year, and even more rarely do they leave the league at the top of their game.

It is possible, then, to think of a player’s potential as their current productivity, given their age or experience in the league. Extremely valuable players, if they are very young, have more “potential” than extremely valuable players in their late-20s. This is part of the reason there is always so much excitement about rookies and rising stars–any amount of success they find early on is likely only to increase as they come into their prime.

At the other end of the spectrum, players in the middle of their careers, who have still not managed to become highly valuable, have very little potential. Of course, all of this varies. Tim Duncan, even at this relatively late age, is still likely to be valuable in the near future, even if he doesn’t necessarily improve. However, a General Manager might be more inclined to sign Chris Paul to a long-term contract than Jason Kidd, even if the two had been equally productive last year–Paul just has more potential, given the success he has found, and given his age.

Thus, we can estimate, for every player, some index of potential, essentially by dividing value (measured in MVP) by age. (Technically, I divide MVP/age at the per-game level, and multiply by the minutes-weighted mean age in the league (nearly 27), and then multiply this by 82, to estimate the trend of that player’s value.) When applied to the 07-08 season, we find the following estimates of potential:
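Read literally, the parenthetical describes a one-line calculation, sketched below. The league mean age is set to 27.0 because the text says only "nearly 27", the per-game MVP values are invented, and the function name is illustrative, so treat the numbers as placeholders rather than the column's actual estimates.

```python
LEAGUE_MEAN_AGE = 27.0   # "nearly 27" in the text; exact value is an assumption
GAMES_PER_SEASON = 82


def potential_index(mvp_per_game: float, age: float,
                    league_mean_age: float = LEAGUE_MEAN_AGE) -> float:
    """Per-game value divided by age, rescaled by the league's mean age
    and projected over an 82-game season."""
    return mvp_per_game / age * league_mean_age * GAMES_PER_SEASON


# Two hypothetical players with identical per-game value but different ages.
young = potential_index(mvp_per_game=0.10, age=22)
veteran = potential_index(mvp_per_game=0.10, age=34)
print(young > veteran)  # True: the same production counts for more at 22
```

A player exactly at the league mean age comes out with a potential equal to his per-game value times 82, which is why the mean-age rescaling keeps the index on the same scale as a season total.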


(I’ve also thrown in the top-500 best-potential seasons from my dataset, which only includes 1986-2008, and so misses out on some really excellent rookie seasons. Apparently LeBron has lots of potential.)

Now, incorporating all the offseason moves, and using a magical formula that lets me convert MVP to team wins (Pythagorean 5.25), here are my projections (based only on this estimate of potential) for team success (in wins) at some future time:

MEM 13.8
SAC 21.7
NJN 23.1
DEN 23.6
PHO 27.7
GSW 28.0
MIL 28.6
OKC 28.9
CHI 31.6
MIA 33.3
TOR 33.5
POR 35.1
SAS 35.9
NYK 38.0
LAC 38.2
ORL 38.4
DET 38.8
CHA 46.7
BOS 48.0
ATL 48.7
MIN 50.7
WAS 52.2
CLE 53.8
PHI 56.0
IND 56.6
UTA 57.1
NOH 58.5
DAL 58.5
HOU 61.8
LAL 63.1

Notice all the hedging I did–it’s unclear whether these estimates should apply to next season, or several seasons down the road. I doubt, for example, that Phoenix and San Antonio will fall so far in 08-09, but you could imagine that, playing with these same rosters four or five years from now, the then-senior citizens on those teams would not fare so well. Also note that this doesn’t include anyone with no NBA stats–meaning that I haven’t incorporated the doubtless boon brought by Oden, Rose, Beasley, et al. That said, I can see the Lakers, Hornets, Rockets, Jazz, and 76ers being very interesting in the near future, and so this may not be all crazy.

Conclusion

I’d be very interested to hear if you like these conceptualizations and measures of interestingness, and especially if you think the measure of Potential has any merit at all. How would you measure interesting, if you had to use statistics? Does your impression of teams on the rise and teams on the decline mesh with the team success projections listed above? Let me know in the comments.

¹ Keep in mind that this prediction methodology is extremely simplified. It doesn’t take home court advantage into account, nor any interaction effects between the two teams. Obviously, adding in both of these would make the model more accurate, but if I had the time and ability to predict outcomes perfectly, I wouldn’t be sharing that knowledge with you, I would be gambling. So, please accept this approximation for the abstraction that it is.



David Sparks is the Arbitrarian. His stats column runs weekly here at HP. This week he discusses depth and its impact.

The survey responses to last week’s post were so interesting, I decided to do an immediate follow-up (if you haven’t read it, you may want to do so before continuing here). Last week, we focused on team rotation size, as measured by minutes played. Today, we will look at a very similar, but somewhat more interesting concept: team depth.

Depth and rotation are not necessarily the same. Since there must be five players on the court per team at all times, the theoretical minimum for rotation size is five, which you would see if a team played only five players, all game, every game. However, depth concerns not playing time, but production, and it is easy to imagine one of those five players contributing more than 20% of the team’s total production, while one or more of the others produces less than their share. (There is a metric, called the Valuable Contributions Ratio, which I use to measure players’ productive contributions relative to their floor time.)

If each player produced in proportion to their allocation of minutes, it would make no difference which players were on the floor, but obviously this is not the case. Rather, better players produce a greater proportion of their team’s production than their proportion of a team’s minutes played. This implies, of course, that a team’s rotation size will likely not be the same as its productive depth, and further, that depth will likely be smaller than rotation.

In fact, depth can be calculated in exactly the same way as rotation (see last week’s column), except instead of using minutes as the variable of interest, we use Model-Estimated Value (MEV), a productivity metric.
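The exact formula lives in the previous week's column, which is not reproduced here. Later in this post, rotation and depth are described as the inverse of the concentration of minutes and of production, so one plausible reading is the reciprocal of a sum of squared shares (an inverse Herfindahl-style index). The sketch below assumes that reading; the function name and the sample numbers are hypothetical.

```python
def effective_size(amounts):
    """Reciprocal of the sum of squared shares.

    Pass minutes played to get a rotation-style figure, or a
    productivity measure such as MEV to get a depth-style figure.
    This formula is an assumption, not the column's confirmed method.
    """
    total = sum(amounts)
    if total <= 0:
        raise ValueError("amounts must sum to a positive number")
    shares = [amount / total for amount in amounts]
    return 1.0 / sum(share * share for share in shares)


# Five players who each play all 48 minutes: the minimum rotation of five.
minutes = [48, 48, 48, 48, 48]
print(round(effective_size(minutes), 2))     # 5.0

# The same five players with production concentrated in two of them.
production = [40, 30, 10, 10, 10]
print(round(effective_size(production), 2))  # 3.57
```

Under this reading, equal minutes with unequal production gives a depth below the rotation size, which matches the expectation stated above.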

So many theories

Last week, I invited readers to speculate about the relationship between rotation size and team success. You submitted countless interesting ideas in response to this question, and made many other interesting suggestions about ways to assess rotation consistency, variations in rotation size by coach, and differences between regular-season and playoff play, among others. I hope, in time, to investigate some of these great ideas.

For now, let us turn to the relationship between rotation size and success. In response to my question, the plurality of respondents said that wins and rotation size would positively correlate, many noting that deeper rotations would probably enhance a team’s chances in the playoffs.

Others suggested that the relationship would be negative, due to the fact that poorer teams needed to give more playing time to younger, weaker players, to aid in their development.

A large minority of answers indicated that there should be no consistent relationship. Several of these claimed that rotation size was too idiosyncratic: a function of the coach, playing style, and available personnel, and successful teams could make any sort of rotation a winner.

Several others predicted a parabolic relationship, in which the smallest rotations would find success on the back of a few stars, the largest rotations succeed through roster flexibility, and those in the middle, by failing to follow either strategy, will not do well.

I must admit that I was intrigued by all of these arguments, especially the parabolic prediction. My personal hypothesis was that increased rotation size would lead to greater success, due to the positive effects of diversification, as in the stock market. With more diverse contributions, I thought, would come greater insurance that even if one player failed to show up, one or more of his teammates would pick up the slack and ensure victory.

There were a number of other interesting hypotheses: one was that since defense requires a greater exertion of energy and offense requires time to find a rhythm, defense would correlate positively, and offense negatively, with rotation size. Others noted that faster-paced teams may require longer rotations, due to greater energy expended per minute. Several others suggested that the age of the team would vary positively with rotation size, as younger players can typically play a greater number of minutes without hurting productivity.

The empirical evidence

Who was most correct? Well, first I should mention that part of the problem with my question last week was that rotation size was often conflated with depth, which I define as separate concepts. That said, after reviewing the graphical relationships, I must sadly rule out the parabolic hypothesis. The rest of the relationships (between all suggested variables), are depicted in the correlation matrix below:

          rotation    depth  gameage     poss   offeff   defeff   effdif
rotation     1.000    0.412    0.016   -0.069   -0.057   -0.083    0.020
depth        0.412    1.000   -0.007    0.079    0.375   -0.041    0.321
gameage      0.016   -0.007    1.000   -0.085    0.069   -0.143    0.164
poss        -0.069    0.079   -0.085    1.000    0.016    0.016    0.000
offeff      -0.057    0.375    0.069    0.016    1.000    0.160    0.648
defeff      -0.083   -0.041   -0.143    0.016    0.160    1.000   -0.648
effdif       0.020    0.321    0.164    0.000    0.648   -0.648    1.000


Rotation and depth are measured as described previously. Game age is the playing-time-weighted age of the team. Possessions are a measure of pace. Offensive efficiency is a measure of a team’s scoring per possession, while defensive efficiency measures the same thing for their opponents (so better defensive teams have a lower defensive efficiency as constructed here). Efficiency difference is a measure of absolute quality, subtracting defensive from offensive efficiency.
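Each cell in a matrix like this is a pairwise correlation between two columns of per-game values. The sketch below assumes the ordinary Pearson coefficient, which the text does not name explicitly, and uses invented efficiency numbers to show how the efficiency difference is built and then correlated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-game efficiencies (points per 100 possessions).
offeff = [108.2, 104.5, 110.1, 102.3, 106.7]
defeff = [105.0, 106.1, 104.2, 107.8, 103.5]

# Efficiency difference: offensive minus defensive efficiency.
effdif = [off - de for off, de in zip(offeff, defeff)]

print(round(pearson(offeff, effdif), 3))
print(round(pearson(defeff, effdif), 3))
```

Because a lower defensive efficiency is better as constructed here, a negative correlation between `defeff` and `effdif` is the expected sign.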

Many of these results (the ones close to zero) indicate no relationship: Rotation size seems to be unrelated to anything but depth. However, depth appears to be positively correlated with offensive efficiency, and thereby, also positively correlated with efficiency differential–apparently teams with greater depth (at the per-game level) see improved efficiency differentials. One problem is that we cannot tell which direction causality moves in. Do deeper teams play better, or do teams who are winning by a lot give bench players increased minutes and thus increased time to produce?

To some extent, the likelihood of the second option can be tempered by the fact that rotation size has no real relationship with efficiency differential, but this question is still not definitively settled.

Expanding our scope

How have rotation sizes and depth changed over time? Which teams, historically, are the deepest? Due to data limitations, to investigate these questions, I must change the way I measure rotations and depth. Instead of assessing these at the per-game level, to make historical comparisons, I will measure at the season level, meaning that from this point on, rotation is best understood as the inverse of the concentration of minutes played over the course of the season, and depth is best understood as the inverse of the concentration of production over the course of the season. In general, these figures will be higher than each team’s mean per-game figures, due to changes in the roster and substitution patterns over the course of a season. However, error ought to be normally distributed, and so I will press forward using these slightly modified metrics, which are interesting enough in their own right.

As you can see in the plot above, both rotations and depth have increased over time. Rotation is denoted in red, and depth in cyan, and both are greater now than they were in the early years of the NBA. There could be any number of reasons for this–expansion, and the dilution of the talent pool, could be responsible; or merely a realization that heavy minutes loads may shorten players’ careers. Incidentally, I have scaled the size of each team-year marker to their winning percentage, but the relationship between depth, rotation, and winning is unclear in this depiction.

Below, I plot team winning percentage (jittered) against team depth. The color scale indicates rotation size, going from small (red) to large (blue), so that if you see a blue team amongst several red ones, you know that that team has a relatively large rotation given its depth. I’ve also scaled markers by year, so that more recent teams stand out more.



The first thing I notice is the outliers. The most concentrated teams appear to be several Chamberlain squads, in which he was an absolutely dominant producer, and carried his team more than any other player ever has on a consistent basis.

The least concentrated teams are several more recent, and fairly bad teams, topped by the 2002 Chicago Bulls, who were very deep with potential that had yet to develop into actuality.

As noted above, depth has increased over time, and so it is interesting to note the most concentrated teams in a more modern era (which I mark with the inception of the three-pointer, 1979-present). There are two very shallow Utah teams, led by Malone and Stockton, and supported by almost no one else. The pre-Pippen Bulls show up here, as do the Kobe-only Lakers–teams with one star who did a substantial amount of the producing. We also see the ’87 Celtics, ’04 Timberwolves, and ’08 Hornets, each of which had a couple of extremely good players dominating the contributions to winning, and then filled the rest of the roster out with players who couldn’t hope to match the same level of productivity.

Among the very best teams, there is a decent variety of concentration, although it is interesting to see the ’08 Celtics at the high end of depth among this elite. Their big three may have gotten the headlines, but the entire roster made important contributions. Further down and to the right, we see the ’08 Rockets, which put on the least likely 22-game winning streak in history, on the back of role players, a different one of which stepped up every night. This team was very successful, given its depth, and it will be interesting to see how this translates to future success.

What does it mean?

The overall trend is a slight but definite negative relationship between team depth and success, but it is unclear what conclusions can be drawn from this. Is this proof that a superstar (or a Big Two, or a Big Three) is key? Does it reflect the fact that it’s easier to field a team of equally poor players than a team of equally excellent players?

Since this graphic is based on season-level data, it may just mean that teams with less volatility in their rotation and minimal personnel turnover are more successful. However, I must admit to being unsure of what to make of these preliminary findings. Should teams dump their midlevel players (in salary and productivity terms), in pursuit of a bimodal roster of two stars and ten inexpensive warm bodies? Obviously, constructing a roster requires more than just collecting players at varying levels of talent–the interaction of their abilities is a key consideration–a team is more than the sum of its parts. I would love to hear your insight, explanations, and questions in the comments. Also, I would appreciate your taking the time to fill out the short survey below.



