Monday, August 6, 2007

On the Statistical Rating of Basketball Players (NBA)

As some of you may remember, I once posted on my old site a rating of the greatest baseball players of all time, by position. As some fewer of you, such as my grandfather, may remember, I never finished posting it. This is because I got bored: I was basically just updating the results from my Bill James book to account for people like Albert Pujols, who were not yet established when that book came out.

Well, this is something new. In a lifetime of trying to make up statistical ratings for baseball and then basketball players, I think this one represents a major breakthrough. The inspiration for these ratings came from a study I did on assists, and from some ideas in baseball player evaluation.

Baseball has a statistic called VORP. This stands for Value Over Replacement Player, which is pretty self-explanatory. The idea is that a player’s value isn’t measured from zero or from average, but from the edge of the major leagues. Basically, how much better is a guy than the next guy the team could bring up (practically) for free from Triple-A? For the categories I am discussing, I have estimated the replacement level empirically from the statistics of end-of-the-bench NBA players.
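
As a rough illustration of measuring from replacement level rather than from zero, here is a minimal Python sketch. The function names, the per-minute framing, and every number in it are assumptions made up for illustration; they are not the replacement levels actually estimated for these ratings.

```python
def replacement_rate(bench_totals, bench_minutes):
    """Pooled per-minute rate of end-of-the-bench players (hypothetical inputs)."""
    total_minutes = sum(bench_minutes)
    if total_minutes == 0:
        raise ValueError("need at least some bench minutes")
    return sum(bench_totals) / total_minutes


def value_over_replacement(player_total, player_minutes, repl_rate):
    """Production beyond what a replacement player would supply in the same minutes."""
    return player_total - repl_rate * player_minutes


# Made-up example: rebounds and minutes for three end-of-the-bench players.
repl = replacement_rate([40, 55, 25], [400, 500, 300])  # 120 rebounds in 1,200 minutes

# A made-up starter with 600 rebounds in 2,400 minutes: roughly 360 above replacement.
starter_value = value_over_replacement(600, 2400, repl)
print(round(starter_value, 1))
```

The point of the sketch is only that the same 600 rebounds are worth less when the baseline is a bench player's output than when the baseline is zero.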

Anyway, basketball statistics can be measured in this fashion. I considered the standard statistics: points, rebounds, assists, blocks, steals, turnovers, fouls, and shooting efficiency, and then I jumped through a few hoops to determine the proper weights for the relationship between them. Most of it is pretty straightforward, but here are two things that I have handled differently than is usually seen:

First, scoring and assists. Using assists and shooting percentages, I mathematically estimated the effect of a Good Pass on shooting efficiency. A Good Pass is defined to be a pass that will be credited as an assist if the ensuing shot is made. I also estimated the number of Good Passes made and received by each player.

This has two benefits. First, by determining the effect on points per shot, we can estimate how valuable a Good Pass is. Second, by determining the number of Good Passes received, we can set a different boundary for replacement scoring efficiency for every individual player. For example, not only can we credit Steve Nash for creating easy shots for his teammates, but we can also credit him for not having the ability to receive passes from himself. In general, this compensates guards for creating their own shots, something many statistical measurements do not do.
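
One way to picture a per-player replacement boundary is a simple blend: start from a replacement efficiency on shots that do not follow a Good Pass, then add the Good Pass effect in proportion to how often the player receives one. The sketch below assumes that linear form, and the 0.90 and 0.30 figures are invented; the post does not state the actual formula or values.

```python
def replacement_points_per_shot(unassisted_pps, good_pass_boost, share_after_good_pass):
    """Replacement-level points per shot for one player, given the share of
    his shots that follow a Good Pass. All inputs are hypothetical."""
    if not 0.0 <= share_after_good_pass <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return unassisted_pps + good_pass_boost * share_after_good_pass


BASE_PPS = 0.90  # assumed replacement points per shot with no Good Pass
BOOST = 0.30     # assumed extra points per shot after a Good Pass

# A guard who creates most of his own shots versus a big man who is fed the ball.
creator = replacement_points_per_shot(BASE_PPS, BOOST, 0.10)
finisher = replacement_points_per_shot(BASE_PPS, BOOST, 0.60)
print(round(creator, 2), round(finisher, 2))
```

Under these assumptions the shot creator is held to a lower efficiency bar than the finisher, which is the compensation for guards described above.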

Second, turnovers. Many statistical systems subtract absolute turnovers, which is also biased against guards. Given the amount of time he spends with the ball in his hands, a point guard with three turnovers in a game has almost certainly protected the ball better than a center with two. My solution is to consider turnovers as a percentage of possessions. Since the vast majority of possessions end in a shot or a turnover, this is easy to calculate. However, the point is to recognize those who handle the ball, not those who take a lot of shots. If a guard drives and kicks the ball to an open shooter, it is really the passer, not the shooter, who avoided a turnover on the play. So the number of player possessions, for these purposes, is: Turnovers + Shots + Good Passes Made – Good Passes Received.
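
The possession formula above is easy to try out. Here is a small Python sketch; the function names and both stat lines are made up for illustration.

```python
def player_possessions(turnovers, shots, good_passes_made, good_passes_received):
    """Turnovers + Shots + Good Passes Made - Good Passes Received."""
    return turnovers + shots + good_passes_made - good_passes_received


def turnover_rate(turnovers, shots, good_passes_made, good_passes_received):
    """Turnovers as a fraction of the player's possessions."""
    possessions = player_possessions(
        turnovers, shots, good_passes_made, good_passes_received
    )
    if possessions <= 0:
        return 0.0  # no measurable ball-handling load in this sample
    return turnovers / possessions


# Hypothetical single-game lines.
point_guard = turnover_rate(turnovers=3, shots=12, good_passes_made=15, good_passes_received=2)
center = turnover_rate(turnovers=2, shots=10, good_passes_made=2, good_passes_received=6)

print(round(point_guard, 3))  # 3 / 28, about 0.107
print(round(center, 3))       # 2 / 8 = 0.25
```

With these invented numbers, the point guard has more turnovers but loses the ball on a much smaller share of his possessions than the center.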

Now, after combining each player’s contributions in all areas, I had to decide what to do with the resulting raw scores. One method is to normalize for pace, and then rate players on some sort of per-minute basis directly from that point. This is a perfectly good way to go about things.

However, I chose to borrow another baseball idea. Bill James introduced the concept of Win Shares, which directly relate player success to team wins. If the New York Yankees win 90 games, their players are credited with exactly 270 Win Shares, three per win. I think explicitly tying player value to team success is a good method.

However, my number, which for the moment I am calling a Value Score (VS), although I am declaring open season for a catchier name, is not directly analogous to a Win Share. James goes through a lot of trouble to show that Win Shares are not biased toward players on good or bad teams; if a player gets traded from the Red Sox to the Royals or vice versa, his Win Shares won’t be inflated or deflated by the move. In basketball, where there is much more interaction between players, I did not find that to be true. So in my system, a 60 win team will have a higher total Value Score than a 30 win team, but it will not be twice as large. It will be closer to one and a half times as large.
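
The difference in scaling can be shown with a toy comparison. Win Shares are strictly proportional to wins, at three per win. The second function below is only one hypothetical linear form that reproduces a ratio of one and a half between a 60 win team and a 30 win team; the actual Value Score formula is not given here, and the offset and scale are invented.

```python
def team_win_shares(wins):
    """Bill James's convention: three Win Shares per team win."""
    return 3 * wins


def illustrative_team_vs(wins, scale=1.0, offset=30):
    """Hypothetical team total that grows with wins but not proportionally.
    The offset of 30 is chosen only so that 60 wins is 1.5 times 30 wins."""
    return scale * (wins + offset)


print(team_win_shares(90))                                   # 270
print(team_win_shares(60) / team_win_shares(30))             # 2.0
print(illustrative_team_vs(60) / illustrative_team_vs(30))   # 1.5
```

Only the ratios matter in this sketch; the absolute size of the hypothetical totals means nothing.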

I did keep the numbers in the general ballpark of Win Shares, so the following intuition that baseball stat geeks may already know will still hold. A 10 VS season indicates a player is a solid starter, a 20 VS season denotes a minor star, a 30 VS player will be an All-Star and tends to be in the MVP discussion, and a 40 VS player is probably the best player in the league.

This system passes three important tests. First, as mentioned above, a player’s rating does not change excessively when he moves to a team of greater or lesser quality. This is consistent with the idea that the number captures a player’s inherent value. Second, the system is fair to players at all positions. Because so many players play multiple spots, which makes positions ambiguous, I like to divide players into four classifications: guard (PG/SG), swingman (SG/SF), forward (SF/PF), and post (PF/C). (Note: The assigned position has no effect on the rankings; it is only for my own organizational purposes. You could use whatever designation you like without changing the numbers.) In the eight seasons for which I have ratings, 117 players have had at least one season with a VS of 20 or higher. Of these, 32 were guards, 31 swingmen, 29 forwards and 25 posts. Third, the system provides results that are generally consistent with observed excellence.

I would also like to contrast this method of player evaluation with two notable statistical measures: John Hollinger’s Player Efficiency Rating, and Wins Produced, created by the authors of the sports economics book The Wages of Wins.

The Wages of Wins’ method is based on regression analysis, with the statistics weighted to best fit team wins. This works perfectly for baseball offense, in which each player bats with the intent of maximizing production and minimizing outs and, importantly, the players take turns hitting. But in basketball, in which anyone can take any shot, the application of team logic to player logic is not quite as effective. Yes, the team wants to maximize its shooting percentage, but the team doesn’t maximize its shooting percentage by maximizing every player’s shooting percentage. Rather, it accomplishes this by taking its best option each possession. On a possession when the team cannot get an easy shot at all, the best scorers on the team will be disproportionately likely to take that difficult shot.

Consider this analogy. Suppose, in baseball, a team could reset its lineup before the ninth inning of a close game. A team would obviously choose to bat its best hitters against the opposing closer (ace reliever), which would then lower those hitters’ batting averages. So the team batting average would be maximized, but the best hitters’ individual averages would not be. We might still want to measure the best hitters by batting average, but since there is self-selection, we would also have to adjust for total at-bats.

The Wages of Wins does not make this adjustment. Therefore, while I agree with the authors’ general conclusion, which is that players who score a lot tend to be overrated, I think their system overcompensates away from scorers. In short, I agree that Allen Iverson was not really the best player in the league in his MVP season, but I do not believe there were 90 better players in the NBA that year.

My differences with John Hollinger’s PER system are more philosophical than technical. His is a per-minute system, thus measuring efficiency. My system examines aggregates. If two players of apparently similar effectiveness receive significantly different playing time, PER is the way to see that their production matches. My system rewards the greater number of minutes, which limits its versatility but does allow it to credit players who play more minutes than their stats would seem to justify, such as defensive stoppers. With PER, you just have to know.
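
The contrast between a per-minute view and an aggregate view fits in a few lines of Python. This is not the PER formula or the Value Score formula; "production" here is a stand-in number, and both players are made up.

```python
def per_minute(total_production, minutes):
    """Efficiency view: production per minute played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total_production / minutes


# Two hypothetical players who are equally productive per minute.
starter = {"production": 600, "minutes": 3000}
reserve = {"production": 300, "minutes": 1500}

starter_rate = per_minute(starter["production"], starter["minutes"])
reserve_rate = per_minute(reserve["production"], reserve["minutes"])

# The per-minute view rates them the same; the aggregate view favors the starter.
print(starter_rate == reserve_rate)
print(starter["production"] > reserve["production"])
```

Which view is more useful depends on whether the extra minutes reflect something the box score misses, such as defense, or just circumstance.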

Also, both of the systems I have mentioned use absolute turnovers, as discussed above, rather than crediting players who always have the ball in their hands.

In my next post, I am going to look at the NBA’s best recent players, according to the VS system.
