## The Distribution of Talent Between Teams

October 20, 2010. Posted by Sobchak in Comparing Sports, R, Simulation, Talent Distribution

Four years ago Tango had a very interesting post on how talent is distributed between teams in different sports leagues. I want to revisit and expand upon some of the points that came up in that discussion.

First, let's look at some empirical data. I scraped end-of-season records from the last ten years for the NFL, NBA and MLB from ShrpSports (I decided to omit the NHL from this analysis due to the prevalence of ties). The data is available here (click through) as a tab-delimited text file. I used R to analyze the data. If you don’t have R you can download it for free (if you use Windows I recommend using it in conjunction with Tinn-R, which is great for editing and interactively running R scripts). Here is the R code I used:


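    # read the scraped end-of-season records (tab-delimited)
    records = read.delim(file = "records.txt")

    # one row per league: number of teams and games per team per season
    lgs = data.frame(league=c("NFL","NBA","MLB"),teams=c(32,30,30),games=c(16,82,162))

    # observed variance of team winning percentages in each league
    lgs$var.obs[lgs$league == "NFL"] = var(records$win_pct[records$league == "NFL"])
    lgs$var.obs[lgs$league == "NBA"] = var(records$win_pct[records$league == "NBA"])
    lgs$var.obs[lgs$league == "MLB"] = var(records$win_pct[records$league == "MLB"])

    # variance from binomial randomness, p*(1-p)/n with p = .5
    lgs$var.rand.est = .5*(1-.5)/lgs$games

    # estimated variance of true talent
    lgs$var.true.est = lgs$var.obs - lgs$var.rand.est

    # season length at which var(true) would equal var(rand), in games and as a share of the season
    lgs$regress.halfway.games = lgs$games*lgs$var.rand.est/lgs$var.true.est
    lgs$regress.halfway.pct.season = lgs$regress.halfway.games/lgs$games

    # Noll-Scully ratio: observed sd over the sd expected from randomness alone
    lgs$noll.scully = sqrt(lgs$var.obs)/sqrt(lgs$var.rand.est)

    # how often the more talented of two random teams finishes with the better record
    lgs$better.team.better.record.pct = 0.5 + atan(sqrt(lgs$var.obs - lgs$var.rand.est)/sqrt(lgs$var.rand.est))/pi

    lgs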

Here is the resulting table:

| league | teams | games | var.obs | var.rand.est | var.true.est |
|--------|-------|-------|---------|--------------|--------------|
| NFL    | 32    | 16    | 0.0375  | 0.0156       | 0.0219       |
| NBA    | 30    | 82    | 0.0221  | 0.0030       | 0.0190       |
| MLB    | 30    | 162   | 0.0053  | 0.0015       | 0.0037       |

| league | regress.halfway.games | regress.halfway.pct.season | noll.scully | better.team.better.record.pct |
|--------|-----------------------|----------------------------|-------------|-------------------------------|
| NFL    | 11                    | 71%                        | 1.55        | 78%                           |
| NBA    | 13                    | 16%                        | 2.69        | 88%                           |
| MLB    | 67                    | 41%                        | 1.85        | 82%                           |

“Var.obs” is the variance of the observed team winning percentages in each league. The spread in team records will be greater than the spread in the true talents of the teams because of binomial randomness (this comes from the winner-take-all nature of individual games where no matter how closely matched the teams are, one team will get a 1 in the win column and the other will get a 0). This relationship is expressed in this formula:

$$\mathrm{var}(obs) = \mathrm{var}(true) + \mathrm{var}(rand) + 2\,\mathrm{cov}(true, rand)$$

If we assume team talents and binomial randomness are uncorrelated, this simplifies to:

$$\mathrm{var}(obs) = \mathrm{var}(true) + \mathrm{var}(rand)$$

We can estimate the variance from binomial randomness that would be expected from a season of a specified number of games using the formula p*(1-p)/n, with p = .5 and n equal to the number of games each team plays. We can then subtract this calculated estimate “var.rand.est” from the empirically derived “var.obs” to get “var.true.est.”

These “var.true.est” numbers are one measure that can be used to evaluate the competitive balance/parity/talent distribution in the various sports leagues. To put these in easier to understand units, we can convert them to standard deviations by taking the square root. This gives us .148 for football, .138 for basketball, and .061 for baseball. Assuming a normal distribution of talents, 95% of football teams (around 30 of the 32 in a given season) would be between .204 and .796 in true talent (again, there could be a wider range in observed team records due to the impact of binomial randomness). For basketball, 95% would be between .224 and .776, and for baseball, 95% would be between .378 and .622. These numbers suggest that for whatever reasons (game structure, team structure, league structure, etc.), talent is much more evenly distributed between teams in baseball than in football and basketball.
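The conversion from variances to standard deviations and 95% ranges can be reproduced in a few lines of R. This is only a sketch: the variable names are my own, and it assumes the rounded var.true.est values from the table and a normal distribution centered on .500, with two standard deviations standing in for the 95% range.

```r
# Estimated talent variances (var.true.est), rounded as in the table
var.true.est = c(NFL = 0.0219, NBA = 0.0190, MLB = 0.0037)

# Standard deviation of true talent, in winning-percentage units
sd.true = sqrt(var.true.est)

# Approximate 95% range: .500 plus or minus two standard deviations
talent.range = data.frame(
  sd    = round(sd.true, 3),
  lower = round(0.5 - 2 * sd.true, 3),
  upper = round(0.5 + 2 * sd.true, 3)
)
talent.range
```

Because the inputs are already rounded, the last digit can differ slightly from a calculation on the raw records.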

In Tango’s post he came up with some measures of his own. His basic idea was that the length of a season should be tied to the distribution of talent between teams. If teams are closely packed in talent, the season should be longer to minimize the impact of randomness and to allow the better teams to stand out from the pack. Conversely, if there is a wide spread of talent between teams, a shorter season should suffice to separate the contenders from the pretenders, and adding more games would be unnecessary. To quantify this, he came up with what I have called “regress.halfway.games,” which is the number of games a season would need to be for var(true) to equal var(rand) (the formula for calculating this is games*var(rand)/var(true), where var(rand) is the value for the league's actual season length). At that season length half of the variance in observed winning percentages would come from the distribution of talent and half from randomness. Another way to think of this figure is that it represents the point in a season where to estimate a team’s talent one would regress the team’s record halfway toward the mean (hence “regress.halfway.games”).
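To make the "regress halfway" idea concrete, here is a minimal sketch. The function name `regress.to.mean` and the 8-3 team are hypothetical, chosen only for illustration; the NFL inputs are the rounded values from the table.

```r
# Shrink an observed record toward .500 by adding 'regress.games' games of .500 ball
regress.to.mean = function(wins, games.played, regress.games) {
  (wins + 0.5 * regress.games) / (games.played + regress.games)
}

# regress.halfway.games = games * var(rand) / var(true), using the NFL values
nfl.halfway = 16 * 0.0156 / 0.0219   # about 11.4 games

# A hypothetical 8-3 team, roughly at the NFL halfway point:
# the estimate lands midway between its .727 record and .500
regress.to.mean(wins = 8, games.played = 11, regress.games = 11)
```

Adding a fixed number of .500 games works at any point in the season: early on the added games dominate the estimate, and late in a long season they barely move it.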

Tango wasn’t trying to say that this “regress.halfway.games” number was the ideal season length given a specific distribution of talent (maybe that should be when records are 75/25 talent/luck, or 90/10, or whatever). Instead he wanted to use this measure to make a relative comparison of different leagues. To facilitate this I calculated a measure that I’ve called “regress.halfway.pct.season.” This is just the “regress.halfway.games” for a league divided by the actual number of games in the league. The idea is that this figure should be similar in all leagues. Leagues with wider distributions of talent should have longer seasons, and leagues with narrower distributions should have shorter ones, but for each kind of league the percent of the way into the season one has to go for talent variance to equal random variance should be around equal. But as we see, this is not the case: at only 16% of the way into the NBA season, one has learned a good deal about the talents of the teams – one would have to wait until 71% of the way into the NFL season to learn as much.

The next measure in the table is the Noll-Scully ratio (named after Roger Noll and Gerald Scully). This is a measure used by sports economists to estimate competitive balance. It looks at the ratio of the observed variance in a league to the variance from binomial randomness (though it is typically expressed in terms of standard deviations rather than variances). In a perfectly balanced league (zero spread in talent between teams), the Noll-Scully ratio will equal 1. The ratio will be larger as the distribution of talent increases (increasing the numerator), or the length of the season increases (decreasing the denominator). So this is not purely a measure of talent distribution (for that one can simply look at var(true)). Instead, like Tango’s measure, it combines talent distribution and season length into one number (because of this I cannot fully agree with Guy’s critique of Noll-Scully found here and here – though I do agree that some economists (and Wages of Wins authors) using Noll-Scully have a stunning lack of understanding of the statistics behind it). In fact, one can derive “regress.halfway.pct.season” from Noll-Scully, and vice-versa:

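$$\text{Noll-Scully} = \frac{\mathrm{sd}(obs)}{\mathrm{sd}(rand)} = \sqrt{\frac{\mathrm{var}(obs)}{\mathrm{var}(rand)}}$$

$$\text{Noll-Scully}^2 = \frac{\mathrm{var}(true) + \mathrm{var}(rand)}{\mathrm{var}(rand)} = \frac{\mathrm{var}(true)}{\mathrm{var}(rand)} + 1$$

$$\text{regress.halfway.pct.season} = \frac{\mathrm{var}(rand)}{\mathrm{var}(true)} = \frac{1}{\text{Noll-Scully}^2 - 1}$$

$$\text{Noll-Scully} = \sqrt{1 + \frac{1}{\text{regress.halfway.pct.season}}}$$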
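As a numerical check on the link between the two measures, the sketch below converts each league's Noll-Scully ratio into “regress.halfway.pct.season” and back. It assumes var(obs) = var(true) + var(rand), so that the squared ratio equals var(true)/var(rand) + 1; the function names are my own.

```r
# Noll-Scully ratio -> share of the season at which records are regressed halfway
ns.to.pct.season = function(noll.scully) 1 / (noll.scully^2 - 1)

# ... and back again
pct.season.to.ns = function(pct.season) sqrt(1 + 1 / pct.season)

# Rounded Noll-Scully ratios from the table
noll.scully = c(NFL = 1.55, NBA = 2.69, MLB = 1.85)

round(ns.to.pct.season(noll.scully), 2)      # NFL 0.71, NBA 0.16, MLB 0.41
pct.season.to.ns(ns.to.pct.season(noll.scully))
```

The first call reproduces the 71%, 16% and 41% figures in the table, and the round trip returns the original ratios.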

So while these measures are on very different scales, they align in that if two leagues have identical Noll-Scully ratios, they will have the same value for “regress.halfway.pct.season,” and vice-versa. Another measure aligning with these is what I have called “better.team.better.record.pct,” which I got from dcj. Using var(obs) and var(rand), it answers the following question: if at the end of the season you were to select two teams at random from a league, what percent of the time would the more talented team have finished with the better record? It can also be cast in terms of the Noll-Scully ratio:

$$\text{better.team.better.record.pct} = 0.5 + \frac{\arctan\left(\sqrt{\text{Noll-Scully}^2 - 1}\right)}{\pi}$$
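A quick check of the Noll-Scully form, 0.5 + atan(sqrt(NS^2 - 1))/pi, against the table. The function name is my own, and the inputs are the rounded ratios, so treat this as a sketch rather than an exact reproduction.

```r
# Share of random team pairs in which the more talented team has the better record,
# written in terms of the Noll-Scully ratio
better.record.pct = function(noll.scully) {
  0.5 + atan(sqrt(noll.scully^2 - 1)) / pi
}

round(better.record.pct(c(NFL = 1.55, NBA = 2.69, MLB = 1.85)), 2)   # 0.78 0.88 0.82

# A perfectly balanced league (ratio of 1) gives a coin flip
better.record.pct(1)   # 0.5
```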

**A Generic Season Simulator in R**

So all these formulas are nice, but are they accurate? Does subtracting var(rand) from var(obs) really give a good estimate of var(true)? Does the more talented of two randomly selected teams really finish with the better record the percent of the time suggested by the formula? To put these to the test we can construct a generic season simulator, in which we create a model league where we can set the true talents of the teams, the number of teams, and the length of the season. Then we can simulate any number of seasons in which the teams in the league play against each other, using the log5 formula to determine the winner of each game (this is done probabilistically, by comparing the matchup probability to a randomly generated number, which makes this a type of Monte Carlo simulation).
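Before the full simulator, here is the core step in isolation: a log5 matchup probability and a Monte Carlo draw against it. The function names `log5.prob` and `simulate.matchup` are hypothetical helpers for this sketch and do not appear in the simulator itself.

```r
# log5 probability that team A beats team B, given each team's true talent
# (its expected winning percentage against a .500 opponent)
log5.prob = function(talent.a, talent.b) {
  odds.a = talent.a / (1 - talent.a)
  odds.b = talent.b / (1 - talent.b)
  odds.a / (odds.a + odds.b)
}

# Play n games between the two teams by comparing uniform random numbers
# to the log5 probability; returns team A's share of wins
simulate.matchup = function(talent.a, talent.b, n = 10000) {
  p = log5.prob(talent.a, talent.b)
  mean(runif(n) < p)
}

log5.prob(0.6, 0.5)          # 0.6: against a .500 team, talent is the win probability
log5.prob(0.6, 0.4)          # 9/13, about 0.692
set.seed(1)                  # arbitrary seed, only for reproducibility
simulate.matchup(0.6, 0.4)   # should land near 0.692
```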

I built such a simulator in R, using the code below. I have tried to include a lot of comments. This is the first simulation I’ve written in R, so there probably is some more efficient way to do it. Note that I have not attempted to simulate home vs. away games, but I have tried to create as balanced a schedule as possible.


    # create simulator function with arguments and their default values
    # seasons = number of seasons to simulate
    # teams = number of teams in league
    # games = number of games each team plays in season
    # var.true.pop = variance of the true talents of the population of teams
    # var.true.logit.pop = variance of the logits of the true talents of the population of teams
    # use.logit: if set to true, var.true.logit.pop will be used rather than var.true.pop,
    # and team talents will be sampled from the logit-normal distribution rather than the (bounded) normal distribution
    seasonsim = function(seasons=1000, teams=30, games=162, var.true.pop=.0038, var.true.logit.pop=.062, use.logit=FALSE) {
      true = rep(0.0,seasons*teams)
      obs = rep(0.0,seasons*teams)
      betterteamwon = 0
      betterteamlost = 0
      best.team.best.record = 0
      best.team.not.best.record = 0
      var.rand = 0
      # estimate binomial randomness assuming all teams .500, p*(1-p)/n
      var.rand.est = .5*(1-.5)/games
      # find the minimum number of times all teams play each of the other teams
      playeachteam = floor(games/(teams - 1))
      # simulate seasons
      for (x in 1:seasons) {
        cat("Simulating season", x, "of", seasons, "\r")
        if (x == seasons) { cat("\n") }
        flush.console()
        # team.table columns: 1 = true talent, 2 = odds, 3 = wins, 4 = losses, 5 = games, 6 = observed winning pct
        team.table = matrix(0.0,teams,6)
        if (use.logit == FALSE) {
          # generate team talents, average = .500, normally distributed with user-set variance
          team.table[,1] = rnorm(teams, mean = 0.5, sd = sqrt(var.true.pop))
          # take a new sample if any teams are below zero or greater than one, making the distribution bounded
          while ( (max(team.table[,1]) >= 1) | (min(team.table[,1]) < 0) ) {
            team.table[,1] = rnorm(teams, mean = 0.5, sd = sqrt(var.true.pop))
          }
        } else {
          # if use.logit is set to true, this samples from the normal distribution, average = 0
          # the sampled values are transformed into team talents by taking the inverse-logit (e^x/(e^x + 1))
          # the resulting talents will have a mean of .5 and will follow the logit-normal distribution
          team.table[,1] = rnorm(teams, mean = 0.0, sd = sqrt(var.true.logit.pop))
          team.table[,1] = exp(team.table[,1])/(exp(team.table[,1]) + 1)
        }
        # convert probabilities to odds, average = 1
        team.table[,2] = team.table[,1]/(1 - team.table[,1])
        # schedule, balanced portion
        # first we create as balanced a schedule as possible
        # all teams play each of the other teams 'playeachteam' times
        if (playeachteam > 0) {
          for (i in 1:(teams-1)) {
            for (j in (i+1):teams) {
              # the basic log5 formula using team odds gives the probability one team will win
              log5 = team.table[i,2]/(team.table[i,2] + team.table[j,2])
              for (k in 1:playeachteam) {
                # compare the log5 result to a random number from 0 to 1 to determine winner
                if (runif(1, 0.0, 1.0) < log5) {
                  # add a win to one team's total and a loss to the other team's total
                  team.table[i,3] = team.table[i,3] + 1
                  team.table[j,4] = team.table[j,4] + 1
                  # record whether the more talented team won
                  if (team.table[i,1] > team.table[j,1]) {
                    betterteamwon = betterteamwon + 1
                  } else {
                    betterteamlost = betterteamlost + 1
                  }
                } else {
                  team.table[i,4] = team.table[i,4] + 1
                  team.table[j,3] = team.table[j,3] + 1
                  if (team.table[j,1] > team.table[i,1]) {
                    betterteamwon = betterteamwon + 1
                  } else {
                    betterteamlost = betterteamlost + 1
                  }
                }
                # calculate var.rand based on actual log5 matchup probabilities
                # I'm not sure if this method is correct but it seemed better than just assuming all matchups were 50/50
                # formula is p*(1-p)/n, where n is the number of games each team plays in a season
                # since we are summing each matchup up incrementally to let p vary, add (p*(1-p)/n)/matchups each time
                # matchups are the total number of simulated games, which equals seasons*teams*games/2
                var.rand = var.rand + (log5*(1-log5)/games)/(seasons*teams*games/2)
              }
            }
          }
        }
        # schedule, unbalanced portion
        # there usually will be more games that need to be simulated
        # this randomly matches up two teams at a time
        # it adds a single game for all teams before starting again and adding a second for all teams, etc.
        if ((games - playeachteam*(teams - 1)) > 0) {
          # create a pool of teams with games remaining to be simulated
          remaining = 1:teams
          while (length(remaining) > 1) {
            # create a sub-pool of teams with an equal number of games remaining to be simulated
            # drawn from the teams that still have games left, so that no team can exceed 'games'
            # (with an even number of teams this is the same as 1:teams)
            equalgames = remaining
            while (length(equalgames) > 1) {
              # randomly match up two teams
              opponents = sample(equalgames, size = 2)
              log5 = team.table[opponents[1],2]/(team.table[opponents[1],2] + team.table[opponents[2],2])
              if (runif(1, 0.0, 1.0) < log5) {
                team.table[opponents[1],3] = team.table[opponents[1],3] + 1
                team.table[opponents[2],4] = team.table[opponents[2],4] + 1
                if (team.table[opponents[1],1] > team.table[opponents[2],1]) {
                  betterteamwon = betterteamwon + 1
                } else {
                  betterteamlost = betterteamlost + 1
                }
              } else {
                team.table[opponents[1],4] = team.table[opponents[1],4] + 1
                team.table[opponents[2],3] = team.table[opponents[2],3] + 1
                if (team.table[opponents[2],1] > team.table[opponents[1],1]) {
                  betterteamwon = betterteamwon + 1
                } else {
                  betterteamlost = betterteamlost + 1
                }
              }
              var.rand = var.rand + (log5*(1-log5)/games)/(seasons*teams*games/2)
              # eliminate these teams from the sub-pool of teams with the same number of games remaining
              equalgames = equalgames[!equalgames == opponents[1]]
              equalgames = equalgames[!equalgames == opponents[2]]
              # if a team has completed its schedule, eliminate it from the pool of teams with games remaining
              if ((team.table[opponents[1],3] + team.table[opponents[1],4]) == games) {
                remaining = remaining[!remaining == opponents[1]]
              }
              if ((team.table[opponents[2],3] + team.table[opponents[2],4]) == games) {
                remaining = remaining[!remaining == opponents[2]]
              }
            }
          }
        }
        # games = wins + losses
        team.table[,5] = team.table[,3] + team.table[,4]
        # observed performance = wins/(wins + losses)
        team.table[,6] = team.table[,3]/(team.table[,3] + team.table[,4])
        for (y in 1:teams) {
          # store each team's true talent and observed performance
          true[(x-1)*teams + y] = team.table[y,1]
          obs[(x-1)*teams + y] = team.table[y,6]
        }
        # check whether the most talented team finished with the best record
        if (team.table[which.max(team.table[,1]),6] == max(team.table[,6])) {
          best.team.best.record = best.team.best.record + 1
        } else {
          best.team.not.best.record = best.team.not.best.record + 1
        }
      }
      results = list(
        seasons = seasons,
        teams = teams,
        games = games,
        # var.true.pop is the user-set variance of talent among teams in the population
        # for any given sim this will not be the exact var of the teams' talents, since they are a randomly drawn sample
        var.true.pop = var.true.pop,
        var.true.logit.pop = var.true.logit.pop,
        # this enables inspection of all teams' talents and records from the final simulated season
        final.season.teams = team.table,
        # this contains the true talents of all the teams from all the seasons simulated
        true = true,
        # var.true.samp is the actual variance of talent among teams in the sample of seasons, this may differ from var.true.pop
        var.true.samp = var(true),
        # this contains the observed winning percentages of all the teams from all the seasons simulated
        obs = obs,
        var.obs = var(obs),
        # the incrementally calculated var(rand), using actual log5 matchup probabilities
        var.rand = var.rand,
        # the p*(1-p)/n estimate of var(rand), with p = .5, assuming all teams are .500
        var.rand.est = var.rand.est,
        # var.true.est is calculated from the formula var(obs) = var(true) + var(rand)
        var.true.est = var(obs) - var.rand.est,
        # the Noll-Scully measure of competitive balance
        noll.scully = sd(obs)/sqrt(var.rand.est),
        # the number of games into the season at which point team records should be regressed halfway to the mean
        regress.halfway.games = games*var.rand.est/(var(obs) - var.rand.est),
        # the percent of games into the season at which point team records should be regressed halfway to the mean
        # Tangotiger seemed to endorse this as a measure of competitive balance
        regress.halfway.pct.season = var.rand.est/(var(obs) - var.rand.est),
        # the percent of games in which the more talented team won
        better.team.win.pct = betterteamwon/(betterteamwon + betterteamlost),
        # picking two teams at random, this estimates how often the more talented team will end the season with the better record
        # the formula comes from dcj (http://sabermetricresearch.blogspot.com/2007/05/large-supply-of-tall-people.html#c481196252540664878)
        better.team.better.record.pct = 0.5 + atan(sqrt(var(obs) - var.rand.est)/sqrt(var.rand.est))/pi,
        # how often the most talented team finished with the best record
        best.team.best.record.pct = best.team.best.record/(best.team.best.record + best.team.not.best.record)
      )
      return(results)
    }

Loading this code into R creates a seasonsim() function. We can run the simulation with different values for the number of seasons simulated (“seasons”) (the default is 1000, which takes about a minute to run on my system), the number of teams in the league (“teams”), the number of games each team plays in a season (“games”), and the variance of the true talents of the teams (“var.true.pop”) (this is actually the variance of the talent in the population from which the teams will be randomly sampled). After the simulator is run there is a lot of information we can query to learn about the simulated seasons. Probably the first thing we’d want to look at would be “var.obs,” to see what the variance between the teams in observed performance was.

We can try out the sim using different values for “var.true.pop” until we find a value that results in a “var.obs” that matches the empirical var(obs) values we calculated from actual league records. Recall that the var(obs) for the NFL was .0375. I have been able to approximate that by running the sim with “var.true.pop” set at .0274.

    > nfl.sim = seasonsim(teams=32, games=16, var.true.pop=.0274)
    > nfl.sim$var.obs
    [1] 0.03751899

Recall that our estimated var(true) (found by subtracting var(rand) from var(obs)) was .0219, not .0274. Why the discrepancy? One problem could be the way var(rand) was estimated. We used the formula .5*(1-.5)/16. This assumes that all games were 50/50 matchups. The closer packed the teams in a league are, the better this assumption will be. But the greater the spread of talent, the smaller the actual variance due to binomial randomness will be (because p*(1-p) is largest when p = .5). To get a better estimate of var(rand), I have attempted to empirically calculate it matchup-by-matchup within the sim. This is the “var.rand” variable, which for the NFL sim comes out to .0128 (as opposed to the .5*(1-.5)/n approximation “var.rand.est,” which is .0156). If we take the var(obs) of .0375 and subtract this .0128 figure, we get a .0247 estimate of var(true). This is closer to the .0274 value that worked best in the simulator.
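The effect of lopsided matchups on var(rand) is easy to see numerically. The probabilities below are illustrative values I picked, not output from the sim, and the second line pretends that every game in a 16-game season has the same matchup probability.

```r
# Per-game binomial variance p*(1-p) for several matchup probabilities
p = c(0.5, 0.6, 0.7, 0.8)
p * (1 - p)        # 0.25 0.24 0.21 0.16

# Implied var(rand) over 16 games if every game had that probability
p * (1 - p) / 16
```

Only the 50/50 case reproduces the .0156 approximation. As matchups become more lopsided, the random component shrinks, which is the direction of the gap between “var.rand.est” and the sim's “var.rand.”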

Another reason why the var(obs) minus var(rand) estimate might be off is that it assumes that team talents are normally distributed around a mean of .500. However, the log5 model requires the team strengths to be between 0 and 1. To deal with this the simulator throws out any samples that contain team talents outside that range, which means talents are really being drawn from a bounded normal distribution. Another option is to use a different distribution in which all values lie between 0 and 1, such as the beta distribution. In their book Curve Ball, Jim Albert and Jay Bennett use the logit-normal distribution for their simulator. I have made this an option in the sim that can be enabled by setting “use.logit=TRUE.” When this is set the sim uses the variance setting from “var.true.logit.pop” to sample from a normal distribution with mean = 0, and then converts the sampled values to talents by taking the inverse-logit. The result is that the team talents come from the logit-normal distribution. If you want to try this feature out, I have found that the values of “var.true.logit.pop” that yield the closest matches to the empirical var(obs) values for each league are .539 for the NFL, .420 for the NBA, and .062 for MLB (for the basic normal distribution “var.true.pop” values, when not using “use.logit=TRUE,” use .0274, .0219, and .0038, respectively).
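The logit-normal sampling step can be tried on its own. This sketch mirrors what the sim does when “use.logit=TRUE,” using the MLB setting of .062; the helper name `inv.logit` and the seed are my own choices.

```r
set.seed(2010)   # arbitrary seed so the sketch is reproducible

# inverse-logit: maps any real number into the interval (0, 1)
inv.logit = function(x) exp(x) / (exp(x) + 1)

# sample 30 talents on the logit scale (mean 0), then transform to winning percentages
talents = inv.logit(rnorm(30, mean = 0, sd = sqrt(0.062)))

range(talents)   # every talent lies strictly between 0 and 1
var(talents)     # talent variance on the winning-percentage scale for this one draw
```

No resampling is needed here, because the transformation itself keeps every talent inside the range that log5 requires.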

**Exploring the Sim**

Let’s try simulating the baseball season. I’ve found that a “var.true.pop” of .0038 works well. The following displays some of the measures that can be returned from running the sim.

    > mlb.sim = seasonsim(teams=30, games=162, var.true.pop=.0038)
    > mlb.sim$var.obs
    [1] 0.005296506
    > mlb.sim$var.rand
    [1] 0.001498077
    > mlb.sim$better.team.win.pct
    [1] 0.5689354
    > mlb.sim$best.team.best.record.pct
    [1] 0.468

First, we can see that using a var(true) of .0038 yields a var(obs) of .0053, which equals the empirically calculated value. Our original var(obs) minus var(rand) estimate can again be improved by using the better var(rand) value of .00150, as .0053 minus .00150 equals .0038, exactly the var(true) value used in the sim. A new measure is next, “better.team.win.pct.” This says that in 57% of the simulated games, the team with more talent won. This was not a measure that we had an easy formula to calculate without simulation (though I’m guessing one could be derived). Another new one is “best.team.best.record.pct,” which calculates the percent of the simulated seasons in which the most talented team finished with the best record in the league. For this sim I got 47% (this value will vary a good deal between simulations).

What about dcj’s formula for “better.team.better.record.pct” – can we test that out by simulation? In the main sim I just included the formula version, but here’s a separate R function that can be used after you’ve run a simulation to check the validity of the formula. It picks out pairs of teams at random and checks whether the more talented one finished with the better record. You can specify the number of pairs you want compared (the “draws”), and point it to the observed winning percentages (“obs”) and true talents (“true”) produced by the main sim.


    better.team.better.record.sim = function(draws=1000, obs, true) {
      betterbetter = 0
      betterworse = 0
      for (i in 1:draws) {
        teams = sample(1:length(obs), size = 2)
        if (obs[teams[1]] > obs[teams[2]]) {
          if (true[teams[1]] > true[teams[2]]) {
            betterbetter = betterbetter + 1
          } else {
            betterworse = betterworse + 1
          }
        }
        if (obs[teams[1]] < obs[teams[2]]) {
          if (true[teams[1]] < true[teams[2]]) {
            betterbetter = betterbetter + 1
          } else {
            betterworse = betterworse + 1
          }
        }
      }
      return(betterbetter/(betterbetter + betterworse))
    }

We can use this to compare the formula value to the simulated value, and the close match suggests the formula works well:

    > mlb.sim$better.team.better.record.pct
    [1] 0.818068
    > better.team.better.record.sim(obs=mlb.sim$obs, true=mlb.sim$true)
    [1] 0.821281

**Some Conclusions**

There is a lot more that can be learned from using the simulator, even though it is a very simple model of a sports league. Let me know if you find anything interesting (or find any bugs).

Using the sim we were able to come up with even more measures of competitive balance. It may be useful to divide these measures into two groups. One group is made up of the measures that describe the distribution of talent between teams in a league, independent of how many games are played in a season. Examples from this group include var(true) and “better.team.win.pct.” These are the measures that yield the same value no matter what you set the “games” value to in the sim (assuming you hold all the other parameters constant). The other group is formed by the measures that account for the interaction between the distribution of talent and the length of the season. Examples from this group include “regress.halfway.pct.season,” the Noll-Scully measure, “better.team.better.record.pct,” and “best.team.best.record.pct.” There’s no one correct way to compare the competitive balance of different sports leagues, but by using the simulator to look at a variety of measures one can better understand how the leagues differ.

As for the var(obs) minus var(rand) method of estimating var(true), I would say that it holds up pretty well. Even if team talents aren’t normally distributed in all sports, assuming they are seems to work pretty well. That said, more research needs to be done, perhaps using more advanced simulation models. Hopefully others can build off of the starting point that I have provided in this post.

October 21st, 2010 at 3:28 pm

Some of your charts are blocked at work for me, so this may already be shown above.

But I have one question.

What is the equivalent number of NBA games to MLB’s 162 in terms of determining true talent?

Thanks.

October 21st, 2010 at 4:09 pm

Here’s one way to figure that. At 67 MLB games, you would regress halfway to the mean, which means you would add 67 games worth of .500 ball to the team’s actual record. This applies at any point in the season. So at 162 games, you would average 162 games worth of the team’s record with 67 games worth of .500, meaning the team’s record would be weighted 162/(162+67), or 71%. So now we just have to find at what number of NBA games you would weight actual record 71%. At 13 games record is weighted 50%, meaning you add 13 games worth of .500 basketball. Solving x/(x + 13) = .71, we get 32 games. So 32 games into the NBA season is equivalent to a full baseball season in terms of how much we know about a team’s talent.
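The same arithmetic as a small R function, for anyone who wants to try other pairs of leagues. The function name and arguments are hypothetical, and the inputs are the rounded “regress.halfway.games” values from the table.

```r
# Number of games in league B that tell us as much about talent
# as 'games.a' games do in league A
equivalent.games = function(games.a, halfway.a, halfway.b) {
  weight = games.a / (games.a + halfway.a)   # weight on actual record in league A
  halfway.b * weight / (1 - weight)          # solve x/(x + halfway.b) = weight for x
}

equivalent.games(games.a = 162, halfway.a = 67, halfway.b = 13)   # about 31.4
```

Without rounding the weight to 71% first, the answer comes out closer to 31 than 32 games, which is the same conclusion for practical purposes.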

October 21st, 2010 at 4:32 pm

Very nice work. I can see you’ve been tracking discussion of this issue for some time!

I’m not clear on which part of my critique of Noll-Scully you disagree with. Perhaps we just have different things in mind in terms of measuring “competitive balance.” I think most fans (though perhaps not most economists) think of it in terms of the distribution of team talent/strength, which is your first category. And even for some economic analysis, talent distribution is what’s relevant. Dave Berri, for example, uses it as evidence for his theory of “the short supply of tall people” which is about the distribution of player skill, not season length. For these purposes, I think we agree that Noll-Scully doesn’t work at all.

But I agree that there’s a place for metrics that deal with both talent and season length. Essentially, these metrics tell us how much luck is involved in season records in a league, given a certain talent distribution. Personally, I think of these as measures of season length, rather than “competitive balance.” And in that category, I’d suggest an alternative to “regress.halfway.pct.season,” which is percent.outcome.skill. Rather than figure out how many games gets you to 50% of variance coming from skill, tell us what that percentage is given the current season length. Maybe that figure is 65% for the NFL, and 90% for the NBA (just guesses).

Final thought: metrics that combine skill variance and season length have a problem, which is that they treat all uncertainty of result as being the same. But uncertainty due to parity of talent and uncertainty due to short seasons are, from a fan perspective, radically different. Both make us unsure of who will win, but morally they are opposites. We want some rough parity of talent, but sports justice also requires the better team to win. Creating uncertainty by making teams more equal is, usually, a good thing; creating uncertainty by allowing weaker teams to beat better teams is, usually, a bad thing. So I think any metric that treats them as equally good sources of “competitive balance” is not often going to be illuminating.

(But again, if used only to tell us how well a league schedule does at identifying real talent differences, that’s useful.)

October 21st, 2010 at 5:04 pm

One other thought: Noll-Scully as a metric is totally non-intuitive. What does it mean that the NFL is 1.55 and NBA is 2.69? Those numbers don’t tell me anything (unlike some of your suggested alternatives). Is the NBA 73% more (or less) competitive than the NFL? Or is it 1.14 more/less competitive? (and 1.14 whats?) I just don’t think the numbers mean anything, even if they can be converted into useful metrics.

October 21st, 2010 at 7:40 pm

Guy, thanks for the comments. I think we’re basically in agreement. Your original critiques seemed to me to be dismissive of Noll-Scully as if it were irreparably flawed because of its construction, but I wanted to emphasize that for some purposes it is exactly the kind of measurement we’re looking for. I agree that it’s not on an intuitive scale.

October 22nd, 2010 at 9:39 am

Another thing to remember is that when we talk about the distribution of talent between teams, we are always talking about talent within the context of the rules of the particular sport. So when we see that in a random game the better team wins 57% of the time in baseball but 66% of the time in basketball, this is partly due to the way talent is spread between teams, but it is also partly due to the fact that a single baseball game has far fewer scoring opportunities than a basketball game, and thus a lesser team has a better chance of winning due to randomness.

October 22nd, 2010 at 10:04 am

Right. An historically great NBA team will outscore its opponents by about 10%, but in baseball a 10% advantage won’t even get you into the playoffs (usually).

BTW, I tried explaining these issues to some economists over at the Sports Economist a while back, with no success at all. See the comments at this post: http://thesportseconomist.com/2007/10/more-on-james-and-balance.htm. Either I explained it poorly, or there is something about economists’ training that prevents them from understanding this stuff.