## Tuesday, May 27, 2008

### Methods and Statistics Explained: Part II

Part II: Computer Modeling

Few topics in sports are more controversial than the proper way to measure strength of schedule. Luckily, most leagues use wins and losses alone to decide who enters the playoffs and how they are seeded. In college sports, the sheer number of teams and the wide range of schedule difficulties keep win-loss percentage from being the sole criterion for identifying the best teams.

Major League Baseball proper won't need strength of schedule any time soon. But for those of us concerned with predicting future performance, especially playoff performance, knowing how strong a team has been against its particular schedule matters a great deal, because it separates teams that a raw record would lump together.

A key distinction is between retrodictive ratings and predictive ratings. Retrodictive ratings consider only wins and losses. They show which teams have won the most relative to the difficulty of their schedule (which is itself calculated from wins and losses alone).

Predictive ratings, on the other hand, consider only runs scored and runs allowed. The resulting run ratio is a good indicator of how well a team is likely to perform in the future, and it ignores whether the team actually won the ballgame. A team's schedule difficulty is calculated from its opponents' run ratios, not their wins and losses.

Both methods have a good deal of value for rating teams, and neither should be used exclusively. Statisticians like Jeff Sagarin synthesize retrodiction and prediction to create a meaningful halfway point between past and future performance, and Baseball Playoffs Now follows this example.

Rating Method

The formula behind each rating is very simple. For every game:

Home Team Rating + Home Field Advantage - Away Team Rating = average difference between the two teams over the entire season

The difficulty, of course, is that teams perform differently from one day to the next, and from one opponent to the next. No single rating for a team can explain exactly what happened in every ballgame. The Padres can beat the Dodgers by 5 today and lose to them by 3 tomorrow, while the Dodgers beat the Giants twice by 7, and the Giants beat the Padres by 5 and then lose to them by 2.

There is an optimal point somewhere between all of the scores that best describes the average performance of San Diego, San Francisco, and Los Angeles against each other. The computer's goal is to find that point, which is the set of ratings that minimizes the error between the formula and the actual results. Baseball Playoffs Now's algorithm repeatedly adjusts each team's rating and finds the point at which there is the least MLB-wide error. Each adjustment ripples out through Major League Baseball, since every team is connected to every other team through the formula for every game. At that point, we have the optimal ratings for today.
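To make the error-minimizing idea concrete, here is a minimal Python sketch that fits ratings to the six Padres, Dodgers, and Giants results from the example above. Several things are assumptions for illustration, not details of Baseball Playoffs Now's actual code: the error is measured as the sum of squared misses, the adjustment is plain gradient descent, the home/away assignments are made up (the example doesn't say who hosted), and `home_edge` is a placeholder set to zero. The names `GAMES`, `total_squared_error`, and `fit_ratings` are hypothetical.

```python
# Each game: (home team, away team, home team's margin in runs).
# Margins come from the example above; who played at home is ASSUMED.
GAMES = [
    ("SDG", "LAD", 5),   # Padres beat the Dodgers by 5
    ("SDG", "LAD", -3),  # Padres lose to the Dodgers by 3
    ("LAD", "SFO", 7),   # Dodgers beat the Giants by 7 ...
    ("LAD", "SFO", 7),   # ... twice
    ("SFO", "SDG", 5),   # Giants beat the Padres by 5
    ("SFO", "SDG", -2),  # Giants lose to the Padres by 2
]


def total_squared_error(ratings, games, home_edge=0.0):
    """Sum of squared misses of 'home + edge - away' against actual margins."""
    return sum(
        (ratings[home] + home_edge - ratings[away] - margin) ** 2
        for home, away, margin in games
    )


def fit_ratings(games, home_edge=0.0, step=0.01, sweeps=2000):
    """Nudge every rating downhill on the squared error, many times over.

    home_edge is a placeholder (0.0 here), not a value from the post.
    """
    teams = sorted({team for home, away, _ in games for team in (home, away)})
    ratings = {team: 0.0 for team in teams}
    for _ in range(sweeps):
        gradient = {team: 0.0 for team in teams}
        for home, away, margin in games:
            miss = ratings[home] + home_edge - ratings[away] - margin
            gradient[home] += 2 * miss
            gradient[away] -= 2 * miss
        for team in teams:
            ratings[team] -= step * gradient[team]
        # Only rating differences matter, so pin the average rating at zero.
        mean = sum(ratings.values()) / len(teams)
        for team in teams:
            ratings[team] -= mean
    return ratings


ratings = fit_ratings(GAMES)
for team, value in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{team}: {value:+.2f}")
print("total squared error:", round(total_squared_error(ratings, GAMES), 2))
```

With these six games the Dodgers come out on top, the Giants at the bottom, and the Padres in between. Rating all thirty teams would only mean adding more rows to `GAMES`.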

Running this system twice (once with run ratios, once with wins and losses only) creates two ratings for each team: predictive and retrodictive. Synthesizing the two produces an important part of Baseball Playoffs Now's overall rating for each team. Below you can see how today's predictive and retrodictive rankings differ for each team. I have starred the teams whose two rankings differ by 8 or more places.
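As a rough illustration of the two runs, the Python sketch below shows one way the same list of games could feed both. It is built on assumptions: game results are stored as run margins (the post works from run ratios, which this does not reproduce), the wins-only run is modeled by replacing each margin with +1 or -1, and the synthesis is a simple weighted average with a hypothetical `weight` parameter. The post does not say how the two ratings are actually combined, and the rating values passed to `blend` are placeholders, not real output.

```python
def as_win_loss(games):
    """Replace each home margin with +1 (home win) or -1 (home loss)."""
    return [
        (home, away, 1 if margin > 0 else -1)
        for home, away, margin in games
    ]


def blend(predictive, retrodictive, weight=0.5):
    """Weighted average of two rating tables; weight=0.5 is an assumption."""
    return {
        team: weight * predictive[team] + (1 - weight) * retrodictive[team]
        for team in predictive
    }


# Two games in (home, away, home margin) form; home teams are assumed.
games = [("SDG", "LAD", 5), ("SDG", "LAD", -3)]
outcomes = as_win_loss(games)  # same games with only the winner kept

# Placeholder ratings, purely to show the shape of the calculation.
combined = blend({"SDG": 0.4, "LAD": -0.4}, {"SDG": 0.0, "LAD": 0.0})
print(outcomes)
print(combined)
```

Feeding `games` to a rating fit gives a run-based rating, while feeding `outcomes` to the same fit gives a wins-only rating. Only the input changes between the two runs.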

Computer Models (predictive = run ratios, retrodictive = wins/losses)

| Team | Predictive rank | Retrodictive rank | Differ by 8+ |
| --- | --- | --- | --- |
| ARI | 8 | 13 | |
| ATL | 6 | 14 | ***** |
| BAL | 18 | 11 | |
| BOS | 7 | 2 | |
| CHC | 1 | 7 | |
| CHW | 3 | 4 | |
| CIN | 24 | 22 | |
| CLE | 11 | 23 | ***** |
| COL | 28 | 28 | |
| DET | 20 | 26 | |
| FLA | 17 | 9 | ***** |
| HOU | 14 | 8 | |
| KAN | 26 | 24 | |
| LAA | 10 | 3 | |
| LAD | 16 | 19 | |
| MIL | 23 | 18 | |
| MIN | 22 | 16 | |
| NYM | 13 | 21 | ***** |
| NYY | 15 | 17 | |
| OAK | 2 | 6 | |
| PHI | 4 | 12 | ***** |
| PIT | 19 | 20 | |
| SDG | 30 | 30 | |
| SEA | 29 | 29 | |
| SFO | 27 | 27 | |
| STL | 12 | 10 | |
| TAM | 5 | 1 | |
| TEX | 21 | 15 | |
| TOR | 9 | 5 | |
| WAS | 25 | 25 | |
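The starring rule is simple arithmetic, and a short Python sketch shows it on a handful of rows copied from the list above. The names `RANKS` and `starred` are hypothetical, and the threshold of 8 is the one stated earlier in the post.

```python
# (predictive rank, retrodictive rank) for a few teams from the list above.
RANKS = {
    "ATL": (6, 14),
    "CHC": (1, 7),
    "CLE": (11, 23),
    "FLA": (17, 9),
    "TAM": (5, 1),
}


def starred(ranks, threshold=8):
    """Teams whose two rankings are at least `threshold` places apart."""
    return sorted(
        team
        for team, (predictive, retrodictive) in ranks.items()
        if abs(predictive - retrodictive) >= threshold
    )


flagged = starred(RANKS)
print(flagged)
```

The gap is taken as an absolute value, so a team is starred whether its runs-based rank is better than its wins-based rank (Atlanta, Cleveland) or worse (Florida).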