Friday, May 13, 2011

ELO

Several of the rating systems we have looked at so far borrow ideas from the ELO Chess Rating system.  Its inventor, Arpad Elo, was a master-level chess player who devised the system to enable fairer rankings of chess players.  ELO has since been used for many other two-player games, and Jeff Sagarin uses "ELO-CHESS" as one of his rating systems for basketball and football.  It is also one component of the Bowl Championship Series (BCS).

The version of ELO that I have implemented for testing (based upon the explanation given here) is defined by this formula:
N(t) = R(t)+K*(S(t)-E(t))

N(t) = the new rating for team t
R(t) = the old rating for team t
K = a maximum value for increase or decrease of rating
S(t) = the outcome of the game for team t
E(t) = the expected outcome of the game for team t

Note that (because it does not base a team's rating recursively on the ratings of any other team) ELO does not use an iterative solution like ISR or Wilson.  And unlike RPI, changing the ELO rating for a team does not affect the rating of any other team.  So it is very simple and fast to calculate.

Since there are no draws in basketball (and since we're ignoring MOV), the outcome of a game for team t [S(t)] is 1 if team t won the game, and 0 otherwise.  The heart of the ELO system is determining the "expected outcome" for a game (the term E in the update equation above).  This is a number between 0 and 1 indicating how likely team t was to win this game based upon the team's current ELO rating and the opponent's current ELO rating.  Elo's original model assumes that performance is a normally distributed random variable, and that each player has the same standard deviation.  The formula commonly used in practice, and the one used here, replaces the normal curve with a logistic curve, so E(t) is defined as this:
E(t) = 1/[1+10^([R(o)-R(t)]/400)]
where team o is the opponent in the current game.  (The "400" in this equation is a scale factor kept for historical reasons: a 400-point rating advantage corresponds to 10-to-1 odds of winning.)
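To make the two formulas concrete, here is a minimal Python sketch of the expected-outcome and update calculations.  The function names, the default K of 32, and the sample ratings of 1600 and 1500 are my own choices for illustration; they are not taken from the implementation tested in this post.

```python
def expected_outcome(rating_t, rating_o):
    """E(t): chance that team t beats opponent o, per the formula above."""
    return 1.0 / (1.0 + 10.0 ** ((rating_o - rating_t) / 400.0))


def update_rating(rating_t, rating_o, won, k=32.0):
    """N(t) = R(t) + K * (S(t) - E(t)), with S(t) = 1 for a win, 0 for a loss."""
    s = 1.0 if won else 0.0
    return rating_t + k * (s - expected_outcome(rating_t, rating_o))


# Hypothetical ratings, chosen only for illustration.
favorite, underdog = 1600.0, 1500.0
print(expected_outcome(favorite, underdog))          # favorite's win probability
print(update_rating(favorite, underdog, won=True))   # small gain for an expected win
print(update_rating(favorite, underdog, won=False))  # larger loss for an upset
```

With a 100-point edge the favorite's expected outcome comes out to roughly 0.64, so a win earns less than half of K while a loss costs more than half of it.  Two evenly rated teams each have an expected outcome of exactly 0.5, so the winner gains K/2.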

The only tunable parameter in the ELO formula is "K," the maximum update to the rating allowed from one game.  In chess this is typically 16 or 32 (depending upon the skill of the player).  For our purposes, we can test a range of values and look for one that maximizes performance for college basketball.
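A sweep over K might look something like the sketch below.  This is not the test harness behind the results in this post: the starting rating of 1500, the rule of skipping games between equally rated teams, the function name fraction_correct, and the made-up game list are all assumptions for the sake of a small runnable example.

```python
from collections import defaultdict


def expected_outcome(rating_t, rating_o):
    """E(t) from the formula above."""
    return 1.0 / (1.0 + 10.0 ** ((rating_o - rating_t) / 400.0))


def fraction_correct(games, k, initial=1500.0):
    """Replay (winner, loser) pairs in order and return the share of games in
    which the team rated higher before the game actually won.  Games between
    equally rated teams are not counted (an assumption of this sketch)."""
    ratings = defaultdict(lambda: initial)  # assumed starting rating
    correct = predicted = 0
    for winner, loser in games:
        r_w, r_l = ratings[winner], ratings[loser]
        if r_w != r_l:
            predicted += 1
            correct += r_w > r_l
        e_w = expected_outcome(r_w, r_l)
        # Both teams are updated from their pre-game ratings.
        ratings[winner] = r_w + k * (1.0 - e_w)
        ratings[loser] = r_l + k * (0.0 - (1.0 - e_w))
    return correct / predicted if predicted else float("nan")


# Made-up results, purely illustrative; not the data behind this post.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B"), ("B", "C")]
for k in (16, 32, 64, 100, 200):
    print(k, fraction_correct(games, k))
```

Because each game only touches the two teams involved, a full season is a single pass over the schedule, which is why trying many values of K is cheap.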

Here are the results of testing ELO with our usual methodology:

Predictor      % Correct    MOV Error
Wilson         77.7%        10.33
ELO (K=16)     71.6%        11.77
ELO (K=32)     71.6%        11.67
ELO (K=64)     71.8%        11.59
ELO (K=100)    71.4%        11.60
ELO (K=200)    70.7%        11.76

Performance seems to peak around K=64 (71.8% correct, MOV error 11.59), but even at its best ELO falls well short of Wilson (77.7% correct, MOV error 10.33), the best-performing rating so far.  It is also significantly less accurate than the TrueSkill rating (which is also based on Bayesian reasoning).
