Obviously the big news this year (in the area of machine prediction of the NCAA Tournament, anyway) is the Kaggle competition for $15,000. Oh, and there's the Quicken Loans competition for $1 billion. But if that's not enough to keep you busy this March, I'm pleased to announce the continuation of the Machine March Madness competitions. There's no money at stake, but this is the longest-running machine prediction competition, and let's face it -- if you're going to enter the Kaggle competition you might as well enter Machine March Madness, too. It's not much more work :-).
(My thanks to Danny and Lee for letting me keep the competition running.)
The rules are very informal. Your predictions must be based on a computer algorithm, but you can carry out some parts manually as long as they're objective. For example, your method might include a step like "Take the team with the higher Sagarin rating" that you apply by hand, but please limit these steps and avoid using your subjective judgement. You can use any data you can find, including human-generated rankings like the AP poll.
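To make the "objective step" idea concrete, here is a minimal sketch of the "take the team with the higher rating" rule applied through a small bracket. The team names and rating values are made-up placeholders, not real Sagarin ratings, and the function names (`pick_winner`, `play_bracket`) are just for illustration.

```python
def pick_winner(team_a, team_b, ratings):
    """Return the team with the higher rating (ties go to team_a)."""
    return team_a if ratings[team_a] >= ratings[team_b] else team_b


def play_bracket(teams, ratings):
    """Advance teams round by round.

    `teams` is listed in bracket order (adjacent teams play each other),
    and its length is assumed to be a power of two.
    Returns a list with the winners of each round.
    """
    rounds = []
    current = list(teams)
    while len(current) > 1:
        current = [
            pick_winner(current[i], current[i + 1], ratings)
            for i in range(0, len(current), 2)
        ]
        rounds.append(current)
    return rounds


if __name__ == "__main__":
    # Placeholder ratings for four hypothetical teams.
    ratings = {"A": 90.1, "B": 85.3, "C": 88.0, "D": 91.2}
    for number, winners in enumerate(play_bracket(["A", "B", "C", "D"], ratings), 1):
        print(f"Round {number} winners: {winners}")
```

Because the rule is deterministic given the ratings, it counts as objective whether you run it in code or apply it by hand.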
The competition will be run as a Yahoo! Pool called "Machine March Madness" which you can find here. Scoring will be Fibonacci -- 2-3-5-8-13-21 -- which will make the competition a little bit less dependent on the final round(s) than the traditional scoring. To get the password to join the pool, email me (firstname.lastname@example.org) with the name of your entry and a short description of your approach. Also, please join the Google Group for announcements and discussion.
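To see how much Fibonacci scoring reduces the weight of the late rounds, here is a rough sketch comparing it with a doubling scheme. The 1-2-4-8-16-32 values are my assumption about what "traditional" scoring means, and the game counts assume a standard 64-team bracket with no play-in games.

```python
# Games played in each of the six rounds of a 64-team bracket.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]

FIBONACCI = [2, 3, 5, 8, 13, 21]   # points per correct pick, by round
DOUBLING = [1, 2, 4, 8, 16, 32]    # assumed "traditional" scoring


def round_totals(points):
    """Maximum points available in each round."""
    return [games * pts for games, pts in zip(GAMES_PER_ROUND, points)]


def late_round_share(points, last_n=2):
    """Fraction of all available points that sit in the last `last_n` rounds."""
    totals = round_totals(points)
    return sum(totals[-last_n:]) / sum(totals)


def score_bracket(correct_by_round, points):
    """Score an entry given the number of correct picks in each round."""
    return sum(c * p for c, p in zip(correct_by_round, points))


if __name__ == "__main__":
    for name, pts in [("Fibonacci", FIBONACCI), ("Doubling", DOUBLING)]:
        print(f"{name}: max {sum(round_totals(pts))} points, "
              f"{late_round_share(pts):.1%} in the last two rounds")
```

Under these assumptions a perfect bracket is worth 231 points with Fibonacci scoring, and the last two rounds account for about 20% of the available points, versus about 33% with doubling.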
Useful data can be found in a couple of places. First, at the Kaggle competition data page. Second, you can look in this Google Group thread from last year for some pointers to last year's data. Finally, I have fairly extensive data and will make it available as needed -- email me (or post in the Google Group) what you'd like to see.
Danny Tarlow's starter code from past years can be found here. A short tutorial I wrote on using RapidMiner to predict games can be found here. Finally, there have been several useful postings on ratings systems and predictions in the Kaggle forum.