## Monday, March 2, 2015

### Reasonable Kaggle Performance

The first stage of the Kaggle competition involves Kagglers testing out their models against data from the past few basketball seasons, and these scores appear on the first stage leaderboard.  Invariably new Kagglers make some fundamental mistakes and end up submitting entries with unreasonably good performance.  The administrators of the contest have taken to removing these entries to avoid discouraging other competitors.  The line for removing entries is somewhat fuzzy, and it begs the question[1] "What is a reasonable long-term performance for a Tournament predictor?" There are probably many ways to answer this question,[2] but here's one approach that I think is reasonable:  Calculate the performance of the best possible predictor over an infinite number of Tournaments.

I am reminded at this point of an old joke.
A man is sitting in a bar complaining to his friend -- who happens to be a physicist -- about his awful luck at the racing track, and wishing he had some better way to know what horse was going to win each race.

"Well, that strikes me as a rather simple physics problem," his friend says.  "I'm sure I could build a model to predict the outcome."

"Really?" says the man, visibly excited.  "That's fantastic.  We'll both get rich!"

So the physicist goes off to build his model.  After a week, the man has still heard nothing, so he calls his friend.  "How are you doing on the model?" he asks.

"Well," says the physicist.  "I admit that it is turning out to be a bit more complicated than I imagined.  But I'm very close."

"Great," says the man.  "I can't wait!"

But another week goes by and the man hears nothing, so he calls again.

"Don't bother me," snaps the physicist.  "I've been working on this day and night.  I'm very close to a breakthrough!"

So the man leaves his friend alone.  Weeks pass, when suddenly the man is awakened in the middle of the night by a furious pounding on his front door.  He opens the door and sees his friend the physicist.  He looks terrible -- gaunt and strained, his hair a mess -- and he is clutching a sheaf of crumpled papers.  "I have it!" he shouts as the door opens.  "With this model we can predict the winner of any horse race!"

The man's face lights up.  "I can't believe you did it," he says.  "Tell me how it works."

"First of all," says the physicist, "we assume that the horses are perfect spheres racing in a vacuum..."

Like the physicist, we face a couple of difficulties.  For one thing, we don't have the best possible predictor.  For another, we don't have an infinite set of Tournaments.  No matter, we shall push on.

We don't have the best possible predictor (or even know what its performance would be) but we do have some data from the best known predictors and we can use that as a substitute.  The Vegas opening line is generally acknowledged to be the best known predictor (although a few predictors do manage to consistently beat the closing line, albeit by small margins).  The Vegas opening line predicts around 74% of the games correctly "straight up" (which is what the Kaggle contest requires). I'm personally dubious that anyone can improve upon this figure significantly[3] but for the sake of this analysis let's assume that the best possible predictor can predict an average game[4] correctly 80% of the time.

We also don't have an infinite number of Tournaments to predict, but we can assume that the average score on an infinite number of Tournament games will tend towards the score on an average Tournament game.  For the log-loss scoring function, the best score in the long run comes from predicting our actual confidence (the 80% from above).  If we predict an infinite number of games at 80% and get 80% of those games correct, our score is:

-(0.80*log(0.80) + (1-0.80)*log(1-0.80))

which turns out (fortuitously) to be just about 0.50.  (If we use a performance of 74%, the score is about 0.57.)
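The arithmetic above can be checked with a few lines of Python.  This is only a sketch: the function name is made up for illustration, and it assumes the natural logarithm and the usual negated form of log-loss (lower is better).

```python
import math


def expected_log_loss(p):
    """Long-run log-loss when every game is predicted at confidence p
    and a fraction p of those predictions come true (natural log)."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


for p in (0.80, 0.74):
    print(f"confidence {p:.2f} -> log-loss {expected_log_loss(p):.3f}")
```
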

This analysis suggests that the theoretical best score we can expect predicting a large number of Tournament games is around 0.50 (and probably closer to 0.57).  This agrees well with last year's results -- the winner had a score of about 0.52 and the median score was about 0.58.

As far as "administrative removal" goes, there are 252 scored games in the Kaggle stage one test set.  That's not an infinite set of games, but it is enough to exert a strong regression towards the mean.  The Kaggle administrators are probably justified in removing any entry with a score below 0.45.
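To get a rough feel for how much a score can wander over 252 games, here is a small sketch.  It assumes (unrealistically) that every game is independent and predicted at the same confidence, and the function name is hypothetical.

```python
import math


def log_loss_mean_and_sd(p, n_games):
    """Mean and standard deviation of the average log-loss over n_games
    independent games, each predicted at confidence p and won with probability p."""
    hit_loss = -math.log(p)        # loss when the pick comes true
    miss_loss = -math.log(1 - p)   # loss when it does not
    mean = p * hit_loss + (1 - p) * miss_loss
    variance = p * (hit_loss - mean) ** 2 + (1 - p) * (miss_loss - mean) ** 2
    return mean, math.sqrt(variance / n_games)


for p in (0.80, 0.74):
    mean, sd = log_loss_mean_and_sd(p, 252)
    print(f"confidence {p:.2f}: mean {mean:.3f}, sd over 252 games {sd:.3f}")
```

Under these assumptions the spread is roughly 0.03 either way, so a score of 0.45 sits more than four standard deviations below what a 74% predictor would expect, and between one and two below the (generous) 80% predictor.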

On a practical level, if your predictor is performing significantly better than about 0.55 for the test data, it strongly suggests that you have a problem.  The most likely problems are that you are leaking information into your solution or that you are overfitting your model to the test data.

Or, you know, you could be a genius.  That's always a possibility.

[1] Yes, I know I'm misusing "beg the question".

[2] I suspect that a better approach treats the games within the Tournament as a normal distribution and sums over the distribution to find the average performance, but that's rather too much work for me to attempt.

[3] If for no other reason than Vegas has a huge financial incentive to improve this number if they could.

[4] The performance of the Vegas line is an average over many games.  Some games (like huge mismatches) the Vegas line predicts better than 74%; some (like very close matchups) it predicts closer to 50%.  I'm making the simplifying assumption here that the average over all the games corresponds to the performance on an average game.  Later on I make the implicit assumption that the distribution of Tournament games is the same as the distribution of games for which we have a Vegas line.  You can quarrel with either of these assumptions if you'd like.  A quick analysis of the Tournament games since 2006 shows that the Vegas line is only right 68% of the time, suggesting that Tournament games may be harder to predict than the average game.

## Friday, February 27, 2015

### Five Mistakes Kaggle Competitors Should Avoid

#1 -- Don't think you're going to win the competition.

One of the results that came out of the analysis of last year's contest is that the winner was essentially random:  at least the top 200 entries could have credibly won the contest.  Why?  Evidence from the best predictors suggests that there are about 8 points or so of unpredictability in college basketball games.  That's a lot of randomness.  Last year, 32 of the 64 games in the Tournament were decided by 8 points or less.  So even if you have the most accurate predictor in the contest, you're almost certain to be beaten by someone who made a worse prediction and got lucky when it came true.  It's the same reason why the ESPN pool is usually won by an octopus or someone who picked based on mascot fashions. On the other hand, maybe this year you'll be the guy who gets lucky.  It could happen.

#2 -- Don't use the data from before the 2008-2009 season.

Isn't it nice of the Kaggle administrators to provide data back to 1985?

If you're not familiar with college basketball, you might not realize that the college game underwent a significant change at the beginning of the 2008-2009 season, when the NCAA moved the men's three-point line back a foot, from 19' 9" to 20' 9".  The longer shot changed game strategies, and data from before that season is probably not easily applicable to today's game.

#3 -- The Tournament data is not enough for training or testing.

At 64 games a year, the Tournament just doesn't provide enough data for training or even testing a predictor with any reliability.  You may think you're being smart to build your model specifically for the Tournament -- imagine the advantage you'll have over all the other competitors that don't understand how different the Tournament is from the regular season.  Ha!

But actually you're just overfitting your model.   My own predictor needs about 15,000 training examples for best performance.  Your mileage may vary -- maybe you only need 14,000 training examples -- but there just isn't enough information in the Tournament games alone to do accurate prediction.  Particularly since you shouldn't use the games from before 2008 (see #2).  Of course, you can do all that and you might still win the contest (see #1).

#4 -- Beware of leakage!

Guess what?  It turns out that you can do a really good job of predicting the Tournament if you know the results ahead of time.  Who knew?

Now that's not a big problem in the real contest because (short of psychic powers) no one knows the results ahead of time.  But if the forums from last year and this year are any indication, it's a big problem for many Kagglers as they build and test their predictors.  Knowledge from the games they're testing creeps into the model and results in unrealistically good performance.

[Image: A First-Time Kaggle Competitor]

There are three major ways this happens.

The first and most obvious way this happens is that a model is trained and tested on the same data.  In some cases you can get away with doing this -- particularly if you have a lot of data and a model without many degrees of freedom.  But that isn't the case for most of the Kaggle models.  If you train your model on the Tournament data and then test it on the same data (or a subset of the data), it's probably going to perform unreasonably well.  You address this by setting aside the test data so that it's not part of the training data.  For example, you could train on the Tournament data from 2008 to 2013 and then test on the 2014 Tournament.  (Although see #3 above about using just the Tournament data.)  Cross-validation is another, more robust approach to avoiding this problem.
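As a minimal sketch of keeping test games out of the training data, the generator below holds out one whole season at a time.  The record layout (a dict with a `"season"` key) and the function name are assumptions for illustration, not anything the Kaggle data dictates.

```python
def leave_one_season_out(games):
    """Yield (held_out_season, train_games, test_games), holding out one
    whole season at a time so no test game is ever in the training set.

    `games` is a list of dicts that each carry a "season" key
    (a hypothetical layout chosen for this sketch)."""
    seasons = sorted({g["season"] for g in games})
    for held_out in seasons:
        train = [g for g in games if g["season"] != held_out]
        test = [g for g in games if g["season"] == held_out]
        yield held_out, train, test


# Usage: train on every season but one, test on the one left out.
example = [{"season": s, "game_id": i} for i, s in enumerate([2012, 2013, 2013, 2014])]
for season, train, test in leave_one_season_out(example):
    print(season, len(train), len(test))
```
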

The second way this often happens is that you unwittingly use input data that contains information about the test games.  A lot of Kagglers use data like Sagarin's ratings without understanding how these statistics are created.  (I'm looking at you, Team Harvard.)  Unless you are careful this can result in information about the test games leaking back into your model.  The most common error is using ratings or statistics from the end of the season to train a model for games earlier in the season.  For example, Sagarin's final ratings are based upon all the games played that season -- including the Tournament games -- so if you use those ratings, they already include information about the Tournament games.  But there are more subtle leaks as well, particularly if you're calculating your own statistics.
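One way to guard against this kind of leak is to look up ratings "as of" the game date rather than using end-of-season values.  The sketch below assumes you have a dated history of ratings for a team; that layout and the function name are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def rating_as_of(history, game_day):
    """Return the most recent rating published strictly before game_day.

    `history` is a list of (date, rating) pairs sorted by date -- an
    assumed layout.  Returns None if no rating predates the game."""
    days = [day for day, _ in history]
    i = bisect_left(days, game_day)  # index of first entry on or after game_day
    return history[i - 1][1] if i > 0 else None


history = [(date(2014, 3, 1), 90.0), (date(2014, 3, 15), 92.0), (date(2014, 4, 8), 95.0)]
# A game on March 20 sees the March 15 rating, never the post-Tournament one.
print(rating_as_of(history, date(2014, 3, 20)))
```
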

How can you tell when you're suffering from leakage?  Your performance can provide an indicator.  Last year's winner had a log-loss score of 0.52, and the median score was around 0.58.  If your predictor is getting performance significantly better than those numbers, then either (1) you're a genius, or (2) you have a problem.  It's up to you to decide which.

#5 -- A Miscellany of Important Notes
• College basketball has a significant home court advantage (HCA).  (And yes, there may be a home court advantage in Tournament games!) Your model needs to account for the HCA and how it differs for neutral court and Tournament games.  If your model doesn't distinguish home and away, you've got a problem.
• College teams change significantly from season to season.  You can't use a team's performance in one season to predict its performance in another season.  (This seems obvious, but last year's Harvard team seems to have made this mistake.  On the other hand, they got a journal publication out of it, so if you're an academic this might work for you too.)
• Entering your best predictions might not be the best way to win the contest.  Since the contest has a large random element (see #1 above) your best strategy might be to skew your predictions in some way to distinguish yourself from similar entries, i.e., you should think about meta-strategy.

## Wednesday, February 25, 2015

### JQAS Paper Reviews

Some of you who participated in last year's Kaggle contest may remember that the Journal of Quantitative Analysis in Sports (JQAS) solicited papers on the methods contestants used to predict basketball game outcomes in the NCAA tournament as part of the Kaggle contest.  The next issue of JQAS will contain five papers that resulted from this solicitation and the publisher has made the papers freely downloadable for a month after the issue is published as well as while they are posted in the "Ahead of Print" section on the JQAS site.  (I have also added them to my Papers archive.)  Below are short reviews of the five papers.

Michael J. Lopez and Gregory J. Matthews, "Building an NCAA men's basketball predictive model and quantifying its success."

Lopez and Matthews won the 2014 Kaggle Contest.  The paper describes their approach as well as an analysis of how "lucky" their win was.

Lopez and Matthews employed a two-pronged prediction approach based upon (1) point spreads and (2) efficiency ratings (from Pomeroy).  They built separate regression models for point spreads and the efficiency ratings and combined them in a weighted average for their two submissions:  One that weighted point spreads at 75% and efficiency ratings at 25%, and one vice versa.  Since point spreads were only available for the first 32 games, Lopez & Matthews estimated the point spreads for the remaining games using an undescribed "proprietary" model.

Lopez & Matthews also analyzed how "lucky" they were to win the contest.  Their analysis suggests that luck is by far the biggest factor in the competition.  For example, they found that about 80% of the entries could have won the contest under different reasonable outcomes, and the true probability of their entry being the best was less than 2%.

Commentary:  While I appreciate that Lopez & Matthews took the time to write up their experience, I find myself disappointed that this approach ended up winning; it brings nothing novel or interesting to the problem.  Their analysis in the second part of the paper is interesting -- it confirmed my belief that the Kaggle contest was essentially a random lottery amongst the top few hundred entries.

Ajay Andrew Gupta, "A new approach to bracket prediction in the NCAA Men’s Basketball Tournament based on a dual proportion likelihood"

In this paper, Gupta describes his approach to predicting the Tournament games and also does some analysis of bracket strategy under traditional (1, 2, 4, 8, 16, 32) scoring.

Gupta's prediction approach is complex.  It involves team ratings based upon maximum likelihood and what Gupta terms a "dual proportion" model.  I won't attempt to summarize the math here -- it requires several appendices in the paper itself to describe.  The interested reader should consult the paper.

In the second half of the paper, Gupta addresses how to compose a tournament selection to do well in a traditional bracket competition.  His conclusion is to pick a high-probability upset for one to three late round games.

Commentary:  This paper is poorly written and confusing from start to finish.  I'm frankly very surprised that it was chosen for publication.

One of the major problems is that uninteresting or unoriginal ideas are inflated with confusing descriptions.  For example, the paper presents the "dual proportion model" as a novel approach.  So what is the "dual proportion model"?  "Each of the two teams in a game has a probability of winning the game, and these must add up to 100%."  That's hardly worthy of mention, much less to be held up as a new insight.

Another major problem is the long list of unsupported assumptions throughout the model:  a scaling parameter beta "that applies to big wins, meaning at least 10 points" (Why scale big wins?  Why is 10 points a big win?), "However, [log-likelihood's] shape is better for bracket prediction." (Why is it better?)  "Some wins are more indicative of future wins than others are."  (Really?  What wins?  Why?)  "Point differences can also be deceptive..."  (What is your proof of this?)  "The strength-of-schedule adjustment works by reducing the strengths of the non-tournament teams in a weak conference."  (Why?)  There are many more examples.  None of these various assumptions are given any more than a vague explanation, and worse, none are tested in any way.  The result is a pastiche of unexplained, untested ideas that likely have little or no value.

One final nitpick is that this paper doesn't seem to have anything to do with the Kaggle competition, and all of its analysis is based upon the more standard pool scoring methods.

Andrew Hoegh, Marcos Carzolio, Ian Crandell, Xinran Hu, Lucas Roberts, Yuhyun Song and Scotland C. Leman, "Nearest-neighbor matchup effects: accounting for team matchups for predicting March Madness"

In this paper, Hoegh et al. augment a standard strength rating-based predictive system with relative adjustments based upon how each team in the matchup has performed in similar past matchups.  So, for example, if a team is playing a very tall, good rebounding team, the model will look at the team's past performances against very tall, good rebounding teams and see if they played particularly well (or particularly poorly) against these sorts of teams in the past, and then apply that adjustment to predicting the current game.

Commentary:  This paper is well-written and presents an interesting and novel idea.  The notion of adjusting a general prediction to account for a particular matchup is at least intuitively appealing, and their approach is straightforward and easily defensible.  There are a couple of interesting issues to think about in their scheme.

First of all, how should you find past games for calculating the matchup adjustment?  Since you're trying to improve a generic strength measurement, I'd argue that ideally you'd like to find past games using some factors that aren't already reflected in the strength metric.  (Otherwise you're likely to just reinforce the information already in the strength metric.)  In this case, the authors find similar past games using a nearest-neighbor distance metric based upon twenty-one of Pomeroy's team statistics.  Some of these statistics do seem orthogonal to the strength metric (e.g., Effective Height, Adjusted Tempo) but others seem as if they would be highly correlated with the strength metric (e.g., FG percentage).  I would be interested to see some feature selection work on these statistics to see what statistics perform best on finding past games.

Second of all, testing this scheme is problematic.  The authors note that the scheme can really only be applied to the Tournament (or at least late in the season) when teams have played enough games that there's a reasonable chance to find similar past matchups.  In this case the authors have tested the scheme using Tournament games but only (if my reading is correct) looking in detail at the 2014 results.  That shows some positive benefits of the scheme, but 65 games is just too small a sample size to draw any conclusions.

Overall, I'm a little dubious that individual matchup effects exist, and that you can detect them and exploit them.  For one thing, if this were true I'd expect to see some obvious evidence of that in conference play, where teams play home-and-home.  For example, you might expect that if Team A has a matchup advantage over Team B that it would outperform expectations in both the home and away against Team B.  I haven't seen any evidence for that sort of pattern.  I've also looked at individual team adjustments a number of times.  For example, you might think that teams have individual home court advantages -- i.e., that Team A has a really good home crowd and does relatively better at home than other teams.  But I've never been able to find individual team adjustments with predictive value.  Sometimes teams do appear to have an unusually good home court advantage -- I recall a season when Denver was greatly outperforming expectations at home for the first part of the season.  But it (almost?) always turns out to be random noise in the data -- Denver's great home performance in the first part of the season evaporated in the second half of the season.
So this paper would have benefited from some more rigorous attempts to verify the existence and value of matchup effects, but it nonetheless presents an interesting idea and approach.

Lo-Hua Yuan, Anthony Liu, Alec Yeh, Aaron Kaufman, Andrew Reece, Peter Bull, Alex Franks, Sherrie Wang, Dmitri Illushin and Luke Bornn, "A mixture-of-modelers approach to forecasting NCAA tournament outcomes."

This paper discusses a number of predictive models created at Harvard for the 2014 Kaggle competition.  The final models included three logistic regressions, a stochastic gradient descent model, and a neural network.  Inputs to the models were team-level statistics from Pomeroy, Sonny Moore, Massey, ESPN and RPI.  The models were also used to build ensemble predictors.

Commentary:  This paper presents a very ordinary, not very interesting approach.  (I suspect that the Kaggle competition was used as an exercise in an undergraduate statistics course and this paper is a write-up of that experience.)  The approach uses standard models (logistic regression, SGD, neural networks) on standard inputs.  The model performances are also unusually bad.  None of the models performed as well as the baseline "predict every game at 50%" model.  Even a very naive model should easily outperform the baseline 0.5 predictor.  That none of these models did suggests very strongly that there is a fundamental underlying problem in this work.

The paper also spends an inordinate amount of time on "data decontamination" -- by which the authors mean you can't use data which includes the Tournament to predict the Tournament.  I realize that many Kaggle participants trying to use canned, off-the-shelf statistics like Pomeroy fell into this trap, but it's frankly a very basic mistake that doesn't warrant a journal publication.  The paper also makes the mistake of trying to train and test using only Tournament data.  The authors acknowledge that there isn't enough data in Tournament games for this approach to work, but persist in using it anyway.

Francisco J. R. Ruiz and Fernando Perez-Cruz, "A generative model for predicting outcomes in college basketball."

This paper extends a model previously used for soccer to NCAA basketball.  Teams have attack and defense coefficients, and the expected score for a team in a game is the attack coefficient of the team multiplied by the defense coefficient of the opponent team.  This basic model is extended first by representing each team as a vector of attack and defense coefficients, and secondly representing conferences as vectors of attack and defense coefficients as well.  The resulting model finished 39th in the 2014 Kaggle competition.  The authors also assess the statistical significance of the results of the 2014 Kaggle competition and conclude that 198 out of the 248 participants are statistically indistinguishable.  This agrees with the similar analysis in the Lopez paper.
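To make the multiplicative idea concrete, here is a toy sketch.  It is not the authors' implementation: combining the vector coefficients with a dot product is my assumption, and the function name and numbers are invented for illustration.

```python
def expected_score(attack, opponent_defense):
    """Expected points for a team: its attack coefficients times the
    opponent's defense coefficients, summed over components.

    With one-element lists this is the basic single-coefficient model;
    longer lists sketch the vector extension (dot product assumed)."""
    if len(attack) != len(opponent_defense):
        raise ValueError("attack and defense vectors must have the same length")
    return sum(a * d for a, d in zip(attack, opponent_defense))


# Made-up coefficients, purely for illustration.
print(expected_score([1.1], [65.0]))               # single coefficient
print(expected_score([1.2, 0.5], [50.0, 20.0]))    # two-component vectors
```
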

Commentary: The approach used in this paper is similar to the one used by Danny Tarlow in the 2009 March Madness contest, although with a more mathematically sophisticated basis.  (Whether that results in better performance is unclear.)  The authors give an intuitively appealing rationale for using vectors of coefficients (instead of a single coefficient) to represent teams: "Each coefficient may represent a particular tactic or strategy, so that teams can be good at defending some tactics but worse at defending others (the same applies for attacking)."  It would have been helpful to have a direct comparison between a model with one coefficient and multiple coefficients to see if this in fact has any value.  Similarly, the idea of explicitly representing conferences has some appeal (although it's hard to imagine what reality that captures) but without some test of the value of that idea it remains simply an interesting notion.  Although the basic ideas of this paper are interesting, the lack of any validation is a major weakness.

## Friday, February 20, 2015

### Kaggle Competition: From Point Spreads to Win Percentage

The Kaggle Competition asks competitors to estimate the win probabilities for all the possible Tournament games.  But for reasons that I'm sure have nothing to do with gambling, many systems -- including mine -- predict Margin of Victory (MOV) rather than probability of winning.  So how does one convert a predicted MOV to a win probability?

I started by creating a histogram of predicted MOV versus win probability for 31K games in my training set.  (I used predictions from my own system, but you could also do this with the opening or closing Vegas lines.)  I binned at 1 point intervals to get the following graph:

This shows that no team predicted to lose by 18 or more points ever won a game, that teams predicted to win by 0 points won half the time, and that teams predicted to win by 25 or more points won every time.  That seems pretty reasonable, and it's reassuring that the graph crosses zero at about 50 percent.

I could use this data directly to translate from predicted MOV to win probabilities.  If I predict a team is going to win by 10 points I could use this chart to see that its win probability is 83.1% and use that in my Kaggle entry.
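The binning step might look something like the sketch below.  The input layout (pairs of predicted MOV and whether the team won) and the function name are assumptions; note that Python's `round` sends exact halves to the nearest even integer.

```python
from collections import defaultdict


def empirical_win_prob(games):
    """Bin games at 1-point intervals of predicted MOV and return
    {bin: fraction of games in that bin the team actually won}.

    `games` is an iterable of (predicted_mov, won) pairs -- an assumed layout."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for predicted_mov, won in games:
        bucket = round(predicted_mov)
        totals[bucket] += 1
        if won:
            wins[bucket] += 1
    return {bucket: wins[bucket] / totals[bucket] for bucket in sorted(totals)}


sample = [(10.2, True), (9.8, True), (10.4, False), (-3.0, False)]
print(empirical_win_prob(sample))
```
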

There are a couple of minor problems with this.  First, even with 31K games there's some obvious noise in the data, particularly at the tail ends of the ranges.  Second, I'm allowed two Kaggle entries, so I might want to create my second entry by tweaking this curve.  That will be hard to do working with the raw data like this.  For these reasons, I'd like a formula for mapping from predicted MOV to win probability.

A simple solution is to do a linear regression on the middle part of the graph.

The result is a pretty good fit.  This also reveals there's a little bias in my predictions.  If the predictions were perfectly unbiased the constant term in this equation would be 0.50.

However, there's a pretty obvious S shape to this curve that the linear equation is not capturing.  I could fit with a higher-order polynomial (a fourth order equation fits almost perfectly) but there's good reason to believe that what we're really seeing here is a cumulative normal distribution.  (That's the running total of the familiar bell-shaped curve -- if I were to plot the difference between the predicted MOV and the actual MOV, the bell curve is exactly what we'd see.)

So let's try fitting a cumulative normal distribution to the data.

That's a pretty good-looking match.  I did this just by eyeballing the data and picking 0 as the mean and 10 as the standard deviation, but if you want more precision you can do some more complex analysis to get a better fit.  (Not surprisingly, this corresponds very closely with my mean bias and RMSE for this season.)
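In code, the conversion from predicted MOV to win probability is just the normal CDF.  Here is a minimal sketch using the eyeballed mean of 0 and standard deviation of 10; the function name is made up.

```python
import math


def win_probability(predicted_mov, mean=0.0, sd=10.0):
    """Cumulative normal distribution evaluated at the predicted margin:
    the probability that the actual margin comes out above zero."""
    z = (predicted_mov - mean) / (sd * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))


for mov in (-10, 0, 10):
    print(f"predicted MOV {mov:+d} -> win probability {win_probability(mov):.3f}")
```

A predicted 10-point win comes out at about 84%, close to the 83.1% read off the raw histogram.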

Whether you use a simple linear equation or something more complex, you can now tweak your equation to create a new strategy.  For example, if I tweak my normal distribution to have a standard deviation of only 6, I get a new curve:

The effect of using this curve is to increase the confidence in my picks -- essentially to gamble that I'm going to be right more than I have been in the past.
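To see how much the narrower curve sharpens the picks, the sketch below compares the two standard deviations side by side using the standard library's `statistics.NormalDist` (Python 3.8+).  The margins chosen are arbitrary.

```python
from statistics import NormalDist

baseline = NormalDist(mu=0.0, sigma=10.0)   # the eyeballed fit
confident = NormalDist(mu=0.0, sigma=6.0)   # the tweaked, more aggressive curve

for mov in (-10, -3, 0, 3, 10):
    print(f"MOV {mov:+d}: sd=10 -> {baseline.cdf(mov):.3f}, sd=6 -> {confident.cdf(mov):.3f}")
```

Every pick moves further from 50%, which helps the log-loss score when the pick is right and hurts more when it is wrong.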

## Monday, February 16, 2015

### Guest Post: Todd Beck (ThePredictionTracker.com)

Today is the second in what I hope will be a series of guest postings.  I started submitting picks this year to The Prediction Tracker, and I became curious about Todd Beck, who runs the site, so I sent him a series of questions and I'm pleased to present his responses!  -- Scott

I grew up in a small town south of Dallas, Texas.  I attended Texas A&M University where I got an undergrad degree in Math and a graduate degree in Statistics.  For some reason I had an interest in trying to predict football games even way back when I was a kid.  I used to have a notebook where I guessed at every NFL game for years.  So that life-long interest and the fact that our statistics department had a college football pool got me started in coming up with an automated way to make predictions for any game.

For the 19 years since leaving college I have been a statistician at a hospital/medical school in Chicago, primarily in the research area of Alzheimer's disease and other aging related issues.

Most of the people who get involved in sports predictions are sports fans.  Do you have a rooting interest in any particular teams?  What's your favorite sport?

I was definitely a sports fan growing up.  Being close to Dallas I grew up a fan of the Cowboys (Boo! -- Scott), Rangers, and Mavericks.  Even though I have lived in Chicago for a long time now I have never been able to get into any of their teams and haven't been able to follow baseball or basketball too much, but I remain a diehard Cowboys fan.

My favorite sport to play was always baseball for some reason.  These days I really only follow pro and college football.

ThePredictionTracker.com provides a tremendous service to the prediction community, but it seems like it must be a ton of work.  What's your workflow like and how much time do you spend each week on it?  I know you run ads on your site -- does that recoup your costs?  How about your donations button -- does that bring in much support?

Because a majority of the things on thepredictiontracker.com are automated it doesn't require a lot of actual hands-on time. The most time is always in the first couple of weeks of a season as I look to see which systems are going to continue on and potentially add any new ones.  I have basketball completely automated so it really takes no time at all once it is running.  Football is a little different because it is a weekly event.  Collecting the data is automated but I still manually run the program each day.  So Monday usually needs a little more time to set up the new week of college games and Tuesday more time to set up for the NFL.

Initially I was never in favor of having ads.  I started on a free web server, but I was soon hitting the bandwidth limits.  So I switched to a different service and before long I started hitting limits there too.  So if I wanted to continue the site I had to get a permanent domain name and start paying for the server.  My traffic has continued to grow every year and so the ad revenue has grown with it.

Originally with ads plus donations I was making just about the same amount as it was costing me.  Only now, with a better ad service and a larger audience, am I bringing in more.  But it is definitely not enough to quit the day job.

You don't seem to have a predictor or ranking system of your own.  Do you use the information you gather on ThePredictionTracker.com yourself?  How successful has that been?

I do not have any basketball systems of my own but I do have several for football.  My completely original system is the PerformanZ Ratings.  But I also maintain Elo and Pythagorean ratings that I have modified for football.  Plus there are all of the regression based rankings: Least squares regression, least absolute value regression and logistic regression.  After so many years of doing this I still maintain that a simple least squares model is as good as anything out there.  There is a scoring efficiency algorithm that I learned a long time ago from a book in my public library written by a professional gambler.  And also a simple points per game comparison which doesn't do that badly.

But in some ways I guess you could say that the 'system' that I am most known for is the composite mean.  Calculating the mean or median of all, or most, of the ratings, especially when you have 70+ of them, is always going to end up performing better than most individual systems.
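A composite of this kind is easy to sketch.  The system names and predicted margins below are invented for illustration, and the function is not taken from the site's actual code.

```python
from statistics import mean, median


def composite(predictions):
    """Return (mean, median) of the predicted margins from many systems.

    `predictions` maps a system name to its predicted margin for one game
    (a hypothetical layout)."""
    values = list(predictions.values())
    return mean(values), median(values)


# Made-up predictions for a single game.
game = {"System A": 3.0, "System B": 7.0, "System C": 20.0}
print(composite(game))
```

The median is less sensitive than the mean to a single outlying system, which is one reason to track both.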

I don't do any real gambling; any simulated gambling I have experimented with has almost always been using the system average. On my blog page I generally post probabilities (for football) that are based off the system average.  Tracking the top picks of the week over the season almost always gives results in the 55-60% range for at least one of pro or college football.  The NFL picks have been 62% each of the last two years.  A terrible season pops up every few years for some reason I haven't figured out yet.  This past year, for example, the NCAA picks were only 35%.  I experimented with the same idea in college basketball last season and had huge success in the first half of the season before giving it all back once conference play started.

## Friday, February 13, 2015

Monte McNair has kindly offered to host this year's Machine Madness contest on the Ultimate Bracket Challenge website.  If you're thinking about competing, head over there to create an account and familiarize yourself with the site!

Using Ultimate Bracket Challenge will also give us the opportunity to modify the scoring.  The "traditional" scoring approach of 1-2-4-8-16-32 is heavily weighted to the late rounds.  In some ways that can be good -- previous MM contests have gone down to the last game -- but in some ways it's bad -- you can be out of the contest fairly early.  The Kaggle Competition is at the other end of the spectrum.  All games are treated equally, and the competitors predict all possible games, so no one is ever "out of it".  That's interesting, too, but since Kaggle has that covered I don't want to swing too far in that direction.
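To make the late-round weighting concrete, here is a small sketch of the traditional scoring.  It assumes a 64-team bracket (32, 16, 8, 4, 2 and 1 games per round, ignoring any play-in games); the function names and the two sample brackets are made up for illustration.

```python
# Points awarded per correct pick in each round under 1-2-4-8-16-32 scoring.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]
# Games per round, assuming a 64-team bracket with play-in games ignored.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def max_points_by_round(points=ROUND_POINTS, games=GAMES_PER_ROUND):
    """Total points available in each round."""
    return [p * g for p, g in zip(points, games)]


def bracket_score(correct_by_round, points=ROUND_POINTS):
    """Score a bracket given the number of correct picks in each round."""
    return sum(p * c for p, c in zip(points, correct_by_round))


if __name__ == "__main__":
    # Every round is worth the same total, so one title game counts
    # as much as the entire first round.
    print(max_points_by_round())
    # Hypothetical bracket A: strong early, misses the final games.
    print(bracket_score([24, 10, 4, 1, 0, 0]))
    # Hypothetical bracket B: weaker early, but picks the champion.
    print(bracket_score([20, 8, 4, 2, 1, 1]))
```

In this made-up comparison the second bracket gets fewer first-round games right but finishes well ahead, because the last three rounds carry half of the available points.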

Ultimate Bracket Challenge also has the option of a market-type approach, where the points for each game are divided among the people who correctly picked the winner of that game.  That might be an option for us as well.

I'm also thinking about prizes, so if you know of a company who'd like to offer a prize as part of the contest in exchange for the massive marketing value of being mentioned on this blog :-), please let me know that as well!

## Wednesday, February 4, 2015

### Strength of Schedule & Adjusted Statistics (Part 5)

The method for calculating adjusted statistics I outlined in previous posts turns out to be moderately useful.  It's better for prediction than raw statistics, but not as useful as some other approaches.

One problem is that my approach doesn't explicitly provide a measure for defense against a statistic.  We get an adjusted statistic for (say) 3 pt % that tells us how good a team is relative to other teams for that statistic, but we don't get a measure of how good that team is at defending the 3 pt %.  We can remedy this with a slightly different approach.

In this new approach, we assume that the performance for a team in a particular game is a function of both the team's offensive strength and its opponent's defensive strength.

$$S_{ij} = \frac{O_i}{D_j}$$

$S_{ij}$ represents the value of the statistic in a game between the offensive team ($i$) and the defensive team ($j$).  $O_i$ then represents the offensive team's adjusted strength at this statistic, and $D_j$ represents the defensive team's adjusted strength at defending this statistic.

As an example of this approach, assume we have the following schedule of games and performances:

| Home | Home 3PT% | Away | Away 3PT% |
|---|---|---|---|
| Gold | 0.43 | Silver | 0.30 |
| Gold | 0.35 | Blue | 0.23 |
| Silver | 0.28 | Blue | 0.26 |

This yields six equations for the Offensive and Defensive strength ratings:

0.43 = O_g / D_s            0.30 = O_s / D_g
0.35 = O_g / D_b            0.23 = O_b / D_g
0.28 = O_s / D_b             0.26 = O_b / D_s

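and solving these equations yields, approximately,

O_g = 0.50     D_g = 1.33
O_s = 0.40     D_s = 1.18
O_b = 0.30     D_b = 1.43

Put another way, a team shooting against Gold makes about 0.75 (that is, 1/1.33) of its adjusted percentage, against Silver about 0.85, and against Blue about 0.70, so Blue is the strongest defense and Silver the weakest.  The equations only fix the ratios O/D, so multiplying every O and D by the same constant gives an equally valid solution.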

In general we'll have many more games than teams, and there won't be an exact solution for this system of equations.  Solving the system can be done by an iterative approach similar to the one described for the previous system; assign some values to O and D and alternately recompute O and D until the values converge.
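Here is a minimal sketch of one way to do that alternating computation on the three-game example above.  It is not necessarily the update rule from the earlier posts: I'm assuming we work with logarithms, where the model becomes log S = log O - log D, and recompute each rating as a plain average over that team's games.  The names `GAMES` and `fit_offense_defense` are just illustrative.

```python
import math
from collections import defaultdict

# (offense, defense, observed 3PT%) for each team-game in the example schedule.
GAMES = [
    ("Gold", "Silver", 0.43), ("Silver", "Gold", 0.30),
    ("Gold", "Blue", 0.35), ("Blue", "Gold", 0.23),
    ("Silver", "Blue", 0.28), ("Blue", "Silver", 0.26),
]


def fit_offense_defense(games, tol=1e-10, max_iter=1000):
    """Alternately recompute log O and log D so that S_ij ~= O_i / D_j."""
    by_off, by_def = defaultdict(list), defaultdict(list)
    for off, dfn, s in games:
        by_off[off].append((dfn, math.log(s)))
        by_def[dfn].append((off, math.log(s)))
    log_o = {t: 0.0 for t in by_off}
    log_d = {t: 0.0 for t in by_def}
    for _ in range(max_iter):
        # Given D: log O_i is the average of (log S_ij + log D_j) over i's games.
        new_o = {i: sum(ls + log_d[j] for j, ls in g) / len(g)
                 for i, g in by_off.items()}
        # Given O: log D_j is the average of (log O_i - log S_ij) over j's games.
        new_d = {j: sum(new_o[i] - ls for i, ls in g) / len(g)
                 for j, g in by_def.items()}
        # O and D are only determined up to a common factor, so pin the
        # scale by forcing the geometric mean of D to be 1.
        shift = sum(new_d.values()) / len(new_d)
        new_o = {t: v - shift for t, v in new_o.items()}
        new_d = {t: v - shift for t, v in new_d.items()}
        change = max(
            max(abs(new_o[t] - log_o[t]) for t in new_o),
            max(abs(new_d[t] - log_d[t]) for t in new_d),
        )
        log_o, log_d = new_o, new_d
        if change < tol:
            break
    offense = {t: math.exp(v) for t, v in log_o.items()}
    defense = {t: math.exp(v) for t, v in log_d.items()}
    return offense, defense


if __name__ == "__main__":
    offense, defense = fit_offense_defense(GAMES)
    for off, dfn, s in GAMES:
        fitted = offense[off] / defense[dfn]
        print(f"{off} vs {dfn}: observed {s:.2f}, fitted {fitted:.3f}")
```

Averaging in log space makes each step an exact least-squares update for one set of ratings with the other held fixed, which is why I chose it for the sketch.  The absolute values of O and D depend on the normalization chosen, so only the ratios O/D should be compared against the observed statistics.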

This approach is similar to a ranking method described in [Govan 2009].  (You can get this paper from the Paper Archive.)  Govan describes the conditions under which the ratings will converge (hint: they will for NCAA basketball after a few hundred games) and a method of calculation.  Govan's approach isn't exactly the same as the one described here; I leave it to the reader to work out the differences and how to address them.

In the next posting I'll talk about yet another alternative model for adjusted statistics and how that can be calculated.