
Gilliam Ratings are Back!

GilliamRatings

VaPreps All State
Jun 5, 2001
Orange, Virginia
It took much longer than I expected.

Believe it or not, they have been gone for 7 years.

Tomorrow I will post my first "Ratings Beat" column in over 7 years.

Hopefully, I will find enough interesting things to write about and add to the page to keep people interested in VHSL football until August 2015.

For a quick look, here are 2014's top 10 teams in Division 4:

106.58 Lake Taylor
97.18 Salem
91.95 Liberty-Bealeton
86.67 Jefferson Forest
86.12 Monacan
85.45 George Washington
81.84 Sherando
81.68 Woodgrove
81.61 Dinwiddie
80.60 Courtland

The website is up.
 
Could you explain your ratings? Would you consider a higher-rated team the favorite in a head-to-head matchup with a lower-rated team?
 
I'm for real every time. It just has taken me a lot of work to get all the errors out of my code (and uncover lots of mistakes in data on the internet).

I'll be writing an article that explains the ratings soon but there's actually one on the site right now that's pretty detailed. Here's the key paragraph:

The initial system was pretty primitive, but it had the seed of the idea that the Gilliam Ratings have always been based on. If you play better than expected, then there are up to three explanations: 1) you are actually a better team than I had you rated, 2) the other team is actually worse than I had them rated, or 3) the game was a fluke. So the simple idea was that if team A plays team B, I would compare the ratings to get an expected margin of victory. If team A did better than expected, its rating would be adjusted upward, and the team that played worse would have its rating adjusted downward. The formula I used was this:

Rating adjustment = (Actual Difference in score - Expected difference in score)/4

So it worked this way:

Podunk High's pre-game rating: 50, City Slicker Senior's pre-game rating: 60.

Podunk High was supposed to lose by 10, so their "expected difference in score" was -10.
Podunk wins the game 21-10, so the "Actual Difference in score" was 11.

Rating adjustment = (11 - (-10))/4, which comes out to 21/4 = 5.25

You add that amount to Podunk's rating which is now 55.25 and subtract it from City Slicker's rating which is now 54.75.

I played around with that "4" in the formula a lot. That number meant that if you were the underdog in the game, you would have to beat the favorite by more than they were favored to win by in order to pass them. Almost every tweak I've made to the formula has dealt with that number in some form or another.
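A minimal Python sketch of the single-game update described above, assuming the divisor of 4 and leaving out home field, the low-score bonus, and the margin cap. The function name `update` is just for illustration, not something from the ratings program.

```python
def update(rating_a: float, rating_b: float, score_a: int, score_b: int,
           divisor: float = 4.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one game between A and B."""
    expected = rating_a - rating_b      # expected margin for A (the "point spread")
    actual = score_a - score_b          # actual margin for A
    adjustment = (actual - expected) / divisor
    # Whatever A gains, B loses, so the sum of the two ratings is unchanged.
    return rating_a + adjustment, rating_b - adjustment


# Podunk (50) beats City Slicker (60) 21-10.
print(update(50, 60, 21, 10))   # (55.25, 54.75)
```

With a divisor of 4 the gap between the two teams closes by half of the "surprise," so the new gap is the average of the expected and actual margins. That is why an underdog has to win by more than the spread to pass the favorite.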


So yes, the short answer is that if two teams play, the difference in their ratings is the "point spread." The higher rated team should win by the difference in the ratings. However, there are a couple of caveats on that:

caveat 1) If a team typically wins low-scoring games, their "point spread" when favored is probably a little smaller than what the ratings give. I give a bonus to teams that win low-scoring games because in a 7-0 game points are at much more of a premium, and in effect a 7-0 game is statistically a better win than a 49-42 win. The bonus I give is not huge, but it makes a difference.

Here are a few scores and what the computer would actually consider the "effective margin of victory":
7-0 10.01
14-7 9.03
21-14 8.05
28-21 7.07
29-22 7.00 (or above).

So a casual observer may see a prediction of team A winning by 8 points and, if they win 14-7, wonder why the team's rating went up the next week. The fact is that the computer treated that like a 9.03-point win and therefore thought they played better than expected.
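The post doesn't give the bonus formula. For these five 7-point wins, though, the table is reproduced exactly by a bonus of 0.07 points for every point the combined score falls below 50. The Python sketch below uses that reverse-engineered fit purely as a hypothetical. It is not the published formula, and how the real bonus behaves for other margins isn't shown in the table.

```python
def effective_margin(winner_pts: int, loser_pts: int) -> float:
    """Hypothetical low-scoring bonus fitted to the five rows above.

    ASSUMPTION: bonus = 0.07 per point the combined score is below 50,
    never negative. Only 7-point games were given, so this is a guess.
    """
    margin = winner_pts - loser_pts
    bonus = 0.07 * max(0, 50 - (winner_pts + loser_pts))
    return round(margin + bonus, 2)


for score in [(7, 0), (14, 7), (21, 14), (28, 21), (29, 22)]:
    print(score, effective_margin(*score))
```

Running the loop prints 10.01, 9.03, 8.05, 7.07 and 7.0, matching the table.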

Caveat number two: If a team is favored by 80 points, I really don't expect them to win by 80 points. I don't think many coaches are going to try to run up 80 points on their opponents (I'm not saying it doesn't happen, and I'm not saying they're necessarily "running it up" if it does, but in a game like this I expect coaches to get their subs in, plus there's going to be a running clock). Right now you can win by considerably less than that predicted margin of victory and your rating will not drop. 40 points has always been the magic number in the past, and that's what I use right now, but I may switch that to 35 since that's where the VHSL running clock rule kicks in.
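Here is one way to read the 40-point rule, sketched in Python. It assumes that both the expected and the actual margin are clipped to the range -40 to +40 before the update. The post doesn't spell out the mechanics, and `CAP`, `clip` and `capped_adjustment` are illustrative names.

```python
CAP = 40.0  # the "magic number"; the post mentions possibly switching to 35


def clip(margin: float, cap: float = CAP) -> float:
    """Limit a margin to the range [-cap, cap]."""
    return max(-cap, min(cap, margin))


def capped_adjustment(expected: float, actual: float,
                      divisor: float = 4.0, cap: float = CAP) -> float:
    """Rating change for one team under the ASSUMED clipping rule."""
    return (clip(actual, cap) - clip(expected, cap)) / divisor


# Favored by 80, wins by 42: both margins clip to 40, so no change.
print(capped_adjustment(80, 42))   # 0.0
# Favored by 80, wins by 30: treated as falling 10 short of 40.
print(capped_adjustment(80, 30))   # -2.5
```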

I am planning to make a lot of improvements in the upcoming months, but I am trying to take my time and put as little false information out there as possible. The big thing that held me up the past year was that I kept thinking I had everything right, and then I'd analyze the results and find out there was a bug that was kicking in for certain cases, or that I was missing data, or had scores entered twice that were counting double, etc. I'm not promising that the data is perfect now, but I can't find any glaring errors.

One other thing: forfeits are not used. If a team wins 80-0 and later forfeits, the 80-0 score will still be used to calculate the ratings and the game will still be counted as a win. If I can find a comprehensive list of forfeits somewhere, I will make notes of the actual records in the ratings, but that is a problem for future Matt to deal with. Right now I'm happy to have up what I have.

I will always try to be as honest as I can with people when I discuss the ratings. I don't hide my formulas and I always try to give results of how the computer did picking games (not just on getting winners right, but on how close the prediction was). If someone makes a suggestion that I think might be doable and will make the ratings more accurate, I will test it out and see if it works.

For me, the ultimate test of how good the ratings are is this: "How far off is the average prediction from the real score?" Whatever system produces the lowest number as an answer to that question is the one I want to use.
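The yardstick in that question is the mean absolute error of the predicted margins. Here is a small Python sketch. It assumes each prediction and result is stored as a margin for the same team, and the sample numbers are made up for illustration.

```python
def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average distance between predicted and actual margins of victory."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative only: three predicted margins vs. what happened.
print(mean_absolute_error([8.0, -3.0, 21.0], [7.0, 4.0, 10.0]))
```

Comparing two systems then comes down to computing this number for each of them over the same games and keeping the one with the smaller value.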
 
This has to be a ton of work. Many, many thanks for your efforts.

Huh. A 4A North team at #2, and seven Northern teams in the top ten? That can't possibly be correct?!! ;-)
This post was edited on 1/2 5:17 AM by SpartanOfYore
 
I had already noticed your observation and thought about it. I don't think there's any real statistical reason that I've proven yet for there to be more 4A South teams in the top ten, but it "feels" like we should have more. Here's what's coming that may settle this:

I've got an improvement on the system coming out in about a month, and it might shuffle things around, but Lake Taylor fans may not like it. I haven't run it yet, but it's possible that the outlier game wasn't Lake Taylor vs. Monacan (which is what the classic ratings seem to indicate). If we iterate this formula enough times, it may turn out that the outlier game was Lake Taylor vs. Bird. If that's the case, then the whole 4A South gets raised up, but Bird could pass Lake Taylor in the ratings. I suspect the truth is somewhere in the middle.

Bird's not really 5 TDs better than Monacan, Lake Taylor's probably a little more than one score better than Monacan, and Bird and Lake Taylor are probably very close.

Like I said, I haven't run my "iterative" ratings yet, but when I do, I think you'll still see Lake Taylor ahead of Bird, just a lot closer. You'll also see a few of the 4A South teams move up somewhat. I really don't think anyone in 4A made much of a case for themselves after Lake Taylor, regardless of which region their school was in.

No one played the Titans tough except for Monacan and Bird, both from the Dominion District which was not a highly regarded district going into the season.

What I intend to do to improve the ratings is to actually "iterate" the formula. That is, run the ratings, take the final results, and re-enter those as the new starting ratings. You do that hundreds or thousands of times until the ratings settle somewhere (for me, "settled" means no team's rating changes by more than .005).
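Here is a minimal Python sketch of that iteration. It assumes games are replayed in order with the basic /4 update (no bonus, cap or home field) and that "settled" means no rating moves more than 0.005 between passes. `run_season` and `iterate_ratings` are hypothetical names.

```python
Game = tuple[str, str, int, int]  # (team_a, team_b, score_a, score_b)


def run_season(games: list[Game], start: dict[str, float],
               divisor: float = 4.0) -> dict[str, float]:
    """Play every game in order, applying the rating adjustment after each one."""
    ratings = dict(start)
    for a, b, score_a, score_b in games:
        adjustment = ((score_a - score_b) - (ratings[a] - ratings[b])) / divisor
        ratings[a] += adjustment
        ratings[b] -= adjustment
    return ratings


def iterate_ratings(games: list[Game], initial: float = 50.0,
                    tolerance: float = 0.005,
                    max_passes: int = 1_000_000) -> dict[str, float]:
    """Feed each pass's final ratings back in as starting ratings until they settle."""
    if not games:
        return {}
    teams = {t for a, b, _, _ in games for t in (a, b)}
    ratings = {t: initial for t in teams}
    for _ in range(max_passes):
        new = run_season(games, ratings)
        change = max(abs(new[t] - ratings[t]) for t in teams)
        ratings = new
        if change <= tolerance:
            break
    return ratings


print(iterate_ratings([("A", "B", 21, 11)]))
```

Each update adds to one team exactly what it subtracts from the other, so the average stays at 50 when everyone starts there. It also means separate groups of teams that never play each other never exchange rating points.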

This will make the ratings more accurate (in theory) after the season is over, but it takes a long time for the program to run thousands of iterations, so you'll have to wait a while on this (I also have five different ways I want to try). It's not a useful method during the season until the schedule is "everywhere connected." That is, we need a path of games that connects every team in the ratings to every other team in the ratings. We don't get that until the first round of the playoffs, because the Beach District doesn't play any out-of-district games.
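"Everywhere connected" is the graph-theory notion of a connected graph: teams are nodes and games are edges. Here is a short Python sketch using breadth-first search. The team names in the example are made up.

```python
from collections import defaultdict, deque


def everywhere_connected(games: list[tuple[str, str]]) -> bool:
    """True if some chain of games links every team to every other team."""
    neighbors = defaultdict(set)
    for a, b in games:
        neighbors[a].add(b)
        neighbors[b].add(a)
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for other in neighbors[team] - seen:
            seen.add(other)
            queue.append(other)
    return len(seen) == len(neighbors)


# A closed group whose teams only play each other keeps the schedule split.
schedule = [("W", "X"), ("X", "Y"), ("P", "Q")]
print(everywhere_connected(schedule))                  # False
print(everywhere_connected(schedule + [("Y", "P")]))   # True
```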

So right now, the ratings are calculated using the previous season's final rating as each season's starting rating, because that has proven to be more accurate than starting everyone the same. However, when I unveil the iterative method I anticipate it working this way:

I'll start with 1999 and actually start every team with a rating of 50. I'll iterate that to get the starting ratings for 1999.

Then I'll test out the 2000 season several ways (once again starting everyone at 50, giving teams different starting ratings based on their division, or using the final ratings from 1999). I'll see which one gives me the best results (the best method being the one that, in retrospect, would have produced the smallest difference between predicted scores and actual scores).

I'll do this all the way up until the 2014 season and use the "Iterative Ratings" as the new official final ratings for each season.

Then, when 2015 starts up, I'll likely just use the classic method until the schedule is everywhere connected, and we'll compare the two methods to see how they do predicting the rest of the playoffs. I'll decide how to proceed from there. I anticipate, but am not sure, that I'll want to use my classic ratings (without iterations) until the schedules have been connected everywhere for a couple of weeks and then switch over. Until something changes in the way the VHSL schedules, however, that won't be until the round of 8. I may actually use the iterative ratings before that, however, to predict the scores of games. That switchover might occur around week 4 or 5.

I've just got to take my time and make sure when I change something, that the change is mathematically and experimentally sound, and not just whimsical or that it "seems" like a good idea. It's important to me that these be as accurate as possible.
 
Thanks, Matt. Just to be clear, my observation was meant facetiously. As a Salem guy, I've more than had my fill of reading how the worst team in the South is better than the best team in the North, or stuff to that effect.
 
Originally posted by SpartanOfYore:
Thanks, Matt. Just to be clear, my observation was meant facetiously. As a Salem guy, I've more than had my fill of reading how the worst team in the South is better than the best team in the North, or stuff to that effect.
Where did you read that? Salem was good, but IMHO not better than the top 4-5, but the worst team? Now you are reaching. Salem was a good team, but you are one who doesn't want to accept the fact they weren't all that a lot of people pumped them up to be. Solid program, but after running through (and I mean literally running through) everyone you played this year, it didn't happen for one game.
 
Originally posted by GilliamRatings:

I've got an improvement on the system coming out in about a month, and it might shuffle things around, but Lake Taylor fans may not like it. I haven't run it yet, but it's possible that the outlier game wasn't Lake Taylor vs. Monacan (which is what the classic ratings seem to indicate). If we iterate this formula enough times, it may turn out that the outlier game was Lake Taylor vs. Bird. If that's the case, then the whole 4A South gets raised up, but Bird could pass Lake Taylor in the ratings. I suspect the truth is somewhere in the middle.
Huge shock right here. Will Bird shoot up to #1 in the state overall with system improvement 2.0? It's a really cool thing you do, and it takes time and all, but you can make numbers say whatever you want when you create the system. How is Manchester basically 14 points worse than Bird when they beat Bird? Because Manchester lost to Hermitage and Bird beat Hermitage? So head to head means nothing? Powhatan is rated the highest first-round loser, ahead of two teams that actually won first-round games, and it's heavy on the 4A North side. Guess Tidewater wasn't as good as we thought.


This post was edited on 1/2 1:46 PM by DEVILSLB99
 
Okay, I re-ran this with my "improved" system. So you understand what the difference is...here you go....
The original ratings always used the previous year's final ratings as a starting point for the new ratings, and the initial assumption in 1999 was that D6 was 5 points better than D5, which was 5 points better than D4, which was 5 points better than D3.

The five-point thing was entirely arbitrary, and when I redid the ratings this time I went with 7 points. Here's the thing: there's no need to go with anything. I'm a better programmer now.

With the new ratings I start every team at 50. No matter if they're 1A and haven't won a game this century or if they're the biggest school in the state and on a 300 game winning streak. EVERYONE starts at 50.

What I do instead is run the program and get results. The results are not very good. They're better than having everyone at 50, but they're not good at all... so I take the results I get for the end of the season, plug them back in as the new starting values, and rerun it. Hey, the results are a little better. I do it again. They're a little better. Well, this could take forever... I will automate this. I did this 1,000,000 times, so the results were not changing at all by the time I was finished. So the ratings have nothing to do with any assumptions about anything. I haven't published them yet, but I did run 2014 just to get this cleared up quickly, since some people don't trust me.

So, no judgments about which area is better than which. No judgments about how much better 6A is than 5A. Just the scores of games, and they eventually settle on this. Here's my top ten and I'm sticking to it (no, Bird didn't pass Lake Taylor, but they did nudge a little closer, just like I anticipated... I know what numbers are going to do when I do calculations with them, hence the ability to catch my mistakes and score 800s on SATs and GREs).

113.51 Lake Taylor
97.25 Salem
93.78 Monacan
91.98 Liberty-Bealeton
87.97 Jefferson Forest
86.75 Dinwiddie
85.74 Woodgrove
85.58 Heritage-Newport News
84.90 Kings Fork
84.79 George Washington

I like them. You can go back to your crying now.
 
You can only make them say whatever you want if you don't understand them or if the people you are speaking to don't understand them. Statistics is not guesswork; the theorems are more solid than the science that keeps airplanes in the air. Don't buy it when someone tells you that (worst quote of all time: "Lies, damned lies, and statistics"). When someone tells you that, they're either saying they're ignorant of math or implying that you are. For instance, there is no way for me to make numbers say that the circumference of a circle divided by its diameter in Euclidean geometry is 9.23. If I try to make them say that, then either I don't know what I'm doing, or I think you don't know what I'm doing.


This post was edited on 1/2 10:12 PM by GilliamRatings

This post was edited on 1/2 10:15 PM by GilliamRatings
 
Of course head to head means something.

Bird beat Hermitage
Hermitage beat Manchester
Manchester beat Bird

They were all 1-1 against one another, so you have to look at the other 3000+ games played this year to help yourself out.

There are literally hundreds of chains like this every season, and some are much longer than this.

Football scores and winning football games do not obey the transitive property. We have to resort to probability and statistics because Boolean Algebra and standard algebra won't do in this case.

That's the whole point of this: head-to-head matters, and not just the games you want to matter; all of the head-to-head games matter. Bird didn't just play Hermitage and Manchester, they played 13 other teams. All 15 of the teams they played had between 9 and 14 other games, all of those teams' opponents had between 9 and 14 other games, and every one of those was a head-to-head matchup. We are trying to turn 2300 scores into a more manageable list of one number per team. It is impossible to do that and have EVERY head-to-head matchup represented exactly. We have to average them out.

Bird did lose to Manchester by 6 in the regular season
Manchester did lose to Hermitage by 12 in the round of 16
Hermitage did lose to Bird by 21 in the round of 8

You can't reconcile those scores easily, but if they were the only data I had, I'd end up saying Bird outscored their opponents by 15, Hermitage was outscored by 9, and Manchester was outscored by 6. If we divide by 2 (the number of these games each team played) and make the average rating 50, Bird's rating would be 57.5, Manchester's 47, and Hermitage's 45.5. Of course, this would not give us the correct outcome of any of the games, but it "averages" the results and would lead us to conclude Bird is, on an average night, 10.5 points "better" than Manchester. Now, we also have some other data that might make us want to adjust this more. Bird won the deepest game in the playoffs and the most recent game among the three... some people would tell you that winning more recent games or playoff games is worth more than winning regular season games. Bird went on to win the state championship, and one might give a little bonus for that achievement (I don't, but you could). Also, Bird and Manchester played many common opponents. In almost every case Bird beat the common opponent quite a bit worse than Manchester did. Since the ratings are not used solely to compare Bird to Manchester, but to compare all teams to one another, and since Bird typically would beat the same opponent Manchester beats by 14 more points, it might be reasonable to conclude, when we put all these factors together, that on an average night Bird is 14 points better than Manchester, even if Bird clearly was not better on the one night they happened to play.
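The arithmetic in the first part of that paragraph can be sketched in Python: add up each team's margins, divide by the number of games each team played in this set (2), and center the result on 50. The `net_margin_ratings` helper is just for illustration.

```python
from collections import defaultdict


def net_margin_ratings(results: list[tuple[str, str, int]], games_per_team: int,
                       center: float = 50.0) -> dict[str, float]:
    """results holds (winner, loser, margin); returns center + net margin / games."""
    net = defaultdict(int)
    for winner, loser, margin in results:
        net[winner] += margin
        net[loser] -= margin
    return {team: center + total / games_per_team for team, total in net.items()}


three_way = [("Manchester", "Bird", 6),
             ("Hermitage", "Manchester", 12),
             ("Bird", "Hermitage", 21)]
print(net_margin_ratings(three_way, games_per_team=2))
# {'Manchester': 47.0, 'Bird': 57.5, 'Hermitage': 45.5}
```

Bird minus Manchester comes out to 10.5, even though Manchester won the one game between them.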

Maybe one more way to address the Bird vs. Manchester thing (once again the ratings are concerned with comparing EVERY team to one another not just Bird vs. Manchester) is to look at the common opponents:

Manchester was 6 points better than Bird head to head
Manchester was 8 points better than Bird against Midlothian
Bird was 7 points better than Manchester against Wythe
Bird was 7 points better than Manchester against Clover Hill
Bird was 7 points better than Manchester against Monacan
Bird was 9 points better than Manchester against James River
Bird was 31 points better than Manchester against Huguenot
Bird was 33 points better than Manchester against Hermitage
Bird was 34 points better than Manchester against Cosby

So against playoff teams, Bird was much better. This does not change the fact that Manchester beat Bird, but when we're setting up ratings that compare everybody, that Cosby result is just as important as any other game. So in these 9 games Bird was a combined 114 points better than Manchester (114/9 = 12.67). So the ratings would make sense: most teams were, on average, 12.67 points worse against Bird than against Manchester. If you throw in the fact that Manchester's remaining opponents were 16-16 and Bird's were 69-11, that might account for Bird getting a boost of another point.
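The common-opponent average can be sketched in Python using the nine differences listed above. Positive numbers favor Bird, and the head-to-head game counts as -6.

```python
# Bird's margin minus Manchester's margin against each shared opponent
# (head-to-head game included); positive means Bird did better.
differences = {
    "head to head": -6,
    "Midlothian": -8,
    "Wythe": 7,
    "Clover Hill": 7,
    "Monacan": 7,
    "James River": 9,
    "Huguenot": 31,
    "Hermitage": 33,
    "Cosby": 34,
}

total = sum(differences.values())
average = total / len(differences)
print(total, round(average, 2))   # 114 12.67
```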

I don't get mad at my students when I teach something poorly, so I will try to be patient, but I get paid to teach them. So I'm going to try.

Here are the basics (this is two months' worth of statistics class in a paragraph).

Some things in the world have certain (or nearly certain) outcomes. If I drop a cannonball off the roof of my house, we can calculate very accurately how long it will take to hit the ground, how fast it will be going, and how much force it will hit the ground with. Algebra, geometry, calculus, and their related fields deal with this kind of stuff very nicely, but there are many other things whose outcomes just can't be predicted that easily.

If I give you an aspirin there is a chance your headache will go away within an hour. However, it's not certain. It's also not certain that if it went away it was because of the aspirin, or if it was going to go away anyway. I bought a lottery ticket just before I started typing this. I don't think I'll win any money because I usually don't, but there's a chance. We can calculate that chance using the rules of probability.

I can't tell you who will win the NFL playoff games this weekend. Too many variables are involved. So many, in fact, that if we look at enough games, the results are random. When I say they are random, that is not to say that all results are equally likely. We can use the rules of probability to determine how likely a team is to win a game, or how likely a player is to get a hit or make a free throw. The latter examples are pretty simple. We just look at the past track record, and if we have enough data (say 30 free throws or more), then the past rate of success is a good indicator of the probability they'll make the shot. If a player, over a long period, starts to do dramatically better or worse than in the past, that is a tip-off that we might want to look at what has changed (guy responds differently under playoff pressure, got a new girlfriend, broke his hand).

Now, determining who might win a football game in the future is much trickier. We have a lot of data, but not data specific to what we want to know. For instance, a lot of people have speculated on what would happen if Salem were to play Monacan. We all pretend we are experts on football, evaluating talent and calling plays, and then tell everybody what we think like we're geniuses. The truth is, some people are experts on all that stuff and they still make the wrong prediction all the time. Also, none of us have seen every team in the state play every game, so we're working with a limited set of data anyway.

The simple truth is that I don't think the 2014 Spartans and 2014 Chiefs played one another. They had one common opponent. An opponent who claims they were off when they played the Chiefs and a little more on when they played the Spartans. I don't think the two schools have ever played, though I suppose it's possible that they hooked up somewhere in the last 40 years.

So honestly, how do we try and make this call? More importantly, how do we try to make this call if someone brings up any two of the 308 VHSL teams?

Quick, 2009 James Wood vs. 2004 Indian River, who'd win? Contrary to what idiots on ESPN try to make you think when they're picking the NFL games, we just don't know. However, there are techniques that can help us.

First of all, let's try to make it as simple as possible. We try to find the one variable that correlates the most with winning football games (correlation is a whole chapter...sigh). What is it? Coaching experience? Running back's 40 time? Size of school? NO, it's points scored and allowed. So the most successful rating system is going to be primarily (or possibly even exclusively) based on points scored and allowed. This is nice because that's pretty much the one statistic that probably gets reported to you accurately (though not always, believe me). None of us have yardage or time-of-possession numbers for all the games, and if we did, I can guarantee you that they'd be wrong anyway.

So, if we're really going to try to predict the winners of games, our variable is going to be the final scores of previous games. We can determine experimentally the average margin of victory, home-field advantage, etc. Using means, standard deviations, the laws of probability, z-scores, and other tools, we can come up with an idea of how two teams would play against one another based on the numbers. Is the result certain? No. Upsets happen (statistically, it seems about 1 in 6 or 1 in 7 VHSL games is an upset).

So we can represent that probability various ways, but my favorite is simply that: just a probability that team A will beat team B. Now, exactly what this number represents is questionable.

Let's just say I told you that Salem has a 58% probability of beating Monacan (obviously, since the probabilities of A and not-A must add up to 100%, that means Monacan has a 42% chance of beating Salem). Am I saying that I'm 58% sure Salem has the better team? Am I saying that there's a 42% chance the ratings have it wrong, or am I saying there's a 42% chance of an upset? The truth is I don't know, and I'm not sure why it matters anyway. Just suffice it to say that, with the standard deviations we've observed in past games, we know a team rated as much higher than another team as Salem is over Monacan would win the game 58% of the time.

The other way to represent this probability is a point spread. We might say Salem is a 3-point favorite, or that Salem would win by three. Many programs (including my own) would use this number to predict a score, say Salem 27, Monacan 24. The truth is we really don't expect the team to actually win the game by three. We kind of think of it this way: if they played a huge number of games, then on AVERAGE Salem would win by three. Late in the season, the standard deviation of rating systems' prediction errors always seems to be right around 14 or 15 points. I usually just throw out the number 14 because it's 2 touchdowns, but 14.5 is more accurate.

What that means is that if a team is favored by 3 points and the standard deviation is 14, they have a z-score of 3/14, and we can actually calculate the probability they will win the game from that number (z-score tables, if you ever took college stats). It also means that a little more than 2/3 of the time they play, the final margin will be within 14 points of the prediction, 95% of the time it will be within 28 points, and only with great rarity will it be more than 48 points different from the prediction.
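Turning a point spread into a win probability, as described above, is a normal-distribution lookup. Here is a Python sketch. It assumes margins around the spread are normally distributed with the 14-point standard deviation quoted in the post, and it uses `math.erf` in place of a z-score table.

```python
import math


def win_probability(spread: float, sd: float = 14.0) -> float:
    """P(favorite wins) if the final margin ~ Normal(mean=spread, sd=sd)."""
    z = spread / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def share_within(points: float, sd: float = 14.0) -> float:
    """Fraction of games expected to finish within `points` of the prediction."""
    return math.erf(points / (sd * math.sqrt(2.0)))


print(round(win_probability(3), 3))   # about 0.585
print(round(share_within(14), 3))     # about 0.683
print(round(share_within(28), 3))     # about 0.954
```

A 3-point spread gives roughly the 58% figure used for Salem over Monacan, and one and two standard deviations give the "a little more than 2/3" and 95% bands.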

So using this stuff, we can generate a number that compares teams to other teams. Matchups matter, but more than one number per team just wouldn't be very useful to the average human brain, so when we rate the teams we are kind of saying that, on average, this number represents how well each team has played against its opponents over the season. If you beat a team by more than the ratings suggest, you probably played better than your average against that team.

All these rules can be found in any advanced text on statistics and thanks to some great modern math theorems we can trust that they can be used to rate sports teams and we can always test our results at the end of the year to see if they are behaving as expected. If not, we know we've got a mistake somewhere or a flawed assumption somewhere and we need to debug it or tweak it (I love that part).

So am I saying that the Gilliam Ratings are right? Nope. I am saying they can't possibly be right, but no system can be, and if all you're going to go on is the final scores of games, well, this system is as strong as any and much better than what I used to publish in the past. Part of why I was so slow to get them back up was that every time I'd get started, I'd think of a way to improve them; hence, here I am 48 hours after posting the new ratings, and I already have newer and better ratings.

I love it when people find mistakes or think of things to consider in the formulas that I may not have considered. If you suggest that statistics is not good math, however, I will dismiss your arguments; you have nothing to add to the business of mathematically rating teams, and you should write poems about them instead (poetry is another fine endeavor of the human mind; I am not putting it down, just suggesting that it is another way of thinking about things).

No wonder my students hate my class.
This post was edited on 1/2 10:01 PM by GilliamRatings

This post was edited on 1/2 10:45 PM by GilliamRatings
 
In case you're wondering about 5A ratings under my "improved" method, here's the top 15 (these will be posted before the month is over; I just want to be careful that I'm not making any mistakes before I post them).

106.98 L.C. Bird
104.27 Tuscarora
99.39 Highland Springs
94.70 Manchester
93.23 Hermitage
90.79 Salem-Virginia Beach
88.69 Atlee
87.25 Broad Run
86.08 Massaponax
85.68 Briar Woods
84.11 Meadowbrook
82.27 Norview
80.84 Stone Bridge
79.00 Indian River
78.27 Great Bridge

Man, 5A was loaded!
 
Also, if anyone thinks I am tilting the ratings to favor anyone, I can assure you I am not.

Everyone knows that I pull for L.C. Bird. They have never been #1 in the ratings except for one week after they beat Monacan 73-7 in 2013. Their rating went up the next week after beating Lake Taylor, but Centreville passed them and stayed ahead of them the rest of the season. They eventually finished third in the ratings that season behind the Wildcats and Dinwiddie. I could have easily slipped them in at #1 and not published my formula, and no one would have ever known the difference. They would have just won their second straight state title and their 29th straight game, over a team that had won three straight titles. Who would have said, "Oh, Gilliam was just cheating for Bird"? The formula I use is published on the site, and the reasoning for every part of it is explained. You'll also see that once the season gets going, I predict the result of EVERY game between VHSL teams and let you know how I did the next week. No secrets; no pretending. You'll see the ratings do what I say they will: they predict the correct winner 80-85% of the time once the season is rolling along. You could do about as well if you wanted to sit down, think about every team in the state, read about them, and analyze the scores, but you have a life. Let my computer help.

Once again, I don't mind you questioning my football assumptions or mathematical assumptions, but please don't question the science of statistics or my integrity. That'll get you ignored by me in any future discussions.
 
Originally posted by GilliamRatings:

I like them. You can go back to your crying now.
Crying about what? Somebody questions your system and now they are crying? I poked a little fun at you about Bird since everyone says you have a rooting interest in Bird, and you got mad and said I'm questioning your integrity. Your system is your system; you didn't have to create it, you decided to create it, and it creates some talk and is interesting to read. But at the end of the day it's just a program you wrote, what a computer calculates and spits out, right? So while it's cool, it has no impact on who is playing for state titles and who wins games. I was questioning how Bird became 14 points better than Manchester; you explained it, so case closed. As far as ignoring me goes, you can ignore me. Really doesn't bother me; never met you and won't lose any sleep over it.
This post was edited on 1/3 8:22 AM by DEVILSLB99
 