By Michael Gertz
Wednesday, November 26, 2014
Computer rankings for college football earned a negative reputation from the old BCS computer composite, which averaged several different and poorly formed computer ranking systems that disagreed with one another. But there is no reason the College Football Playoff selection process couldn't be handled more effectively by a computer ranking than by the new selection committee, provided the parameters of how we want to rank teams are agreed upon beforehand.
The primary problem with college football rankings in general is a fundamental lack of understanding about the difference between which teams are the best and which teams have accomplished the most impressive won-lost records. For this reason, we have put together a summary of two different rating systems we built to analyze each of these questions separately, to show how they can be used to select teams, or at least to evaluate how well the selection committee has done its job.
Many people, including the College Football Playoff selection committee, gloss over the fact that there even is a difference between the best teams and the most accomplished. But there is actually a very significant difference between the two. Won-lost records and measures of strength of schedule are enough to measure which teams have accomplished the most, but they actually aren't the best way to judge which teams are the best. We know this because betting market odds regularly beat even the most accurate estimations based solely on records and schedule.
Our predictive ratings use point differential, rather than game outcomes, to judge teams. Point differential has been proven to be more accurate than records at predicting games in just about every sport, because it is a more precise measure of how effective each team was in a game. Other more advanced metrics, such as those our NFL ratings use, can improve on accuracy even beyond point differentials, but they are rare at the college level because the vast number of college teams makes them difficult to produce.
It initially feels right to pick the best teams for the playoffs, but in reality it makes more sense to judge teams mostly or even entirely on their accomplishment rather than on how good they actually are. Imagine how silly it would seem if, after the NFL regular season, we seeded teams by how good they are rather than by their records. The outcomes of regular season games would become largely meaningless. Such a system would also muddy the incentives for teams: winning teams would have an incentive to run up the score in blowouts, and losing teams would even be inclined to avoid their usual high-risk strategies, which maximize their odds of winning but often lead to worse point differentials. The selection committee, whether on purpose or by accident, does judge teams mostly on their accomplishment rather than on how good they are.
The method behind our Predictive Ratings is simply to answer the question: what set of ratings assigns the highest likelihood to the actual game results that occurred? Using past data to set parameters for the value of home-field advantage and the standard deviation of each game's random spread, we first model game outcomes with a normal distribution of probabilities. This is then used to calculate the odds of each team winning each game by a certain number of points, based on the difference between their ratings.
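As a concrete sketch of that step, a single game's likelihood under this kind of model can be written as follows. The home-field value and standard deviation here are placeholder assumptions, not the fitted parameters our ratings actually use:

```python
from scipy.stats import norm

# Placeholder parameters: the fitted values come from past data and are
# not given in the article, so these numbers are assumptions.
HOME_FIELD = 3.0   # assumed home-field advantage, in points
SIGMA = 16.0       # assumed standard deviation of a game's point margin

def game_likelihood(home_rating, away_rating, home_margin):
    """Likelihood of the home team winning by home_margin points, with the
    margin modeled as normal around the rating gap plus home field."""
    expected = home_rating - away_rating + HOME_FIELD
    return norm.pdf(home_margin, loc=expected, scale=SIGMA)
```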
Finally, an optimization solves for every team's rating all at once by maximizing the probability that every game's result occurred. The probability of all game results occurring is the product of each game's individual odds, since the probability that two events both occur is found by multiplying together the probability of each. Because all team ratings are optimized together, the process automatically accounts for game locations and strength of opponents. The results not only rank teams, but tell exactly how much better any team is than another, because the difference in ratings acts as a point spread projection for any hypothetical matchup.
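Putting both steps together, the joint fit can be sketched as minimizing the negative log of that product, since sums of logs are numerically safer than products of many small probabilities. The game data and parameter values below are purely illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical results: (home_index, away_index, home_margin) per game.
games = [(0, 1, 7), (1, 2, -3), (2, 0, 10), (0, 2, 14)]
n_teams = 3
HOME_FIELD, SIGMA = 3.0, 16.0  # assumed values, as above

def neg_log_likelihood(ratings):
    # Maximizing the product of every game's odds is equivalent to
    # minimizing the sum of each game's negative log-likelihood.
    total = 0.0
    for home, away, margin in games:
        expected = ratings[home] - ratings[away] + HOME_FIELD
        total -= norm.logpdf(margin, loc=expected, scale=SIGMA)
    return total

# Solve for all teams at once. Only rating differences matter, so the
# solution is centered at zero afterwards to fix the arbitrary level.
result = minimize(neg_log_likelihood, x0=np.zeros(n_teams))
ratings = result.x - result.x.mean()
# The gap between any two ratings projects a point spread for that matchup.
```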
Our W-L Resume Ratings are a measure of how impressive a team's W-L record is, regardless of how many points they've won or lost each game by. Many others try to rank team resumes with overcomplicated hodgepodge formulas that often include large rounding errors or lack hindsight. Our method is again simple: make predictive-style ratings that use only game outcomes rather than point differentials. So rather than calculating the odds of a team winning by a certain number of points, we calculate only the odds of the team winning or losing.
Because these ratings are not bound by point differentials, we must add an important adjustment to prevent all undefeated teams from having infinite ratings: raising an unbeaten team's rating always increases the likelihood of its record, so the optimization would otherwise push that rating upward without limit. Whereas our predictive ratings don't make any real initial assumptions about teams, our resume ratings assume that all teams come from a normally distributed sample of teams, with a standard deviation of ratings based on past data. Therefore the odds of all game outcomes occurring are multiplied not only by each other, but also by the odds that each team would have such a rating.
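A minimal sketch of that version, under the same illustrative assumptions plus a hypothetical prior spread for team ratings: each game now contributes a win probability (a normal CDF of the rating gap) instead of a margin density, and each rating is also scored against the assumed population of teams:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical results: (home_index, away_index, home_won) per game.
# Team 0 is unbeaten here, which is exactly the case the prior handles.
games = [(0, 1, True), (1, 2, False), (2, 0, False), (0, 2, True)]
n_teams = 3
HOME_FIELD, SIGMA = 3.0, 16.0  # assumed game-level parameters
PRIOR_SD = 12.0                # assumed spread of true team ratings

def neg_log_posterior(ratings):
    total = 0.0
    for home, away, home_won in games:
        # A win is just a positive margin, so its odds are a normal CDF.
        p_home = norm.cdf((ratings[home] - ratings[away] + HOME_FIELD) / SIGMA)
        total -= np.log(p_home if home_won else 1.0 - p_home)
    # Prior term: also multiply in the odds of each team having its
    # rating, which keeps unbeaten teams from drifting off to infinity.
    total -= norm.logpdf(ratings, loc=0.0, scale=PRIOR_SD).sum()
    return total

ratings = minimize(neg_log_posterior, x0=np.zeros(n_teams)).x
```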
Such a method may sound arbitrary, but it isn't. It is exactly the assumption that would be needed in an equivalent scenario where each team had a known true rating we could check afterwards, for instance if a computer simulated a season using actual ratings and we had to estimate those ratings without any knowledge of them.
The College Football Playoff selection committee ranks teams relatively similarly to our W-L resume ratings, but it does so in a rather inaccurate and inconsistent manner. Because people cannot account for everything involved in rating teams the way a computer model can all at once, the committee ends up making not only numerous rounding errors, like counting how many wins a team has against top 25 opponents, but also many wildly inaccurate estimations on questions such as whether it is easier to beat 6 mediocre teams, or 2 good ones and 4 bad ones.
You wouldn't drive a car or even trust a weather forecast that was made without the aid of a computer model, so why do we let college football be ruled by gut feelings and rough estimations? Like those other fields, ranking college teams involves far too many working variables to expect individuals to be as good at it as a well-designed computer model. In science, people make back-of-the-envelope calculations when they have only a couple of minutes to make a rough guess about something that doesn't really matter. Only in college sports do people accept such back-of-the-envelope methods over a real analysis as the best we can do.
Sometimes the committee's rankings differ slightly from our resume ratings in the direction of our predictive ratings, which is an acceptable sign that they give some weight to how good teams actually are. But much more often, the committee simply ranks one team ahead of another despite it having both a worse resume rating and a worse predictive rating, indicating that the committee lacks the ability to accurately judge different teams' schedules against one another.
Contrary to popular assumption, there is no way to make a universal strength of schedule metric that meaningfully compares different teams in a W-L resume context. The reason is that the difficulty of a schedule is relative to how good the team playing it is. Most strength of schedule metrics treat playing the best and worst teams in D-IA as equal to playing 2 average teams. But the first pairing makes almost any team far more likely to go 1-1 than 2-0 or 0-2. A good team would be better off playing 2 mediocre teams, while a bad team would be better off facing the best and worst and hoping to just beat the worst.
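Here is that effect in a short sketch with made-up ratings (best team +20, worst team -20, average teams 0; sigma as before, neutral sites for simplicity):

```python
from scipy.stats import norm

SIGMA = 16.0  # assumed margin spread; neutral sites for simplicity

def win_prob(rating, opponent):
    return norm.cdf((rating - opponent) / SIGMA)

def record_odds(rating, opponents):
    """Probabilities of going 2-0, 1-1, and 0-2 against two opponents."""
    p1, p2 = (win_prob(rating, o) for o in opponents)
    return {"2-0": p1 * p2,
            "1-1": p1 * (1 - p2) + (1 - p1) * p2,
            "0-2": (1 - p1) * (1 - p2)}

# Hypothetical ratings: best team +20, worst team -20, average teams 0.
for label, rating in [("good team (+10)", 10), ("bad team (-10)", -10)]:
    print(label, "vs best+worst:", record_odds(rating, [20, -20]))
    print(label, "vs two average:", record_odds(rating, [0, 0]))
```

With these particular numbers, the good team's chance of going 2-0 roughly doubles against the two average opponents, while the bad team is far more likely to escape 0-2 against the best-and-worst pairing, matching the asymmetry described above.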
What's easier, going 10-2 in the best conference while playing 4 great teams, 4 good ones, and 4 cupcakes, or going 12-0 in a bad conference while playing 3 good teams, 3 mediocre ones, and 6 cupcakes? Questions like this are nearly impossible to answer by guessing based on hunches and history, but they can be solved precisely by our ratings.
The best strength of schedule metrics that can be made have to set a standard, such as asking how many wins an average top 25 team would be expected to get against a given schedule, and even those are only partially useful. Our rating method takes schedule into account on an individual level to find the most likely ratings for each team, and from those we can get a general sense of how tough a schedule each team has faced. The order of teams with identical records essentially ranks their strength of schedule, while comparing teams with similar ratings but different records shows which W-L record was more difficult to accomplish for a team of that caliber.
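That kind of standard can be sketched as follows, with an assumed reference rating for an average top 25 team and hypothetical opponent ratings:

```python
from scipy.stats import norm

SIGMA = 16.0
TOP25_RATING = 15.0  # assumed rating of an average top 25 team

def expected_wins(schedule, rating=TOP25_RATING):
    """Expected win total for a team of the given rating: the sum of its
    single-game win probabilities against each opponent."""
    return sum(norm.cdf((rating - opp) / SIGMA) for opp in schedule)

tough = [22, 18, 14, 10, 5, -5]   # hypothetical opponent ratings
soft = [8, 5, 0, -5, -12, -20]
print(expected_wins(tough), expected_wins(soft))  # fewer expected wins = tougher schedule
```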