Feng: Top three things to know about college football analytics
What can numbers tell us about college football? Are they useful in making predictions on games?
In some ways, college football is the last frontier in sports analytics. While baseball has made it to Hollywood with "Moneyball," college football listens to nebulous statements on strength of schedule from the playoff committee.
College football also presents challenges due to a lack of data. The 12-game regular season seems like spare change compared to the 82 games for the NBA or 162 for MLB. Moreover, football offers almost no numbers on important players such as offensive linemen.
However, numbers are still very useful in predicting college football. Let's look at three important concepts.
1. Last year matters
College football teams might seem to start fresh each year. Players leave while new recruits arrive. A bad season might lead to a new coordinator or even a head coach. It might seem like last year doesn't matter.
However, many things stay the same for programs from year to year. Blue blood programs like Michigan have a rich tradition and the financial resources that come from packing more than 100,000 fans into the Big House for every home game.
In contrast, Sun Belt school Louisiana Monroe has the smallest athletic department budget of any bowl subdivision school. According to a report, coach Todd Berry has a recruiting budget of $36,761 for 2014-15.
The Football Bowl Subdivision (FBS) includes teams with vastly different resources. Due to these differences, a team's performance tends to persist from year to year.
This visual shows team strength from 2013 on the x-axis and 2014 on the y-axis.
The numbers come from my team rankings at The Power Rank, which take margin of victory and adjust for strength of schedule. The visual shows each team's rating, or a predicted margin of victory against an average FBS team.
The visual shows a strong persistence from season to season. You can make a good prediction for a program in 2014 by looking at how it fared in 2013. This is not possible in the NFL.
Due to this persistence, I use a team's performance over a four-year window to make preseason rankings.
This simple model, which also considers turnovers and returning starters, has predicted over 70 percent of game winners since 2005.
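As a rough illustration of the four-year window idea, the sketch below takes a weighted average of a team's ratings from the previous four seasons. The function name `preseason_rating`, the weights and the sample ratings are all assumptions made up for this example. The column does not say how the seasons are weighted, and the actual model also considers turnovers and returning starters.

```python
def preseason_rating(past_ratings, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted average of a team's ratings, most recent season first.

    The default weights are hypothetical; they simply give recent
    seasons more influence than older ones.
    """
    if len(past_ratings) != len(weights):
        raise ValueError("need exactly one rating per weight")
    total_weight = sum(weights)
    weighted_sum = sum(r * w for r, w in zip(past_ratings, weights))
    return weighted_sum / total_weight


# Made-up ratings (points better than an average FBS team),
# listed from the most recent season to the oldest.
example = preseason_rating([10.0, 8.0, 12.0, 6.0])
print(round(example, 2))
```

Dividing by the sum of the weights keeps the result on the same scale as the input ratings, so it can still be read as a predicted margin against an average team.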
2. Predicting turnovers
Turnovers can have a big impact on the outcome of football games. A linebacker makes a big hit, which knocks the ball out of the running back's hands, or a quarterback throws an errant pass into the hands of the secondary. Any insight into turnovers would help predict the outcome of games.
However, randomness plays a huge role in turnovers. For example, turnover margin in the first six games of the season has almost no ability to predict turnover margin over the remainder of the season.
You have to dig deeper to find any way of predicting turnovers. Let's start with interceptions.
My research has shown a correlation between completion percentage and interception rate (interceptions divided by pass attempts). A higher completion percentage implies a lower interception rate.
In college football, the relationship exists in team statistics. However, the correlation is even stronger when looking at the career numbers of NFL quarterbacks.
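For readers who want to check this kind of relationship themselves, here is a minimal sketch that computes interception rate as defined above and its correlation with completion percentage. The passing lines are made-up numbers chosen only to show the calculation, and the helper names are hypothetical; this is not the data or code behind the research described here.

```python
from math import sqrt


def interception_rate(interceptions, pass_attempts):
    """Interceptions divided by pass attempts."""
    return interceptions / pass_attempts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up passing lines: (completions, attempts, interceptions).
teams = [(250, 400, 8), (220, 400, 12), (200, 400, 16), (270, 400, 6)]

completion_pct = [comp / att for comp, att, _ in teams]
int_rate = [interception_rate(ints, att) for _, att, ints in teams]

# A negative value means higher completion percentage goes with
# a lower interception rate in this toy sample.
r = pearson(completion_pct, int_rate)
print(r)
```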
Correlation doesn't imply causation, but it's reasonable to think that less accurate quarterbacks tend to throw more interceptions. Sometimes an errant quarterback throws the ball at his receiver's ankles; other times it goes to the other team.
Michigan State's Connor Cook is an exception to this trend. Over his career, he has completed a below-average 58 percent of his passes while throwing fewer picks than average, with a 1.9-percent interception rate.
From watching his games, I see Cook as an elite quarterback with good to great accuracy. His low completion percentage most likely has another explanation, such as a tendency to throw down the field more than others.
Fumbles are even more difficult to predict than interceptions. I've been unable to find a correlation of fumble rate with any basic box score statistic.
However, NFL data suggests quarterbacks do have some control in not fumbling the ball.
Over the last five seasons, the teams that fumble the least have quarterbacks named Brady, Brees and Ryan. This requires more research to make any firm conclusions.
3. Efficiency, efficiency, efficiency
The more plays a team runs, the more yards it gains. Also, the more plays a defense faces, the more yards it will allow.
Pace matters in college football. Baylor runs as many plays as possible, while Michigan and Michigan State take their time getting to the line of scrimmage.
However, no one has told the talking heads on television as they continue to rank offense and defense based on misleading yards per game statistics.
Instead, it's better to look at an efficiency metric like yards per play. Take the total yards and divide by the number of plays. It's simple yet powerful.
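To see why pace can make yards per game misleading, consider this small sketch with two hypothetical offenses. The season totals are invented for illustration; the only formula used is the one above, total yards divided by number of plays.

```python
def yards_per_play(total_yards, plays):
    """Total yards divided by the number of plays."""
    return total_yards / plays


def yards_per_game(total_yards, games):
    """Total yards divided by the number of games."""
    return total_yards / games


# Made-up season totals over 12 games: one fast-paced offense, one slow.
fast = {"yards": 6000, "plays": 1000, "games": 12}
slow = {"yards": 5200, "plays": 800, "games": 12}

for name, team in (("fast", fast), ("slow", slow)):
    ypg = yards_per_game(team["yards"], team["games"])
    ypp = yards_per_play(team["yards"], team["plays"])
    print(f"{name}: {ypg:.1f} yards per game, {ypp:.2f} yards per play")
```

In this toy example the fast team gains more yards per game only because it runs more plays. The slow team gains more on each play.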
Yards per play gets tricky when you break it down into passing and rushing. For some unknown reason, sacks count as negative rush plays in college football. Since sacks began as a pass attempt, they should count as pass plays.
All college football sites except two make this improper classification of sacks. To get the correct numbers, check out my rankings of yards per play for passing and rushing.
Bill Connelly also makes this proper adjustment for his S&P numbers at Football Outsiders.
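The sack adjustment amounts to simple bookkeeping, sketched below under stated assumptions. It assumes the input box score follows the college convention, so rush attempts already include sacks and rushing yards already have sack yardage subtracted. The function name, parameter names and sample numbers are hypothetical, not taken from any site's actual code.

```python
def sack_adjusted_yards_per_play(pass_attempts, pass_yards,
                                 rush_attempts, rush_yards,
                                 sacks, sack_yards_lost):
    """Return (pass yards per play, rush yards per play) with sacks
    counted as pass plays.

    Assumes college-style inputs: rush_attempts includes sacks and
    rush_yards already reflects the lost yardage. sack_yards_lost is
    given as a positive number.
    """
    pass_plays = pass_attempts + sacks
    adjusted_pass_yards = pass_yards - sack_yards_lost
    rush_plays = rush_attempts - sacks
    adjusted_rush_yards = rush_yards + sack_yards_lost
    return (adjusted_pass_yards / pass_plays,
            adjusted_rush_yards / rush_plays)


# Made-up single-game box score in the college convention.
pass_ypp, rush_ypp = sack_adjusted_yards_per_play(
    pass_attempts=30, pass_yards=240,
    rush_attempts=40, rush_yards=150,
    sacks=4, sack_yards_lost=30,
)
print(f"pass: {pass_ypp:.2f}, rush: {rush_ypp:.2f}")
```

With these invented numbers, the unadjusted figures would be 8.0 yards per pass attempt and 3.75 yards per rush. Moving the sacks lowers the passing number and raises the rushing number, while total yards and total plays stay the same.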
While yards per play is simple and powerful, there are many other useful efficiency metrics for football, such as expected points added. To learn more about these metrics, check out Part 3 of my analytics guide.
Second installment later this season
These three concepts should provide a firm foundation early this college football season. In a later column, we'll look at other important concepts in college football analytics that make more sense later in the season.
Here are my predictions for this weekend:
* Michigan will beat UNLV by 27.5, which implies a 96-percent win probability.
* Michigan State will beat Air Force by 19.5, which implies a 93-percent win probability.
Ed Feng has a Ph.D. in chemical engineering from Stanford and runs the sports analytics site The Power Rank.