Wikipedia Explains Bayes
Per the undisputed champ of the world, Wikipedia, a “prior probability distribution (Bayesian prior) of an uncertain quantity is the probability distribution that would express one’s beliefs about this quantity before some evidence is taken into account.” In layman’s terms, in order to understand the probability of something — say, a seven-game NBA series [Editor’s Note: That’s cold.] — you must first understand the initial odds.
This is pretty intuitive when you think about it. For example, if the Philadelphia 76ers — the worst team in basketball last year — played the Golden State Warriors in a seven-game series, they would be massive underdogs. But let's say that they shocked the world and somehow upset the Warriors in Game 1. Per WhoWins.com, NBA teams with a 1-0 lead in a seven-game series have gone on to win the series 77 percent of the time — a figure that doesn't account for home-court advantage or the quality of the teams involved.
Now it wouldn’t make a whole lot of sense for the Philadelphia 76ers to become 77 percent favorites — that would equate to about -335, for reference — solely because of that first game. Vegas certainly wouldn’t ever make that mistake, as it understands that the Philadelphia 76ers had incredibly low odds to win the series in the first place. Winning a game increases their probability of winning the series — say from one percent to three percent, or something like that — but it doesn’t change it from 50 percent to 77 percent. Congratulations, you now understand Bayesian theory.
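If you want to see the arithmetic behind that intuition, here's a minimal sketch in Python. One assumption to flag: it treats the league-wide 77 percent figure as coming from roughly even matchups, so that P(win Game 1 | win series) is about 0.77 and P(win Game 1 | lose series) is about 0.23. Those conditional probabilities are illustrative, not something published by WhoWins.com.

```python
# Hypothetical Bayes update for the 76ers example. Assumption: the
# league-wide 77 percent figure comes from roughly even matchups,
# so P(win Game 1 | win series) ~= 0.77 and
# P(win Game 1 | lose series) ~= 0.23.

def update_series_odds(prior, p_g1_if_win=0.77, p_g1_if_loss=0.23):
    """Posterior P(win series | won Game 1) via Bayes' theorem."""
    numerator = p_g1_if_win * prior
    return numerator / (numerator + p_g1_if_loss * (1 - prior))

print(update_series_odds(0.50))  # even matchup -> 0.77, the WhoWins rate
print(update_series_odds(0.01))  # 1% underdog  -> ~0.03, not 0.77
```

A coin-flip matchup reproduces the 77 percent figure exactly; a one percent underdog climbs only to about three percent after a Game 1 win. That's the whole point.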
Digging Up the Past
One thing I appreciate about FL co-founder Jonathan Bales is his willingness to admit a mistake; he's been on the record in about a million places about how he handled Odell Beckham's rookie year. He admits that he faded Beckham continually and that he was hampered by his biases about him. But I believe Bales isn't giving himself enough credit here. He was simply relying on an important part of data science: priors.
(To be fair: I had a similar issue with Devonta Freeman, but we’ll get to him in due time.)
From Leonard Mlodinow’s book The Drunkard’s Walk: How Randomness Rules Our Lives:
Bayes developed conditional probability in an attempt to answer the [question]: how can we infer underlying probability from observation? If a drug just cured 45 out of 60 patients in a clinical trial, what does that tell you about the chances the drug will work on the next patient? If it worked for 600,000 out of 1 million patients, the odds are obviously good that its chances of working are close to 60 percent. But what can you conclude from a smaller trial? Bayes also asked another question: if, before the trial, you had reason to believe that the drug was only 50 percent effective, how much weight should the new data carry in your future assessments?
Bayes and Mlodinow ask: “But what can you conclude from a smaller trial?”
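Mlodinow's last question, how much weight the new data should carry against a prior belief, has a standard textbook treatment: a Beta-Binomial model, in which the prior acts like a batch of pseudo-observations. Here's a minimal sketch; the prior strength below (how many pseudo-patients a 50 percent belief is worth) is my assumption for illustration, not something from the book.

```python
# A minimal sketch of Mlodinow's question using a conjugate Beta
# prior. The prior_strength (how many pseudo-patients a 50 percent
# belief is worth) is an assumption made up for illustration.

def beta_posterior_mean(successes, trials, prior_mean=0.5, prior_strength=20):
    """Posterior mean of a Beta-Binomial model.

    The prior acts like pseudo-trials: it pulls small samples back
    toward prior_mean, while large samples overwhelm it.
    """
    alpha = prior_mean * prior_strength + successes
    beta = (1 - prior_mean) * prior_strength + trials - successes
    return alpha / (alpha + beta)

print(beta_posterior_mean(45, 60))              # small trial -> ~0.69, not 0.75
print(beta_posterior_mean(600_000, 1_000_000))  # large trial -> ~0.60
```

With only 60 patients, a moderately confident 50 percent prior drags the estimate down from the observed 75 percent to about 69 percent; with a million patients, the prior barely matters.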
Welcome to predicting the game of football.
Odell Beckham’s First Two Years
Getting back to the priors issue: What did we know about Beckham before his NFL rookie year? Well, we knew that in his three-year college career at LSU he averaged 3.6 receptions per game and 16.4 yards per reception, and scored 12 total touchdowns. Using RotoViz's Box Score Scout and putting in Beckham's 2013 junior year at LSU — his best year by far — we get these similar prospects using his per-game averages:
Not all of those guys are disasters, obviously: Demaryius Thomas is on the list, and Donte Moncrief and Breshad Perriman have promising-looking careers. Even when we add Beckham’s draft position to the similarity calculator, we get a stud in Julio Jones, but it still doesn’t inspire much confidence that we’re looking at a once-in-a-generation prospect.
If we used Beckham's college career as his prior — which is obviously a big part of what DFS players do to predict NFL success for rookies — there was little reason to believe he'd be a top WR in his rookie year, let alone put up one of the greatest WR stretches in NFL history.
Devonta Freeman’s 2015 Campaign
And then there’s Devonta.
Here are the first two years of his career:
• 2014: 16 games, 248 total rushing yards, 3.8 yards per carry, 5.52 DK points per game, +0.51 average Plus/Minus
• 2015: 15 games, 1,056 total rushing yards, 4.0 yards per carry, 22.06 DK points per game, +6.48 average Plus/Minus
After that 2014 season, the Atlanta Falcons drafted Indiana running back Tevin Coleman in the third round, and it looked like Coleman would supplant Freeman, who had been a disappointment to that point.
I don't have season comparisons for Freeman's 2014, but I can't imagine the comparable list for a player who averaged 15.5 rushing yards per game on a below-average 3.8 yards per carry would be stellar.
Freeman: Posterior
I believe Beckham and Freeman are both informative (yet different) cautionary tales about priors in football. This isn't to say that Bayes' ideas on probability don't work for football; they most certainly do. A Bayesian model is one that 1) starts with a prior probability and 2) updates that probability as new information arrives. Both steps are important, and Beckham and Freeman are great examples of how we can make mistakes in the Bayesian process.
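To make those two ingredients concrete, here's a minimal sketch of sequential Bayesian updating with made-up numbers: a skeptical prior on a player's weekly "hit rate" gets nudged one game at a time. Both the prior mean and the prior strength are assumptions chosen purely for illustration.

```python
# A minimal sketch of both ingredients, with made-up numbers: a
# skeptical prior on a player's weekly "hit rate," updated one game
# at a time (Beta-Binomial). The prior mean and strength here are
# assumptions chosen purely for illustration.

def sequential_update(outcomes, prior_mean=0.3, prior_strength=30):
    """Yield the posterior mean hit rate after each observed game."""
    alpha = prior_mean * prior_strength
    beta = (1 - prior_mean) * prior_strength
    for hit in outcomes:
        alpha += hit
        beta += 1 - hit
        yield alpha / (alpha + beta)

# Six straight hits against a skeptical 30 percent prior:
for game, belief in enumerate(sequential_update([1, 1, 1, 1, 1, 1]), 1):
    print(f"After game {game}: {belief:.1%}")
```

Notice how slowly a strong prior moves: six straight hits only push the belief from 30 percent to about 42 percent. That's the updating working as designed, and it's also exactly how a reasonable prior can keep you fading a breakout like Beckham's for too long.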
For Freeman, all the people who didn't believe in him last year weren't wrong because they improperly analyzed the type of player he is. In fact, his final YPC last season of 4.0 was pretty similar to the mark from his disappointing rookie season. Rather, the mistake made with Freeman was about which prior actually mattered. I believed that efficiency was an important prior that impacted Freeman's probability of being a fantasy-relevant running back. But that's not what matters, because we know that for RBs what matters — almost entirely what matters, in fact — is opportunity.
The analysis of Freeman-the-Player wasn’t incorrect; the analysis of Freeman-the-Situation was.
OBJ: Posterior
Beckham represents the other cautionary tale about probabilities and football — that is, dealing with small samples and being slow to adapt to new information. And this is a trickier issue: How do you take data from a small sample and use it for future prediction?
Unfortunately, there isn't a perfect answer to this question. Football players just have incredibly small samples. A statistically significant result is typically defined against a 95 percent confidence threshold, and you can see why you'd need a large sample of even coin flips to clear it, let alone in a complicated, entangled sport like football.
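To put a number on that, here's a rough sketch of the 95 percent margin of error on an observed rate, using the normal approximation for a proportion. Treating a 16-game NFL season like a run of coin flips is my analogy, not a formal model.

```python
import math

# A rough sketch of the sample-size problem: the 95 percent margin
# of error on an observed rate (normal approximation). Treating an
# NFL season like a run of coin flips is an analogy, not a model.

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95 percent confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (16, 100, 1_000, 10_000):  # 16 ~= one NFL season of games
    print(f"n = {n:>6}: observed rate +/- {margin_of_error(n):.1%}")
```

One 16-game season leaves a margin of error around 25 percentage points; you'd need thousands of trials to shrink it to a point or two.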
This is why probability is better depicted in normal distribution graphs. Here's an example that I'm completely making up off the top of my head. Let's say I want to find how far I can consistently shoot an arrow. So I get a bow, shoot 200 arrows, and record each shot's distance, bucketed into 10-foot ranges.
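Here's a quick simulated version of that experiment; the mean of 35 feet and the spread of 5 feet are numbers I'm making up to match the story:

```python
import random
from collections import Counter

# A simulated version of the archery example: 200 shots with
# normally distributed distances. The mean (35 feet) and spread
# (5 feet) are made up, in keeping with the example itself.

random.seed(1)
shots = [random.gauss(35, 5) for _ in range(200)]

buckets = Counter(10 * int(d // 10) for d in shots)  # 10-foot bins
for lo in sorted(buckets):
    print(f"{lo}-{lo + 9} ft: {buckets[lo]} arrows")
```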
After 200 shots, we would say that I can consistently shoot an arrow between 30 and 40 feet. We could do the same thing with coins, measuring how many heads and tails we have after 200 flips, or in our Trends tool with Devonta Freeman's 2015 and 2016 games.
You see the problem. If a simple test such as flipping a coin — a test with zero other variables — requires such a large sample size to reach statistical significance, what about football? On the scale of coin flips, a running back's entire career is about three flips into a 100-flip sample. You could get three heads, three tails, or some other combination.
I'm not trying to discourage anyone from attempting to predict football by any means. And since this is a DFS article and we're a DFS site, let me end with this: Knowing about these probability issues is the entire point; it's how you find an edge. It's what will allow you to find the priors that matter and adapt more quickly to meaningful data. In a sport that isn't predictable, use its unpredictability to your advantage.
And in the case of Beckham and Freeman, roll with Beckham and fade Freeman. I think? Who knows.