Mears: Building an Optimal DFS Model for the 2019 U.S. Open


Here at FantasyLabs, we have a Trends tool that allows you to query tons of situations and see how they’ve historically led to DFS value. For example, you can see how a golfer has done at specific events when coming in with awful recent form or having never played the course before.

Using all of that data — specifically the data points in our PGA Models — we can see which metrics have been the most valuable for the U.S. Open and then use that data to optimize a model for this week.

To measure value, we use a proprietary metric called Plus/Minus. It’s simple: We know from history how many DraftKings points a golfer should score at his salary, and then we can measure performance above or below that expectation.
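
As a quick illustration, here’s a minimal sketch of how a Plus/Minus-style calculation can work, assuming a simple salary-bucket baseline built from historical results. The bucket size, data layout and toy numbers are illustrative assumptions, not the actual FantasyLabs methodology.

```python
from collections import defaultdict

def salary_expectations(history, bucket=500):
    """Average DraftKings points historically scored within each salary bucket."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [points sum, count]
    for salary, points in history:
        totals[salary // bucket][0] += points
        totals[salary // bucket][1] += 1
    return {b: pts / n for b, (pts, n) in totals.items()}

def plus_minus(salary, actual_points, expectations, bucket=500):
    """Points scored above or below the historical expectation at that salary."""
    return actual_points - expectations[salary // bucket]

# Toy history: (salary, DK points) pairs from past events.
history = [(9000, 70), (9100, 80), (8900, 75), (7000, 55), (7200, 60)]
exp = salary_expectations(history)
print(plus_minus(9000, 82, exp))  # 7.0: beat his salary-based expectation by 7
```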


Quick Notes on Data Uncertainty

There are multiple ways to build a model this week. One is simply to optimize it on past U.S. Open data, but that’s not perfect since the event rotates courses and some of the recent years have had odd factors, such as weather and extra-heavy rough. Some of the recent courses have also been better for bombers, which Pebble Beach may not be.

You could optimize around Pebble Beach historical data, but that’s not perfect either. There’s a bunch of data for the course, but it’s mostly from the Pebble Beach Pro-Am, which is obviously not a major. Further, it’s a unique event in that it rotates between three courses: Pebble, Spyglass and Monterey. There was a U.S. Open at Pebble, but that was all the way back in 2010. Perhaps that data is useful, but I’m always hesitant to rely on data from a decade ago.

Third, you could build a model that optimizes around what a U.S. Open is supposed to set up for: the best all-around golfers. I would lean more on ball-striking, given that Pebble is one of the shorter U.S. Open courses you’ll see, but all-around game will certainly be tested. The rough will be challenging, as it usually is at the U.S. Open, so golfers will need to be precise, but they will also need to scramble and drain putts. And while some of the best drivers are also among the top all-around players in the world — think Rory McIlroy, Brooks Koepka and even Tommy Fleetwood — I would downgrade golfers who gain strokes almost solely with the driver.

Finally, you could blend all of that information together into a singular, cohesive model. I think that’s the best bet, but let’s walk through some of the past U.S. Open data anyway, and you can use that as a starting point.

Building an Optimal Model for the 2019 U.S. Open

Justin Bailey used our Trends tool to backtest previous U.S. Opens and see which metrics entering the tournament were the most predictive of success. He looked at the top 20% of golfers for each metric (a sketch of that backtest idea follows the list), and here’s how a corresponding model in Labs would look based on his research (total model points = 100).

  • Recent Par-5 Scoring: 19 points
  • Recent Scrambling: 14
  • Long-Term Adjusted Round Score: 8
  • Long-Term Scrambling: 8
  • Recent Putts Per Round: 8
  • Long-Term Tournament Count: 8
  • Recent Adjusted Round Score: 7
  • Long-Term Par-4 Scoring: 6
  • Long-Term Birdies: 5
  • Long-Term Bogeys: 5
  • Long-Term Par-3 Scoring: 5
  • Long-Term Driving Distance: 4
  • Long-Term Eagles: 3
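
For context, here’s a rough sketch of that kind of backtest: for each metric, take the top 20% of the field by that metric entering the event and compute their average Plus/Minus. The data structure and metric names are hypothetical stand-ins, not the actual Labs dataset.

```python
def metric_plus_minus(field, metric, top_pct=0.20):
    """Average realized Plus/Minus of the top X% of golfers by a pre-event metric.

    Assumes higher metric values are better; flip `reverse` for metrics where
    lower is better (e.g., scoring averages).
    """
    ranked = sorted(field, key=lambda g: g[metric], reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_pct))]
    return sum(g["plus_minus"] for g in top) / len(top)

# Hypothetical field: pre-tournament metrics plus the Plus/Minus each golfer
# went on to post at the event.
field = [
    {"recent_scrambling": 62.0, "recent_birdies": 4.1, "plus_minus": 8.1},
    {"recent_scrambling": 55.0, "recent_birdies": 3.2, "plus_minus": -3.2},
    {"recent_scrambling": 58.5, "recent_birdies": 4.5, "plus_minus": 1.7},
    {"recent_scrambling": 60.1, "recent_birdies": 3.8, "plus_minus": 0.4},
    {"recent_scrambling": 52.3, "recent_birdies": 2.9, "plus_minus": -6.0},
]
for metric in ("recent_scrambling", "recent_birdies"):
    print(metric, metric_plus_minus(field, metric))
```

Metrics whose top cohort consistently beats salary-based expectations are the ones that earn heavier weights in a model like the one above.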

Meanwhile, to highlight upside, I have backtested based on the top 10% of golfers, and here’s how an optimized model would look based on the most important factors (a sketch of turning weights like these into golfer scores follows the list).

  • Long-Term Field Score: 17
  • Recent Par-5 Scoring: 15
  • Long-Term Putts Per Round: 9
  • Long-Term Eagles: 8
  • Long-Term Driving Distance: 8
  • Recent Driving Distance: 8
  • Recent Birdies: 7
  • Long-Term Scrambling: 6
  • Long-Term Tournament Count: 5
  • Adjusted Round Differential: 5
  • Recent Par-3 Scoring: 4
  • Recent Adjusted Round Score: 4
  • Recent Scrambling: 4
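
Once you settle on weights, turning them into a golfer’s model score is essentially a weighted sum of his percentile rank in each metric. Here’s a minimal sketch under that assumption; it’s a generic weighted-percentile approach, not necessarily how Labs computes its ratings.

```python
def model_scores(field, weights):
    """Score each golfer: weighted sum of his percentile rank in every metric.

    `field` maps golfer -> {metric: value}; `weights` maps metric -> points
    (summing to 100, per the models above). Assumes higher values are better;
    invert any metric where lower is better before scoring.
    """
    golfers = list(field)
    scores = {g: 0.0 for g in golfers}
    for metric, weight in weights.items():
        ranked = sorted(golfers, key=lambda g: field[g][metric])
        n = len(ranked)
        for i, g in enumerate(ranked):
            pct = i / (n - 1) if n > 1 else 1.0
            scores[g] += weight * pct  # best in field earns the full weight
    return scores

# Toy two-metric slice of the top-20% model above.
weights = {"Recent Par-5 Scoring": 19, "Recent Scrambling": 14}
field = {
    "Golfer A": {"Recent Par-5 Scoring": 0.9, "Recent Scrambling": 0.6},
    "Golfer B": {"Recent Par-5 Scoring": 0.4, "Recent Scrambling": 0.8},
    "Golfer C": {"Recent Par-5 Scoring": 0.7, "Recent Scrambling": 0.7},
}
print(model_scores(field, weights))
```

Blending the two weight sets above, per the earlier suggestion, then amounts to averaging them metric by metric before scoring.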

There are some similarities but some notable differences, too, likely because some of the bigger names in the field have struggled at the U.S. Open over the past couple of years (except for Koepka, who is ridiculous): McIlroy, Sergio Garcia, Jon Rahm, Jordan Spieth, Adam Scott, Jason Day and Tiger Woods all badly missed the cut last year. Shinnecock had brutally penalizing rough and wind gusts of more than 20 miles per hour. That randomness made last year’s results hard to predict, and it makes the historical data less useful moving forward.

For this tournament, Peter Jennings (CSURAM88) has broken down his personal model, which he has kept relatively simple, relying on heavy-hitting metrics like Adjusted Round Score and Greens in Regulation. He also emphasizes Vegas data, which hasn’t been predictive in recent years, but (again) that’s probably because the course and weather last year amplified the overall randomness and some big-name guys missed the cut.

Takeaways

I always want to use all the data available, but this week I would lean toward a simple model like Peter’s. I think the past U.S. Open data might be skewed, and building solely around Pebble Beach data seems flawed since the course is being set up tougher for the major. I would use that data on top of the model as a tiebreaker or to identify golfers with a consistently terrible history at the course.

Photo credit: Eric Bolte-USA TODAY Sports
Pictured: Rory McIlroy
