Introducing Heat+, a Brand New Fastball Quality Model

A few weeks ago, Camden Quick, who I will be interning with on the Cape this summer, developed his own Stuff+ model to use in college ball. Taking after him, I’ve been working on my own pitch model. This is the first iteration of the model, and I plan on creating a better viewing application to go alongside it. With that being said, here is the preliminary version of Heat+.
Heat+ is a pitch model that, for now, covers only fastballs. I built it in R using XGBoost’s gradient boosting. I took fastball data from 2023, 2024, and 2025, giving the model three seasons to work with. The model is trained to predict the run value of a given pitch. To define run value, I used Statcast’s built-in run value metric, “delta_run_exp”, which is the change in run expectancy from before the pitch to after the pitch. For a more detailed explanation of how pitch models such as this are made, read this.
In this first version, I put an emphasis on extension and arm angle, trying to capture more deception than Stuff+ captures. The metrics I included were release speed, induced vertical break (IVB), release position, spin rate, spin axis, arm angle, and horizontal break. I also created a few features of my own, most notably ext_x_ivb and arm_x_ext. These encourage XGBoost to treat the combination of extension and IVB (or arm angle and extension) as greater than the sum of its parts. I put a priority on extension because it is the pitching characteristic that is hardest to teach, as it comes primarily from pitcher height. More extension lets a pitcher's fastball play up, since the ball is released closer to home plate. Arm angle is also important, as it is a primary way for pitchers to add deception. The features are likely what I will change most about the model as I experiment with what correlates best with performance.
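As a rough illustration of what interaction features like ext_x_ivb and arm_x_ext could look like, here is a minimal Python sketch (the model itself was built in R). The input key names and the choice of a plain product are assumptions made for the example; the post does not spell out how these features are computed.

```python
# Illustrative sketch only: the input key names and the use of a simple
# product are assumptions, not the post's confirmed feature definitions.

def add_interaction_features(pitch):
    """Return a copy of one pitch's features with two interaction terms added."""
    features = dict(pitch)  # copy so the original pitch is left untouched
    # Assumed: extension (ft) multiplied by induced vertical break (in).
    features["ext_x_ivb"] = pitch["extension"] * pitch["ivb"]
    # Assumed: arm angle (deg) multiplied by extension (ft).
    features["arm_x_ext"] = pitch["arm_angle"] * pitch["extension"]
    return features


# Made-up example pitch, not real Statcast data.
example_pitch = {
    "release_speed": 96.0,
    "extension": 7.0,
    "ivb": 18.0,
    "arm_angle": 40.0,
}
with_interactions = add_interaction_features(example_pitch)
print(with_interactions["ext_x_ivb"], with_interactions["arm_x_ext"])
```

Tree-based models can find interactions on their own, so an explicit product like this mostly serves to hand the model a combination you already believe matters.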
For predictive stats, it's important for the model to predict itself year over year. I took inspiration from Adam Salorio's aStuff+ writeup, and included a "stickiness" graph. The goal of this graph is to see how much variance in year 2 Heat+ can be explained by year 1 Heat+.
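A stickiness check of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions: the data layout (a dict of pitcher to Heat+ and pitch count), the function names, and the numbers are all made up for the example and are not the post's actual code or results. For a single predictor, the squared Pearson correlation equals the R-squared of a simple linear fit of year 2 on year 1.

```python
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences has no variance")
    return (sxy * sxy) / (sxx * syy)


def stickiness(year1, year2, min_pitches):
    """R-squared between year 1 and year 2 values for qualifying pitchers.

    year1 / year2 map pitcher name -> (heat_plus, pitch_count). Only pitchers
    who meet the pitch minimum in both seasons are compared.
    """
    shared = [
        name for name in year1
        if name in year2
        and year1[name][1] >= min_pitches
        and year2[name][1] >= min_pitches
    ]
    xs = [year1[name][0] for name in shared]
    ys = [year2[name][0] for name in shared]
    return r_squared(xs, ys)


# Hypothetical pitchers and values, purely for illustration.
season_one = {"A": (110, 500), "B": (100, 400), "C": (90, 600), "D": (120, 50)}
season_two = {"A": (108, 450), "B": (101, 420), "C": (93, 610), "D": (80, 40)}
print(stickiness(season_one, season_two, min_pitches=100))
```

Raising the pitch minimum drops small-sample pitchers like the hypothetical "D", which is why the result is usually reported at more than one threshold.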
Heat+ proves very sticky from year to year, with an R-squared value above .70 at both minimum-pitch thresholds. This makes sense, as Heat+ is based mostly on physical pitcher characteristics that don't change much from year to year. It's also important to look at where Heat+ stabilizes. To do this, I used split-half reliability, which splits a pitcher's pitches into two random halves, calculates Heat+ for each half, and looks at how well the two correlate at different sample sizes. The stabilization point is where the correlation hits .70.
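The split-half procedure can be sketched roughly as follows in Python. Several things here are assumptions for the sake of the example: that each pitch has its own model value, that a half's Heat+ is the average of those values, that "sample size" means total pitches drawn per pitcher, and all function names. The demo data is simulated, not real.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_half_correlation(pitch_values, sample_size, rng):
    """Correlate random half-sample averages across pitchers.

    pitch_values maps pitcher -> list of per-pitch model values (assumed
    layout). sample_size must be at least 2. Returns None when fewer than
    three pitchers have enough pitches.
    """
    first_half, second_half = [], []
    for values in pitch_values.values():
        if len(values) < sample_size:
            continue
        sample = rng.sample(values, sample_size)  # random order, no repeats
        mid = sample_size // 2
        first_half.append(mean(sample[:mid]))
        second_half.append(mean(sample[mid:]))
    if len(first_half) < 3:
        return None
    return pearson_r(first_half, second_half)


def stabilization_point(pitch_values, sample_sizes, threshold=0.70, seed=0):
    """Smallest tested sample size whose split-half correlation meets the threshold."""
    rng = random.Random(seed)
    for n in sorted(sample_sizes):
        r = split_half_correlation(pitch_values, n, rng)
        if r is not None and r >= threshold:
            return n
    return None


# Simulated pitchers: a "true" level plus per-pitch noise (illustration only).
demo_rng = random.Random(42)
simulated = {}
for i in range(30):
    talent = demo_rng.gauss(100, 10)
    simulated[f"pitcher_{i}"] = [talent + demo_rng.gauss(0, 30) for _ in range(400)]
print(stabilization_point(simulated, [20, 50, 100, 200, 400]))
```

A single random split is noisy, so in practice it is common to repeat the split many times at each sample size and average the correlations.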
As can be seen, Heat+ stabilizes incredibly early. This makes sense, as the first version of this model is fairly basic.
My model is much worse than FanGraphs' Stuff+ at describing and predicting xwOBA, but is almost equal when it comes to whiff rates. A .35 correlation with next year's whiff rate is meaningful and indicates that a higher Heat+ tends to come with more whiffs, while the negative correlation with xwOBA means a higher Heat+ tends to come with a lower xwOBA. There is a sizable gap between Stuff+ and Heat+ on xwOBA, but FanGraphs has had years to build and perfect its model. By contrast, I built mine in a couple of weeks.
For a first version of this model, I was pleasantly surprised by its descriptive and predictive capabilities. Models such as this are meant to be great predictive tools, and that is ultimately the goal. By putting an emphasis on extension, I want this model to be a good predictor for minor league and college pitchers. These are pitchers who have yet to truly refine their arsenals but who feature physical qualities that cannot be taught. I evaluated how both my model and FanGraphs' Stuff+ correlate with whiff% and xwOBA.
Heat+ does best when looking at whiff rates, as it is focused on a pitcher's pure stuff.
As expected, Heat+ has a positive correlation with whiff rates: as Heat+ goes up, more whiffs are generated. An R-squared value of .22 indicates that 22% of the variance in whiff rates can be explained by Heat+. This number may seem low, but keep in mind that Heat+ only looks at the pure quality of a pitcher's fastball. Factors such as location, count, sequencing, and batter quality are not included. My target R-squared value for the final model is .30-.35.
With all that out of the way, we can turn to look at leaderboards.
Two relievers top the Heat+ leaderboard: Marlins reliever Ronny Henriquez and Giants reliever Randy Rodriguez. Below them, my model loves what Jacob deGrom was able to do in his age-37 season, and he's followed by two young starters in Jacob Misiorowski and Eury Perez. Looking at the bottom 10, we can see where Heat+, and Stuff+ models in general, are limited. Ranger Suarez, despite his bottom-10 Heat+ ranking, posted a 3.20 ERA in 26 starts last year. Because Heat+ only looks at fastballs, the value of a pitcher's secondaries is not included. This hurts pitchers like Suarez, whose secondaries are what make them great. Heat+ also does not look at other pitching factors such as command and location.
All that being said, I am very proud of the first version of this model. I plan to continuously tweak and refine the model over the next few months, in between college courses and work. I hope to post many updates as I work on this model.





