Whiff+ Updates, Swing Modeling, and Other Baseball-Related Notes

Rejoice! Baseball is back! The 2026 MLB season kicked off last Wednesday, and new data is already being plugged into Whiff+ and the other models, which I will get into later. Anyways, here's a quick update on Whiff+ and some other projects I've been working on.
Whiff+ Expansion
Whiff+ has been expanded to include every major pitch type, using the same methods I went over in the original Heat+ article. All of the non-fastball models are still very much in their earliest stage, and I haven't taken much time to go in and refine them. That being said, they are correlating well with whiff rates.
Sorry for the blurriness of that image; I'm looking for a new site to host this blog. Anyways, high Whiff+ numbers correlate well with whiff rates across the board. Splitters have the best correlation, with an R^2 of .742, and sliders have the lowest at .56. Every pitch type falls between .56 and .75.
I have also been working on a command-based model, which looks more at how effectively the pitcher is executing the pitch. For example, a fastball down in the zone is less effective than a fastball up in the zone, even if the two pitches have the exact same characteristics. The COMMAND model would reward the pitcher who can consistently locate the fastball at the top of the zone.
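For anyone who wants to run this kind of check themselves, here is a minimal sketch of how an R^2 like the ones above could be computed. It assumes R^2 means the squared Pearson correlation between a pitcher's Whiff+ and his whiff rate, which may not be exactly how my numbers were produced. The function name `r_squared` and every number in the sample data are made up for illustration.

```python
from math import sqrt


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    r = cov / sqrt(var_x * var_y)
    return r * r


# Hypothetical per-pitcher values, NOT real Whiff+ output.
whiff_plus = [85, 92, 100, 108, 115, 121]
whiff_rate = [0.18, 0.22, 0.24, 0.27, 0.33, 0.31]

print(round(r_squared(whiff_plus, whiff_rate), 3))
```

Running the same function once per pitch type (splitters, sliders, and so on) would give one R^2 per model, which is the kind of comparison described above.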
For predictive measures, I compared both Whiff+ and COMMAND to FanGraphs Stuff+.
Whiff+ and COMMAND both predict whiff rates far better than Stuff+. This makes sense, as my models have been trained to predict whiffs, whereas the FanGraphs model is trained on run values and outcomes. That's also why Stuff+ is considerably better than Whiff+ at predicting xwOBA across all pitch types except fastballs and sweepers.
Seeing that Whiff+ is better than Stuff+ at predicting xwOBA on fastballs is a very encouraging sign, as the fastball model is far and away the one I've spent the most time on.
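A comparison like this can be sketched as a small grid: one R^2 for every metric and target pair. The code below is only an illustration of that idea, assuming squared Pearson correlation as the scoring rule. The helper names `compare_metrics` and `best_metric` are hypothetical, and all of the sample values are invented rather than pulled from Whiff+, COMMAND, or Stuff+.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def compare_metrics(metrics, targets):
    """Return {target: {metric: R^2}} for every metric/target pair."""
    return {
        t_name: {m_name: r_squared(m_vals, t_vals)
                 for m_name, m_vals in metrics.items()}
        for t_name, t_vals in targets.items()
    }


def best_metric(scores):
    """Name of the metric with the highest R^2 for one target."""
    return max(scores, key=scores.get)


# Hypothetical per-pitcher values for a single pitch type (made up).
metrics = {
    "Whiff+": [88, 95, 101, 107, 112, 120],
    "COMMAND": [97, 91, 104, 99, 110, 115],
    "Stuff+": [92, 103, 96, 111, 105, 118],
}
targets = {
    "whiff_rate": [0.19, 0.22, 0.25, 0.26, 0.30, 0.33],
    "xwOBA": [0.345, 0.310, 0.335, 0.290, 0.305, 0.270],
}

table = compare_metrics(metrics, targets)
for target, scores in table.items():
    row = ", ".join(f"{name}: {value:.3f}" for name, value in scores.items())
    print(f"{target} -> {row} (best: {best_metric(scores)})")
```

One thing to keep in mind is that a lower xwOBA is better for the pitcher, so a good metric correlates negatively with it. Squaring the correlation hides that sign, which is fine for ranking metrics by strength but worth checking separately.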
Swing Modeling
Unlike pitch modeling, swing modeling is a very new idea in the sabermetrics world. Statcast has been tracking swing data for a few years now, and no one has made a very good swing model yet. I decided to take a crack at it, creating a power-based model and a contact-based model. There's no public app to search through the database yet, but I am working on one. If there were, you would see that some of the top 10s and bottom 10s look very strange, which shows how little we (I) know about swing modeling.
As we can see, correlations aren't that great. POWER and CONTACT are solid at predicting what they are trained to predict (hard hits and contact, respectively), but they aren't awesome. Driveline has the best swing model I've seen, and if Eno Sarris uses it, it must be solid. But even that one isn't great, as swing characteristics have less impact on pitch outcome than pitch characteristics do.
I do plan on playing around with this a little bit, but I believe the stats themselves (tilt and bat speed most notably) are better predictive and descriptive stats than any swing model out there now.
Other Baseball Notes
Baseball is back! This season, I want to continue to be more of a baseball fan, and not just a Red Sox fan. I want to try and release power rankings every few weeks, but I am busy with my actual job, school, and this internship in the summer. Additionally, I'll try to use these models to highlight some cool players or anything else I find interesting. Like I said, I hope to move this blog somewhere better at some point, but I haven't really looked too far into that yet.
