I am working to find the average bat speed of each player in Major League Baseball.
I have two of the three necessary components:
- pitch velocity, or how hard the pitcher threw the ball, and
- exit velocity, or how hard the ball came off the hitter's bat.

The third component is EA, the collision efficiency, which measures how well the ball was placed on the hitter's bat. An EA of 0.2 indicates perfect contact, and a value of -0.1 is the weakest possible contact.
This number can be estimated using linear regression. However, the regression needs to vary by hitter. For example, if a stronger hitter and a weaker hitter both hit a ball 87 MPH, they would not necessarily have the same bat speed. For the stronger hitter, 87 MPH is below their usual exit velocity, meaning they did not make perfect contact. For the weaker hitter, 87 MPH is above their usual exit velocity, meaning their contact was close to perfect.
The linear model is fitted with two observations per hitter: one with the average exit velocity of that player's fifteen hardest hits, with the EA set to 0.2, and one with the average exit velocity of their fifteen weakest hits, with the EA set to -0.1.
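
Because each hitter's model is fitted to only two points, the regression line passes through both exactly, so it can be written in closed form. The symbols below are my own notation, not the article's: $\overline{EV}_{hi}$ and $\overline{EV}_{lo}$ are one player's average exit velocity over their fifteen hardest and fifteen weakest hits.

```latex
EA(EV) = -0.1 + 0.3 \cdot \frac{EV - \overline{EV}_{lo}}{\overline{EV}_{hi} - \overline{EV}_{lo}}
```

This gives $EA = -0.1$ at $\overline{EV}_{lo}$ and $EA = 0.2$ at $\overline{EV}_{hi}$. Any single hit harder than $\overline{EV}_{hi}$ or weaker than $\overline{EV}_{lo}$ will land slightly outside the $[-0.1, 0.2]$ range.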
I need a function that will fit a different linear model to each player, and then use that model to estimate the EA for all hits that player had throughout the year.
Note: This information comes from this article (https://community.fangraphs.com/reverse-engineering-swing-mechanics-from-statcast-data/) and further clarification can be found there.
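
To make the request concrete, here is a minimal sketch of the per-player fit in Python with pandas and NumPy. It assumes one row per batted ball. The column names `player` and `exit_velocity`, the function name `estimate_ea`, and the output column `ea_est` are all hypothetical placeholders, not names from the article or from Statcast.

```python
import numpy as np
import pandas as pd

EA_MAX = 0.2   # perfect contact, as defined in the question
EA_MIN = -0.1  # weakest possible contact, as defined in the question


def estimate_ea(hits, player_col="player", ev_col="exit_velocity", n=15):
    """Return a copy of `hits` with an 'ea_est' column.

    For each player, a line is fitted through two points:
    (mean EV of the n weakest hits, EA_MIN) and
    (mean EV of the n hardest hits, EA_MAX).
    That line is then used to estimate EA for every hit by that player.
    """

    def per_player(ev):
        ordered = ev.sort_values()
        low = ordered.head(n).mean()   # mean EV of the n weakest hits
        high = ordered.tail(n).mean()  # mean EV of the n hardest hits
        if np.isclose(high, low):
            # No spread in exit velocity, so no line can be defined.
            return pd.Series(np.nan, index=ev.index)
        # Degree-1 fit through the two anchor points.
        slope, intercept = np.polyfit([low, high], [EA_MIN, EA_MAX], deg=1)
        return slope * ev + intercept

    out = hits.copy()
    out["ea_est"] = out.groupby(player_col)[ev_col].transform(per_player)
    return out
```

Calling `estimate_ea(df)` on a season of batted balls would add one estimated EA per hit. Players with fewer than 2 × `n` hits would have overlapping "hardest" and "weakest" groups, so their estimates should be treated with caution or filtered out beforehand.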