
I am trying to estimate the average bat speed of each player in Major League Baseball.
Three components are needed, and I have two of them:

  • pitch velocity, or how hard the pitcher threw the ball,
  • exit velocity, or how hard the ball came off the hitter's bat.

The missing variable is EA, the collision efficiency, which measures how squarely the ball met the hitter's bat. An EA of .2 indicates perfect contact and a value of -.1 is the weakest possible contact.

EA can be estimated using linear regression, but the regression needs to vary by hitter. If a stronger hitter and a weaker hitter both hit a ball at 87 MPH, they would not necessarily have the same bat speed. For the stronger hitter, 87 MPH is below their average exit velocity, meaning they did not make perfect contact. For the weaker hitter, 87 MPH is above their average exit velocity, meaning they made close to perfect contact.
Each hitter's linear model is fitted with two observations: one pairing the average exit velocity of that player's fifteen hardest hits with an EA of .2, and one pairing the average exit velocity of their fifteen weakest hits with an EA of -.1.
I need a function that will fit a different linear model to each player, and then use that model to estimate the EA for all hits that player had throughout the year.
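
The per-hitter workflow described above could be sketched roughly as follows. This is a minimal Python sketch, not the article's implementation: the function names `fit_hitter_line` and `estimate_ea`, the `(hitter, exit_velocity)` input shape, and the fallback for hitters with fewer than thirty hits are all assumptions made for illustration. Because each model has only two observations, the fitted line is simply the line through those two points, so no regression library is needed.

```python
from collections import defaultdict
from statistics import mean

EA_BEST = 0.2    # EA assigned to a hitter's hardest contact
EA_WORST = -0.1  # EA assigned to a hitter's weakest contact


def fit_hitter_line(exit_velocities, n=15):
    """Return (slope, intercept) of the line mapping exit velocity to EA.

    The line passes through (mean of the n weakest hits, EA_WORST) and
    (mean of the n hardest hits, EA_BEST).
    """
    ordered = sorted(exit_velocities)
    # Assumption: with fewer than 2 * n hits, use half the hits for each
    # end so the two groups never overlap.
    k = min(n, len(ordered) // 2)
    if k == 0:
        raise ValueError("need at least two hits to fit a line")
    weakest = mean(ordered[:k])
    hardest = mean(ordered[-k:])
    if hardest == weakest:
        raise ValueError("hardest and weakest averages are identical")
    slope = (EA_BEST - EA_WORST) / (hardest - weakest)
    intercept = EA_WORST - slope * weakest
    return slope, intercept


def estimate_ea(hits, n=15):
    """Fit one line per hitter, then estimate EA for every hit.

    hits: iterable of (hitter, exit_velocity) pairs (hypothetical shape).
    Returns a list of (hitter, exit_velocity, estimated_ea) in input order.
    """
    hits = list(hits)
    by_hitter = defaultdict(list)
    for hitter, ev in hits:
        by_hitter[hitter].append(ev)
    models = {h: fit_hitter_line(evs, n) for h, evs in by_hitter.items()}
    return [(h, ev, models[h][0] * ev + models[h][1]) for h, ev in hits]
```

Calling `estimate_ea` on a full season of `(hitter, exit_velocity)` pairs gives each hit an EA on that hitter's own scale, so the same 87 MPH ball can receive a different EA for a stronger hitter than for a weaker one. Estimates are not clipped to the range -.1 to .2, so individual hits harder or weaker than the fifteen-hit averages fall outside it.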

Note: This information comes from this article (https://community.fangraphs.com/reverse-engineering-swing-mechanics-from-statcast-data/) and further clarification can be found there.

  • Regarding the fitting separate models per hitter, read through this link as I think it illustrates what you're after: https://r4ds.had.co.nz/many-models.html Also, definitely have a read here about asking good questions: https://stackoverflow.com/help/how-to-ask – On_an_island Mar 10 '22 at 00:36
  • Thanks for the help! Sorry if the question is confusing, I'm very new to this. – Drew_Haugen Mar 10 '22 at 00:38
  • Not a problem but definitely read through the second link about asking a good question because it will go a long way towards getting the answers you're after. But I believe the first link I provided you is a good example of how to code up the workflow you're after. – On_an_island Mar 10 '22 at 00:40
  • 1
    See also: `lme4::lmList` (or using *mixed models*, i.e. `lme4::lmer` - this refinement isn't important if you have a lot of data for every hitter) – Ben Bolker Mar 10 '22 at 00:57
  • The question I marked as a duplicate is a nice example of another version of this question - it provides some sample data and shows what's been tried so far. I think the answers there should get you what you need, but if not please ask another question with more detail! (And do take special note of Ben's comment - if you have some hitters with only a few data points using a mixed model would be an excellent modification). – Gregor Thomas Mar 10 '22 at 02:30
