I have a data frame with a row of data like this:
play_by_play = pd.DataFrame([{
"players": "Tom Brady; Mike Evans; Tristan Wirfs; Leonard Fournette; Chris Godwin",
"down": 1,
"to_go": 10,
"play_type": 'pass',
"yards_gained": 8,
"pass_attempt": 1,
"complete_pass": 1,
"rush_attempt": 0
}])
I want to keep track of stats while the given players are on the field by using group by and aggregate functions. If I were looking to do this player-by-player, I would use play_by_play["players"].str.contains("Tom Brady")
and aggregating the data using that filter, but I am looking to automate this. The solution I've landed on is to duplicate rows and have the "players" value split for each row. It would look something like this:
player | down | to_go | play_type | yards_gained | pass_attempt | complete_pass | rush_attempt |
---|---|---|---|---|---|---|---|
"Tom Brady" | 1 | 10 | pass | 8 | 1 | 1 | 0 |
"Mike Evans" | 1 | 10 | pass | 8 | 1 | 1 | 0 |
"Tristan Wirfs" | 1 | 10 | pass | 8 | 1 | 1 | 0 |
"Leonard Fournette" | 1 | 10 | pass | 8 | 1 | 1 | 0 |
"Chris Godwin" | 1 | 10 | pass | 8 | 1 | 1 | 0 |
How could I accomplish this? As I mentioned before, this needs to be pretty scalable, as there are thousands of rows of data. If there's an easier way to group by based on unique values contained in a semicolon-separated column, I'm more than happy to go that route.