0

I have a data frame with a row of data like this:

play_by_play = pd.DataFrame([{
    "players": "Tom Brady; Mike Evans; Tristan Wirfs; Leonard Fournette; Chris Godwin", 
               "down": 1, 
               "to_go": 10,
                "play_type": 'pass',
               "yards_gained": 8,
               "pass_attempt": 1,
               "complete_pass": 1,
               "rush_attempt": 0
              }])

I want to keep track of stats while the given players are on the field by using group by and aggregate functions. If I were looking to do this player-by-player, I would use play_by_play["players"].str.contains("Tom Brady") and aggregating the data using that filter, but I am looking to automate this. The solution I've landed on is to duplicate rows and have the "players" value split for each row. It would look something like this:

player down to_go play_type yards_gained pass_attempt complete_pass rush_attempt
"Tom Brady" 1 10 pass 8 1 1 0
"Mike Evans" 1 10 pass 8 1 1 0
"Tristan Wirfs" 1 10 pass 8 1 1 0
"Leonard Fournette" 1 10 pass 8 1 1 0
"Chris Godwin" 1 10 pass 8 1 1 0

How could I accomplish this? As I mentioned before, this needs to be pretty scalable, as there are thousands of rows of data. If there's an easier way to group by based on unique values contained in a semicolon-separated column, I'm more than happy to go that route.

Vivek Lele
  • 109
  • 4
jwald3
  • 19
  • 1
  • 6
  • Have you seen [Split (explode) pandas dataframe string entry to separate rows](https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows)? (Don't just look at the accepted answer.) – Steven Rumbalski Oct 03 '22 at 02:19

1 Answers1

1

You can use pandas.Series.str.split to make a list of the players then pandas.DataFrame.explode to make multiple rows :

play_by_play['players'] = play_by_play['players'].str.split(';')
play_by_play = play_by_play.explode('players').reset_index(drop=True)

# Output :

print(play_by_play)

              players  down  to_go play_type  yards_gained  pass_attempt  complete_pass  rush_attempt
0           Tom Brady     1     10      pass             8             1              1             0
1          Mike Evans     1     10      pass             8             1              1             0
2       Tristan Wirfs     1     10      pass             8             1              1             0
3   Leonard Fournette     1     10      pass             8             1              1             0
4        Chris Godwin     1     10      pass             8             1              1             0
Timeless
  • 22,580
  • 4
  • 12
  • 30