Split Column of Semicolon-Separated Values and Duplicate Row with each Value in Pandas

Question

I have a data frame with a row of data like this:

play_by_play = pd.DataFrame([{
    "players": "Tom Brady; Mike Evans; Tristan Wirfs; Leonard Fournette; Chris Godwin", 
               "down": 1, 
               "to_go": 10,
                "play_type": 'pass',
               "yards_gained": 8,
               "pass_attempt": 1,
               "complete_pass": 1,
               "rush_attempt": 0
              }])

I want to keep track of stats while the given players are on the field by using group by and aggregate functions. If I were looking to do this player-by-player, I would use play_by_play["players"].str.contains("Tom Brady") and aggregating the data using that filter, but I am looking to automate this. The solution I've landed on is to duplicate rows and have the "players" value split for each row. It would look something like this:

player	down	to_go	play_type	yards_gained	pass_attempt	complete_pass
"Tom Brady"	1	10	pass	8	1	1
"Mike Evans"	1	10	pass	8	1	1
"Tristan Wirfs"	1	10	pass	8	1	1
"Leonard Fournette"	1	10	pass	8	1	1
"Chris Godwin"	1	10	pass	8	1	1

How could I accomplish this? As I mentioned before, this needs to be pretty scalable, as there are thousands of rows of data. If there's an easier way to group by based on unique values contained in a semicolon-separated column, I'm more than happy to go that route.

Have you seen [Split (explode) pandas dataframe string entry to separate rows](https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows)? (Don't just look at the accepted answer.) — Steven Rumbalski, Oct 03 '22 at 02:19

score 1 · Accepted Answer · answered Oct 03 '22 at 02:22

You can use pandas.Series.str.split to make a list of the players then pandas.DataFrame.explode to make multiple rows :

play_by_play['players'] = play_by_play['players'].str.split(';')
play_by_play = play_by_play.explode('players').reset_index(drop=True)

# Output :

print(play_by_play)

              players  down  to_go play_type  yards_gained  pass_attempt  complete_pass  rush_attempt
0           Tom Brady     1     10      pass             8             1              1             0
1          Mike Evans     1     10      pass             8             1              1             0
2       Tristan Wirfs     1     10      pass             8             1              1             0
3   Leonard Fournette     1     10      pass             8             1              1             0
4        Chris Godwin     1     10      pass             8             1              1             0

Split Column of Semicolon-Separated Values and Duplicate Row with each Value in Pandas

1 Answers1

# Output :