Split multiple columns into new rows

Question

I have a problem with splitting two columns into new rows.

My dataframe looks like this

"involved" and "team_player_formation" each have 15 strings, that needs to be assigned into each their row. With the first string of "involved" matching first string of "team_player_formation"

I have tried to follow this: Split (explode) pandas dataframe string entry to separate rows and searching for ways to split multiple columns, but without success

I have been able to split one of my columns with the following at the moment

df = pd.read_csv('Hello.csv',  delimiter=';')
df = df.assign(involved=df['involved'].str.split(',')).explode('involved')

Which gives me something like this:

     matchId                                contestantId periodId typeId    involved    team_formation  team_player_formation
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    ..................

    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0

But that only split "involed" into a new column.

The output should be something like this, where I have only showed the first 3 rows.

Thank you! I hope you can help, and that I explained it well enough.

Seconding Scott's request, also can you note which Pandas version you're using? And are the repeated column headers in your desired output necessary, or just there for illustration? Thanks! — Josh Friedlander, Feb 16 '20 at 12:29
Hi guys. Yes, all the repeated columns are needed. I am using python 3.7 with the newest version of Pandas. Just updated it. The pictures you see, is where I just quickly edited it as text in excel for illustration purpose. But this is just one of many examples, where I need to do the same editing. Thank you! — Kasper, Feb 16 '20 at 12:32

score 2 · Answer 1 · answered Feb 16 '20 at 12:39

2

If I undstand your problem correctly, you can use this MCVE to help with your analysis.

df = pd.DataFrame({'ID':[1],'string1':['A,B,C'], 'string2':['X,Y,Z']}, index=[0])

df_joined = df.join([df['string1'].str.split(',', expand=True).add_prefix('s1_'), 
                     df['string2'].str.split(',', expand=True).add_prefix('s2_')])

pd.wide_to_long(df_joined, ['s1','s2'], 'ID', 'No', sep='_').reset_index()

Output:

   ID  No string1 string2 s1 s2
0   1   0   A,B,C   X,Y,Z  A  X
1   1   1   A,B,C   X,Y,Z  B  Y
2   1   2   A,B,C   X,Y,Z  C  Z

answered Feb 16 '20 at 12:39

Scott Boston

147,308
15
139
187

As comment to you Scott Boston. The error I am getting for the code which I posted, is the following: ----> 4 pd.wide_to_long(df_joined, ['s1','s2'], sep='_').reset_index() TypeError: wide_to_long() missing 2 required positional arguments: 'i' and 'j' – Kasper Feb 16 '20 at 21:03
What version of pandas? Upgrade if possible. – Scott Boston Feb 16 '20 at 21:10

Split multiple columns into new rows

1 Answers1