0

I have a problem with splitting two columns into new rows.

My dataframe looks like this

enter image description here

"involved" and "team_player_formation" each have 15 strings, that needs to be assigned into each their row. With the first string of "involved" matching first string of "team_player_formation"

I have tried to follow this: Split (explode) pandas dataframe string entry to separate rows and searching for ways to split multiple columns, but without success

I have been able to split one of my columns with the following at the moment

df = pd.read_csv('Hello.csv',  delimiter=';')
df = df.assign(involved=df['involved'].str.split(',')).explode('involved')

Which gives me something like this:

     matchId                                contestantId periodId typeId    involved    team_formation  team_player_formation
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  b2492j7qzdo7g3ysxz6gq4g5x   4   1
    ..................

    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0
    0   d5321qxmnyf9004i049uf4pre   77tfx9me4aaqhzv78bmgsy9bg   2   40  am3509ake84cde1xhb9264i22   4   0

But that only split "involed" into a new column.

The output should be something like this, where I have only showed the first 3 rows.

enter image description here

Thank you! I hope you can help, and that I explained it well enough.

Kasper
  • 121
  • 6
  • Can you paste your input data as text? – Scott Boston Feb 16 '20 at 12:28
  • Seconding Scott's request, also can you note which Pandas version you're using? And are the repeated column headers in your desired output necessary, or just there for illustration? Thanks! – Josh Friedlander Feb 16 '20 at 12:29
  • Hi guys. Yes, all the repeated columns are needed. I am using python 3.7 with the newest version of Pandas. Just updated it. The pictures you see, is where I just quickly edited it as text in excel for illustration purpose. But this is just one of many examples, where I need to do the same editing. Thank you! – Kasper Feb 16 '20 at 12:32

1 Answers1

2

If I undstand your problem correctly, you can use this MCVE to help with your analysis.

df = pd.DataFrame({'ID':[1],'string1':['A,B,C'], 'string2':['X,Y,Z']}, index=[0])

df_joined = df.join([df['string1'].str.split(',', expand=True).add_prefix('s1_'), 
                     df['string2'].str.split(',', expand=True).add_prefix('s2_')])

pd.wide_to_long(df_joined, ['s1','s2'], 'ID', 'No', sep='_').reset_index()

Output:

   ID  No string1 string2 s1 s2
0   1   0   A,B,C   X,Y,Z  A  X
1   1   1   A,B,C   X,Y,Z  B  Y
2   1   2   A,B,C   X,Y,Z  C  Z
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • As comment to you Scott Boston. The error I am getting for the code which I posted, is the following: ----> 4 pd.wide_to_long(df_joined, ['s1','s2'], sep='_').reset_index() TypeError: wide_to_long() missing 2 required positional arguments: 'i' and 'j' – Kasper Feb 16 '20 at 21:03
  • What version of pandas? Upgrade if possible. – Scott Boston Feb 16 '20 at 21:10