0

I need to expand a single row of my Pandas dataframe intwo multiple rows based on splitting the score column (containing match results) based on the spaces.

This is what the data looks like

A   B   score
1   2   6-1 6-2
3   4   6-4 4-6 6-3

To achieve the goal I used the approach from here.

With slightly adapting the approach, my dataframe looks like this:

A   B   score           sets
1   2   6-1 6-2         6-1
1   2   6-1 6-2         6-2
3   4   6-4 4-6 6-3     6-4
3   4   6-4 4-6 6-3     4-6
3   4   6-4 4-6 6-3     6-3

However, I would also like to have another additional column which represents the number of the set per match. It is like a cumulative count of the sets per match. My question is, how can the above linked solution be changed in order to get the desired result which looks as follows:

A   B   score           sets    setnumber
1   2   6-1 6-2         6-1     1
1   2   6-1 6-2         6-2     2
3   4   6-4 4-6 6-3     6-4     1
3   4   6-4 4-6 6-3     4-6     2
3   4   6-4 4-6 6-3     6-3     3

I think somewhere in the following code lines an adaption needs to be done, but I couldn't figure out, yet, how it should work:

s = df['score'].str.split(' ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
beta
  • 5,324
  • 15
  • 57
  • 99

1 Answers1

3

You can use repeat and then cumcount

In [2915]: dff = df.set_index(['A', 'B'])['score'].repeat(
                            df['score'].str.split(' ').str.len()
                                 ).reset_index()

In [2916]: dff
Out[2916]:
   A  B        score
0  1  2      6-1 6-2
1  1  2      6-1 6-2
2  3  4  6-4 4-6 6-3
3  3  4  6-4 4-6 6-3
4  3  4  6-4 4-6 6-3

In [2917]: dff.assign(setnumber=dff.groupby(['A', 'B']).cumcount()+1)
Out[2917]:
   A  B        score  setnumber
0  1  2      6-1 6-2          1
1  1  2      6-1 6-2          2
2  3  4  6-4 4-6 6-3          1
3  3  4  6-4 4-6 6-3          2
4  3  4  6-4 4-6 6-3          3

You could also get dff with .loc

In [2923]: df.loc[df.index.repeat(df['score'].str.split(' ').str.len())]
Out[2923]:
   A  B        score
0  1  2      6-1 6-2
0  1  2      6-1 6-2
1  3  4  6-4 4-6 6-3
1  3  4  6-4 4-6 6-3
1  3  4  6-4 4-6 6-3
Zero
  • 74,117
  • 18
  • 147
  • 154
  • I just got to this solution. is there any difference? `df['setnumber'] = df.groupby(['A', 'B']).cumcount() + 1` – beta Sep 23 '17 at 09:49
  • Not much, `assign` returns a copy, your method sets it to `df`. – Zero Sep 23 '17 at 09:49