Split rows into multiple rows based on column value

Question

Input DF:

Index  Parameters   A      B       C   
1      Apple        1      2       3   
2      Banana       2      4       5   
3      Potato       3      5       2   
4      Tomato       1 x 4  1 x 6   2 x 12

Output DF

Index  Parameters   A      B       C   
1      Apple        1      2       3   
2      Banana       2      4       5   
3      Potato       3      5       2   
4      Tomato_P     1      1       2
5      Tomato_Q     4      6      12

Problem Statement:

I want convert a row of data into multiple rows based on particular column value (Tomato) and with split parameter as x

Code/Findings:

I have a code which works well if I transpose this data set and then apply the answer from here or here and then re-transpose the same.

Looking for a solution which can directly work on the given dataframe

jezrael · Accepted Answer · 2019-10-15T09:16:24.007

Solution if always only one x values in data - first Series.str.split by columns in list, then Series.explode, added all another columns by DataFrame.join and set _P with _Q with Series.duplicated and numpy.select:

cols = ['A','B','C']
df[cols] = df[cols].apply(lambda x : x.str.split(' x '))

df1 = pd.concat([df[x].explode() for x in cols],axis=1)
#print (df1)

df = df[df.columns.difference(cols)].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'), 
                               df.index.duplicated()], 
                               ['_P','_Q'], 
                               default='')

df = df.reset_index(drop=True)
print (df)
  Parameters  A  B   C
0      Apple  1  2   3
1     Banana  2  4   5
2     Potato  3  5   2
3   Tomato_P  1  1   2
4   Tomato_Q  4  6  12

EDIT:

Answer with no explode:

cols = df.columns[1:]

df1 = (pd.concat([df[x].str.split(' x ', expand=True).stack() for x in cols],axis=1, keys=cols)
         .reset_index(level=1, drop=True))
print (df1)
       A  B   C
Index          
1      1  2   3
2      2  4   5
3      3  5   2
4      1  1   2
4      4  6  12

df = df.iloc[:, [0]].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'), 
                               df.index.duplicated()], 
                               ['_P','_Q'], 
                               default='')

df = df.reset_index(drop=True)
print (df)

  Parameters  A  B   C
0      Apple  1  2   3
1     Banana  2  4   5
2     Potato  3  5   2
3   Tomato_P  1  1   2
4   Tomato_Q  4  6   1

`df1 = pd.concat([df[x].explode() for x in cols],axis=1)` at this line..i am getting error `ValueError: cannot reindex from a duplicate axis` — Rahul Agarwal, Oct 15 '19 at 09:01
@RahulAgarwal - Can you use `df = df.reset_index(drop=True)` before my solution? — jezrael, Oct 15 '19 at 09:01
done same error!! I have made one change to solution...instead of using `cols`, i have done `cols = df.columns[1:].tolist()` — Rahul Agarwal, Oct 15 '19 at 09:04
I ran`df = df.loc[:,~df.columns.duplicated()]` command also to remove if any duplicated columns, but didn't get any and still getting same error!! — Rahul Agarwal, Oct 15 '19 at 09:11

score 0 · Answer 2 · answered Oct 13 '19 at 18:17

This is more like a explode problem , available after pandas 0.25

df[['A','B','C']]=df[['A','B','C']].apply(lambda x : x.str.split(' x '))
df
   Index Parameters       A       B        C
0      1      Apple     [1]     [2]      [3]
1      2     Banana     [2]     [4]      [5]
2      3     Potato     [3]     [5]      [2]
3      4     Tomato  [1, 4]  [1, 6]  [2, 12]
df.set_index(['Index','Parameters'],inplace=True)
pd.concat([df[x].explode() for x in ['A','B','C']],axis=1)
                  A  B   C
Index Parameters          
1     Apple       1  2   3
2     Banana      2  4   5
3     Potato      3  5   2
4     Tomato      1  1   2
      Tomato      4  6  12

Thanks for the answer!! Can we not use column names as I have too many column names!! Also, how do i rename column two Tomato..to Tomato_P and Tomato_Q — Rahul Agarwal, Oct 14 '19 at 05:26

Split rows into multiple rows based on column value

2 Answers2