2

Input DF:

Index  Parameters   A      B       C   
1      Apple        1      2       3   
2      Banana       2      4       5   
3      Potato       3      5       2   
4      Tomato       1 x 4  1 x 6   2 x 12

Output DF

Index  Parameters   A      B       C   
1      Apple        1      2       3   
2      Banana       2      4       5   
3      Potato       3      5       2   
4      Tomato_P     1      1       2
5      Tomato_Q     4      6      12

Problem Statement:

I want convert a row of data into multiple rows based on particular column value (Tomato) and with split parameter as x

Code/Findings:

I have a code which works well if I transpose this data set and then apply the answer from here or here and then re-transpose the same.

Looking for a solution which can directly work on the given dataframe

Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51

2 Answers2

1

Solution if always only one x values in data - first Series.str.split by columns in list, then Series.explode, added all another columns by DataFrame.join and set _P with _Q with Series.duplicated and numpy.select:

cols = ['A','B','C']
df[cols] = df[cols].apply(lambda x : x.str.split(' x '))

df1 = pd.concat([df[x].explode() for x in cols],axis=1)
#print (df1)

df = df[df.columns.difference(cols)].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'), 
                               df.index.duplicated()], 
                               ['_P','_Q'], 
                               default='')

df = df.reset_index(drop=True)
print (df)
  Parameters  A  B   C
0      Apple  1  2   3
1     Banana  2  4   5
2     Potato  3  5   2
3   Tomato_P  1  1   2
4   Tomato_Q  4  6  12

EDIT:

Answer with no explode:

cols = df.columns[1:]

df1 = (pd.concat([df[x].str.split(' x ', expand=True).stack() for x in cols],axis=1, keys=cols)
         .reset_index(level=1, drop=True))
print (df1)
       A  B   C
Index          
1      1  2   3
2      2  4   5
3      3  5   2
4      1  1   2
4      4  6  12

df = df.iloc[:, [0]].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'), 
                               df.index.duplicated()], 
                               ['_P','_Q'], 
                               default='')

df = df.reset_index(drop=True)
print (df)

  Parameters  A  B   C
0      Apple  1  2   3
1     Banana  2  4   5
2     Potato  3  5   2
3   Tomato_P  1  1   2
4   Tomato_Q  4  6   1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

This is more like a explode problem , available after pandas 0.25

df[['A','B','C']]=df[['A','B','C']].apply(lambda x : x.str.split(' x '))
df
   Index Parameters       A       B        C
0      1      Apple     [1]     [2]      [3]
1      2     Banana     [2]     [4]      [5]
2      3     Potato     [3]     [5]      [2]
3      4     Tomato  [1, 4]  [1, 6]  [2, 12]
df.set_index(['Index','Parameters'],inplace=True)
pd.concat([df[x].explode() for x in ['A','B','C']],axis=1)
                  A  B   C
Index Parameters          
1     Apple       1  2   3
2     Banana      2  4   5
3     Potato      3  5   2
4     Tomato      1  1   2
      Tomato      4  6  12
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks for the answer!! Can we not use column names as I have too many column names!! Also, how do i rename column two Tomato..to Tomato_P and Tomato_Q – Rahul Agarwal Oct 14 '19 at 05:26