0

I need some help in order to represent my data frame in another way with pandas in python

Here is the data frame:

Order   Nsp1    Nsp2    Tips    Select
Carnivora   10  10  SpA,SpB,SpC 0.7678
Carnivora   1   1   SpD NA
Carnivora   2   2   SpE,SpF 0.467103
Carnivora   4   4   SpG 0.303415

As you can se the column Tips can have several Spnames and I would like to get a new data frame such as :

Species Order   Nsp1    Nsp2    Select
SpA Carnivora   10  10  0.7678
SpB Carnivora   10  10  0.7678
SpC Carnivora   10  10  0.7678
SpD Carnivora   1   1   NA
SpE Carnivora   2   2   0.467103
SpF Carnivora   2   2   0.467103
SpG Carnivora   4   4   0.303415

Where The Tips column disappears by instead I create a new column Species with them.

Does anyone have an idea ?

Thank you for your help and time.

chippycentra
  • 879
  • 1
  • 6
  • 15

1 Answers1

1

First you have to split your Tips column to rows with the function found in this answer:

df = explode_str(df, 'Tips', sep=',')

print(df)
       Order  Nsp1  Nsp2 Tips    Select
0  Carnivora    10    10  SpA  0.767800
0  Carnivora    10    10  SpB  0.767800
0  Carnivora    10    10  SpC  0.767800
1  Carnivora     1     1  SpD       NaN
2  Carnivora     2     2  SpE  0.467103
2  Carnivora     2     2  SpF  0.467103
3  Carnivora     4     4  SpG  0.303415

Then you can concat the strings from the Tips + Order column together to create your new column:

df['Species Order'] = df['Tips'] + ' ' + df['Order']
df.drop(['Order', 'Tips'], axis=1, inplace=True)
df.reset_index(inplace=True, drop=True)

print(df)
   Nsp1  Nsp2    Select  Species Order
0    10    10  0.767800  SpA Carnivora
1    10    10  0.767800  SpB Carnivora
2    10    10  0.767800  SpC Carnivora
3     1     1       NaN  SpD Carnivora
4     2     2  0.467103  SpE Carnivora
5     2     2  0.467103  SpF Carnivora
6     4     4  0.303415  SpG Carnivora

Function used from the linked answer:

def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})
Erfan
  • 40,971
  • 8
  • 66
  • 78