0

This is my dataframe, column "Reference" can have more than one word (separated by space blank " ")

Col1    Col2      Col3    Reference

Apple   Broccoli  Frog    M1139 M1944 
Orange  Tomato    Rope    M1134
Potato  Tuna      Lemon   M1129 M1944 M2944 

I would like to create a duplicate of the entire row if there is more than one word in column "Reference", writing each word in each line.

For example in row 1 there are 2 words in "Reference" M1139 M1944,therefore two lines are created:

Desired output:

Col1    Col2      Col3    Reference

Apple   Broccoli  Frog    M1139
Apple   Broccoli  Frog    M1944
Orange  Tomato    Rope    M1134
Potato  Tuna      Lemon   M1129
Potato  Tuna      Lemon   M1944
Potato  Tuna      Lemon   M2944 

1 Answers1

0

Use str.split and explode:

df2 = (df
       .assign(Reference=df['Reference'].str.split(' '))
       .explode('Reference')
       )

Output:

     Col1      Col2   Col3 Reference
0   Apple  Broccoli   Frog     M1139
0   Apple  Broccoli   Frog     M1944
1  Orange    Tomato   Rope     M1134
2  Potato      Tuna  Lemon     M1129
2  Potato      Tuna  Lemon     M1944
2  Potato      Tuna  Lemon     M2944

By default, explode duplicates the values of the row indices as well. If you want to reset the index, you can pass ignore_index=True to explode:

df3 = (df
       .assign(Reference=df['Reference'].str.split(' '))
       .explode('Reference', ignore_index=True)
       )

Output:

     Col1      Col2   Col3 Reference
0   Apple  Broccoli   Frog     M1139
1   Apple  Broccoli   Frog     M1944
2  Orange    Tomato   Rope     M1134
3  Potato      Tuna  Lemon     M1129
4  Potato      Tuna  Lemon     M1944
5  Potato      Tuna  Lemon     M2944
Angus L'Herrou
  • 429
  • 3
  • 11
mozway
  • 194,879
  • 13
  • 39
  • 75