Dataframe Duplicate a row under a certain condition inside a cell

Question

This is my dataframe, column "Reference" can have more than one word (separated by space blank " ")

Col1    Col2      Col3    Reference

Apple   Broccoli  Frog    M1139 M1944 
Orange  Tomato    Rope    M1134
Potato  Tuna      Lemon   M1129 M1944 M2944

I would like to create a duplicate of the entire row if there is more than one word in column "Reference", writing each word in each line.

For example in row 1 there are 2 words in "Reference" M1139 M1944,therefore two lines are created:

Desired output:

Col1    Col2      Col3    Reference

Apple   Broccoli  Frog    M1139
Apple   Broccoli  Frog    M1944
Orange  Tomato    Rope    M1134
Potato  Tuna      Lemon   M1129
Potato  Tuna      Lemon   M1944
Potato  Tuna      Lemon   M2944

score 0 · Answer 1 · edited Mar 05 '22 at 21:55

Use str.split and explode:

df2 = (df
       .assign(Reference=df['Reference'].str.split(' '))
       .explode('Reference')
       )

Output:

     Col1      Col2   Col3 Reference
0   Apple  Broccoli   Frog     M1139
0   Apple  Broccoli   Frog     M1944
1  Orange    Tomato   Rope     M1134
2  Potato      Tuna  Lemon     M1129
2  Potato      Tuna  Lemon     M1944
2  Potato      Tuna  Lemon     M2944

By default, explode duplicates the values of the row indices as well. If you want to reset the index, you can pass ignore_index=True to explode:

df3 = (df
       .assign(Reference=df['Reference'].str.split(' '))
       .explode('Reference', ignore_index=True)
       )

Output:

     Col1      Col2   Col3 Reference
0   Apple  Broccoli   Frog     M1139
1   Apple  Broccoli   Frog     M1944
2  Orange    Tomato   Rope     M1134
3  Potato      Tuna  Lemon     M1129
4  Potato      Tuna  Lemon     M1944
5  Potato      Tuna  Lemon     M2944

Dataframe Duplicate a row under a certain condition inside a cell

1 Answers1