15

I stuck with the problem how to divide a pandas dataframe by row,

I have a similar dataframe with a column where values are separated by \r\n and they are in one cell,

    Color                              Shape  Price
0  Green  Rectangle\r\nTriangle\r\nOctangle     10
1   Blue              Rectangle\r\nTriangle     15 

I need to divide this cell into several cells with the same values as other columns, e.g.

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue    Tringle     15

How do I do it well?

jtlz2
  • 7,700
  • 9
  • 64
  • 114
George
  • 151
  • 1
  • 1
  • 4
  • Try `df.Shape.str.split(expand=True).stack()`. Does that help? – John Zwinck Oct 23 '19 at 12:55
  • 1
    look at [`explode()`](https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.explode.html) after the split – anky Oct 23 '19 at 12:58
  • 1
    @anky_91 explode() was added in version 0.25, is there any other way to solve in older versions? – vb_rises Oct 23 '19 at 12:59
  • 1
    @vb_rises : check : https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939 – anky Oct 23 '19 at 13:00

4 Answers4

16

You can do:

df["Shape"]=df["Shape"].str.split("\r\n")
print(df.explode("Shape").reset_index(drop=True))

Output:

   Color    Shape   Price
0   Green   Rectangle   10
1   Green   Triangle    10
2   Green   Octangle    10
3   Blue    Rectangle   15
4   Blue    Triangle    15
Sociopath
  • 13,068
  • 19
  • 47
  • 75
  • 1
    I tried this with a sample df and I got AttributeError: 'DataFrame' object has no attribute 'explode'. Do you have some library you imported that allows you to do that? – MBA Coder Oct 23 '19 at 13:13
  • 1
    I think you are using different `pandas` version. `explode` was introduced in `0.25` onwards. – Sociopath Oct 23 '19 at 13:20
  • Thanks Akshay, I have 0.24, I will have to update pandas – MBA Coder Oct 23 '19 at 13:52
4

This might not be the most efficient way to do it but I can confirm that it works with the sample df:

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]   
df = pd.DataFrame(data, columns = ['Color', 'Shape', 'Price'])
new_df = pd.DataFrame(columns = ['Color', 'Shape', 'Price'])

for index, row in df.iterrows():
    split = row['Shape'].split('\r\n')
    for shape in split:
        new_df = new_df.append(pd.DataFrame({'Color':[row['Color']], 'Shape':[shape], 'Price':[row['Price']]}))

new_df = new_df.reset_index(drop=True)
print(new_df)

Output:

   Color Price      Shape
0  Green    10  Rectangle
1  Green    10   Triangle
2  Green    10   Octangle
3   Blue    15  Rectangle
4   Blue    15   Triangle
MBA Coder
  • 374
  • 1
  • 3
  • 13
3

First, you'll need to split the Shape by white spaces, that will give you list of shapes. Then, use df.explode to unpack the list and create new rows for each of them

df["Shape"] = df.Shape.str.split()
df.explode("Shape")
Darren Christopher
  • 3,893
  • 4
  • 20
  • 37
2

As commented, str.split() followed by explode is helpful. If you are not on Pandas 0.25, then you can use melt afterward:

(pd.concat( (df.Shape.str.split('\r\n', expand=True), 
            df[['Color','Price']]),
          axis=1)
   .melt(id_vars=['Color', 'Price'], value_name='Shape')
   .dropna()
)

Output:

   Color  Price variable      Shape
0  Green     10        0  Rectangle
1   Blue     15        0  Rectangle
2  Green     10        1   Triangle
3   Blue     15        1   Triangle
4  Green     10        2   Octangle
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74