split rows in pandas dataframe

Question

I stuck with the problem how to divide a pandas dataframe by row,

I have a similar dataframe with a column where values are separated by \r\n and they are in one cell,

    Color                              Shape  Price
0  Green  Rectangle\r\nTriangle\r\nOctangle     10
1   Blue              Rectangle\r\nTriangle     15

I need to divide this cell into several cells with the same values as other columns, e.g.

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue    Tringle     15

How do I do it well?

Try `df.Shape.str.split(expand=True).stack()`. Does that help? — John Zwinck, Oct 23 '19 at 12:55
look at [`explode()`](https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.explode.html) after the split — anky, Oct 23 '19 at 12:58
@anky_91 explode() was added in version 0.25, is there any other way to solve in older versions? — vb_rises, Oct 23 '19 at 12:59
@vb_rises : check : https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939 — anky, Oct 23 '19 at 13:00

score 16 · Answer 1 · answered Oct 23 '19 at 13:00

16

You can do:

df["Shape"]=df["Shape"].str.split("\r\n")
print(df.explode("Shape").reset_index(drop=True))

Output:

   Color    Shape   Price
0   Green   Rectangle   10
1   Green   Triangle    10
2   Green   Octangle    10
3   Blue    Rectangle   15
4   Blue    Triangle    15

answered Oct 23 '19 at 13:00

Sociopath

13,068
19
47
75

1

I tried this with a sample df and I got AttributeError: 'DataFrame' object has no attribute 'explode'. Do you have some library you imported that allows you to do that? – MBA Coder Oct 23 '19 at 13:13
1

I think you are using different `pandas` version. `explode` was introduced in `0.25` onwards. – Sociopath Oct 23 '19 at 13:20
Thanks Akshay, I have 0.24, I will have to update pandas – MBA Coder Oct 23 '19 at 13:52

MBA Coder · Answer 2 · 2019-10-23T13:46:56.497

This might not be the most efficient way to do it but I can confirm that it works with the sample df:

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]   
df = pd.DataFrame(data, columns = ['Color', 'Shape', 'Price'])
new_df = pd.DataFrame(columns = ['Color', 'Shape', 'Price'])

for index, row in df.iterrows():
    split = row['Shape'].split('\r\n')
    for shape in split:
        new_df = new_df.append(pd.DataFrame({'Color':[row['Color']], 'Shape':[shape], 'Price':[row['Price']]}))

new_df = new_df.reset_index(drop=True)
print(new_df)

Output:

   Color Price      Shape
0  Green    10  Rectangle
1  Green    10   Triangle
2  Green    10   Octangle
3   Blue    15  Rectangle
4   Blue    15   Triangle

score 3 · Answer 3 · answered Oct 23 '19 at 13:00

3

First, you'll need to split the Shape by white spaces, that will give you list of shapes. Then, use df.explode to unpack the list and create new rows for each of them

df["Shape"] = df.Shape.str.split()
df.explode("Shape")

answered Oct 23 '19 at 13:00

Darren Christopher

3,893
4
20
37

Quang Hoang · Answer 4 · 2019-10-23T13:17:29.973

As commented, str.split() followed by explode is helpful. If you are not on Pandas 0.25, then you can use melt afterward:

(pd.concat( (df.Shape.str.split('\r\n', expand=True), 
            df[['Color','Price']]),
          axis=1)
   .melt(id_vars=['Color', 'Price'], value_name='Shape')
   .dropna()
)

Output:

   Color  Price variable      Shape
0  Green     10        0  Rectangle
1   Blue     15        0  Rectangle
2  Green     10        1   Triangle
3   Blue     15        1   Triangle
4  Green     10        2   Octangle

split rows in pandas dataframe

4 Answers4

Linked