Explode pandas dataframe singe row into multiple rows across multiple columns simultaneously

Question

I have a dataframe df as

df
       col1 act_id col2                                                                                                 
   --------------------
0  40;30;30   act1 A;B;C
1  25;50;25   act2 D;E;F
2     70;30   act3 G;H

I want to break each record in such a way that values in column col1 and col2 explode into multiple rows but such that the first value in col1 after splitting upon ';' corresponds to the first value in col2 after splitting upon ';'. So my desired_df should look like this:

desired_df
       col1 act_id col2                                                                                                 
       ---------------
    0  40   act1   A
    1  30   act1   B
    2  30   act1   C
    3  25   act2   D
    4  50   act2   E
    5  25   act2   F                                                                                                  
    6  70   act3   G                                                                              
    7  30   act3   H

NOTE: this is not the same as Split (explode) pandas dataframe string entry to separate rows as here the exploding/splitting of one record is not just across one column but the need is to split or explode one row into multiple rows, in two columns simultaneously.

Any help is appreciated. Thanks

you can use `unnesting(df,['col1','col2'])` from here: https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939 after splitting the cols by `;` like `df.col1=df.col1.str.split(';')` and same for col2 — anky, Jun 04 '19 at 04:41

iamklaus · Accepted Answer · 2019-06-04T06:00:48.647

1

one way to do this

df2.set_index('act_id').apply(lambda x: pd.Series(x.col1.split(';'),x.col2.split(';')), axis=1).stack().dropna().reset_index()

df2.columns = ['col1','act_id','col2']

  col1 act_id col2
0  A    act1   40 
1  B    act1   30 
2  C    act1   30 
3  D    act2   25 
4  E    act2   50 
5  F    act2   25 
6  G    act3   70 
7  H    act3   30

edited Jun 04 '19 at 06:00

answered Jun 04 '19 at 04:48

iamklaus

3,720
2
12
21

this solution worked for this particular df, although when i tried running this on a bigger dataframe with about 1M rows, i got the error as `ValueError: cannot reindex from a duplicate axis` I don't understand because even the bigger original dataframe had just these columns. – Aman Singh Jun 04 '19 at 05:28
try the updated..problem might be related to duplicated index values, if the update doesn't works then try breaking the code and running it step by step, will help u debug the problem (do post on which step u faced the error) – iamklaus Jun 04 '19 at 05:45
I tried to break it down step by step and I found that the error occurs at the `set_index('act_id')` method. I removed all other methods and still got the error as `ValueError: cannot reindex from a duplicate axis` Is there any other way to get to the desired result...? – Aman Singh Jun 04 '19 at 06:00
i still end up having the error `ValueError: cannot reindex from a duplicate axis` – Aman Singh Jun 04 '19 at 09:19

score 0 · Answer 2 · answered Jun 04 '19 at 05:39

0

the idea is col1 and col2 should be exploded and then merge on index and join back to the original dataframe.

df1 = df.col1.str.split(";").apply(pd.Series).stack().droplevel(1).reset_index()
df2 = df.col2.str.split(";").apply(pd.Series).stack().droplevel(1).reset_index()
df12 = pd.merge(df1, df2[0], left_index=True, right_index=True)
df12.columns = ["index", "col1", "col2"]

pd.merge(df12, df["act_id"], left_on="index", right_index=True)

answered Jun 04 '19 at 05:39

Dyno Fu

8,753
4
39
64

`AttributeError: 'Series' object has no attribute 'droplevel1'` – Aman Singh Jun 04 '19 at 05:46
`droplevel(1)`. – Dyno Fu Jun 04 '19 at 16:21

Allohvk · Answer 3 · 2020-11-25T10:48:03.020

0

A generic function could be:

list_cols = {'col1','col2'}
other_cols = list(set(df.columns) - set(list_cols))
exploded = [df[col].explode() for col in list_cols]
desired_df = pd.DataFrame(dict(zip(list_cols, exploded)))
desired_df = df[other_cols].merge(desired_df, how="right", left_index=True, right_index=True)

Please strsplit the col 1 and 2 before calling above function

edited Nov 25 '20 at 10:48

answered Nov 25 '20 at 10:30

Allohvk

915
8
14

Explode pandas dataframe singe row into multiple rows across multiple columns simultaneously

3 Answers3