Pandas Explode lists with different lengths, into rows

Question

I've got a pandas dataframe in which one of the columns contains lists with different lengths. The solutions to explode lists in pandas all assume that the lists to be exploded are all the same length.

This is my df:

    Dep     Exp     Fl-No   Shared Codes
0   20:58   20:55   LX 736  [No shared codes]
1   21:23   20:55   LX 818  [Dummy, LH 5809]
2   21:27   21:00   JU 375  [No shared codes]
4   21:28   21:00   LX 770  [Dummy, SN 5102]
7   21:31   21:10   LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]

And this what I am looking for:

    Dep     Exp     Fl-No   Shared Codes
0   20:58   20:55   LX 736  No shared codes
1   21:23   20:55   LX 818  Dummy
1   21:23   20:55   LX 818  LH 5809
2   21:27   21:00   JU 375  No shared codes
4   21:28   21:00   LX 770  Dummy
4   21:28   21:00   LX 770  SN 5102
7   21:31   21:10   LX 1842 Dummy
7   21:31   21:10   LX 1842 LH 5880
7   21:31   21:10   LX 1842 TP 8184
7   21:31   21:10   LX 1842 A3 1985

Has anybody got any suggestions?

@Wen I don't think OP can use `wide_to_long` here. The data isn't in the right format. — cs95, Aug 25 '17 at 16:09
@Wen Actually I'm wrong. I found a way with wide_to_long but it is so bad that it isn't worth putting. — cs95, Aug 25 '17 at 16:14
@cᴏʟᴅsᴘᴇᴇᴅ I mean in the link...PiR 's answer .. sorry for the confusion . — BENY, Aug 25 '17 at 16:30
@Wen I upvoted that also :p Yeah, those will work but I'm not sure they're general solutions to this problem of ragged list columns. — cs95, Aug 25 '17 at 16:33
@cᴏʟᴅsᴘᴇᴇᴅ Just saw this type of question multiple times..I really hope pandas can documented one of the solution and name it `unlistify` — BENY, Aug 25 '17 at 16:34

score 6 · Answer 1 · edited Jan 09 '19 at 16:55

Very similar to @coldspeed. I took a few different steps.

s = df['Shared Codes']
i = np.arange(len(df)).repeat(s.str.len())
df.iloc[i, :-1].assign(**{'Shared Codes': np.concatenate(s.values)})

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
1  21:23  20:55   LX 818          LH 5809
2  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
4  21:28  21:00   LX 770          SN 5102
7  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
7  21:31  21:10  LX 1842          TP 8184
7  21:31  21:10  LX 1842          A3 1985

It seems `np.repeat` is a necessity no matter what route you choose to take. This is one of a few solutions I felt genuinely good about when I got it. — cs95, Aug 25 '17 at 16:27

score 4 · Answer 2 · answered Aug 25 '17 at 15:59

One possibility using np.repeat and np.hstack:

print(df)

     Dep    Exp    Fl-No                        Shared Codes
0  20:58  20:55   LX 736                   [No shared codes]
1  21:23  20:55   LX 818                    [Dummy, LH 5809]
2  21:27  21:00   JU 375                   [No shared codes]
4  21:28  21:00   LX 770                    [Dummy, SN 5102]
7  21:31  21:10  LX 1842  [Dummy, LH 5880, TP 8184, A3 1985]


x = df.iloc[:, :-1].values.repeat(df['Shared Codes'].apply(len), 0)
y = df['Shared Codes'].apply(pd.Series).stack().values.reshape(-1, 1)

out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)
print(out)

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985

@cᴏʟᴅsᴘᴇᴇᴅ`df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop('level_3',1)` another similar question link https://stackoverflow.com/questions/45871154/how-to-efficiently-create-a-pivot-table/45871283#45871283 — BENY, Aug 25 '17 at 16:39

score 4 · Answer 3 · answered Feb 21 '20 at 08:39

4

Pandas >=0.25

df:

    Name    Data
0   Bar [Product, Item, X]
1   Foo [Product, Misc]

Using Explode:

df = df.explode('Data')

df:

    Name    Data
0   Bar Product
0   Bar Item
0   Bar X
1   Foo Product
1   Foo Misc

answered Feb 21 '20 at 08:39

Pygirl

12,969
5
30
43

BENY · Answer 4 · 2017-08-25T17:04:58.710

Ok, I will post it again for more information and other genius solutions please check link1 and link2

df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop('level_3‌',1)

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985

Also, using pd.wide_to_long However, personally do not recommended cause overkill.

df1=df['Shared Codes'].apply(pd.Series)
df1.columns='sur'+df1.columns.astype(str)
df=pd.concat([df,df1],axis=1)
pd.wide_to_long(df,['sur'],['Dep','Exp','Fl-No'],'lol').reset_index().drop(['lol','Shared Codes'],axis=1).dropna()

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985

Pandas Explode lists with different lengths, into rows

4 Answers4

Linked