7

I've got a pandas dataframe in which one of the columns contains lists with different lengths. The solutions to explode lists in pandas all assume that the lists to be exploded are all the same length.

This is my df:

    Dep     Exp     Fl-No   Shared Codes
0   20:58   20:55   LX 736  [No shared codes]
1   21:23   20:55   LX 818  [Dummy, LH 5809]
2   21:27   21:00   JU 375  [No shared codes]
4   21:28   21:00   LX 770  [Dummy, SN 5102]
7   21:31   21:10   LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]

And this what I am looking for:

    Dep     Exp     Fl-No   Shared Codes
0   20:58   20:55   LX 736  No shared codes
1   21:23   20:55   LX 818  Dummy
1   21:23   20:55   LX 818  LH 5809
2   21:27   21:00   JU 375  No shared codes
4   21:28   21:00   LX 770  Dummy
4   21:28   21:00   LX 770  SN 5102
7   21:31   21:10   LX 1842 Dummy
7   21:31   21:10   LX 1842 LH 5880
7   21:31   21:10   LX 1842 TP 8184
7   21:31   21:10   LX 1842 A3 1985

Has anybody got any suggestions?

smci
  • 32,567
  • 20
  • 113
  • 146
BarJacks
  • 119
  • 2
  • 6
  • @Wen I don't think OP can use `wide_to_long` here. The data isn't in the right format. – cs95 Aug 25 '17 at 16:09
  • @Wen Actually I'm wrong. I found a way with wide_to_long but it is so bad that it isn't worth putting. – cs95 Aug 25 '17 at 16:14
  • @cᴏʟᴅsᴘᴇᴇᴅ how about PiR 's answer ? – BENY Aug 25 '17 at 16:29
  • @Wen I upvoted it. Why? – cs95 Aug 25 '17 at 16:30
  • @cᴏʟᴅsᴘᴇᴇᴅ I mean in the link...PiR 's answer .. sorry for the confusion . – BENY Aug 25 '17 at 16:30
  • @Wen I upvoted that also :p Yeah, those will work but I'm not sure they're general solutions to this problem of ragged list columns. – cs95 Aug 25 '17 at 16:33
  • 4
    @cᴏʟᴅsᴘᴇᴇᴅ Just saw this type of question multiple times..I really hope pandas can documented one of the solution and name it `unlistify` – BENY Aug 25 '17 at 16:34

4 Answers4

6

Very similar to @coldspeed. I took a few different steps.

s = df['Shared Codes']
i = np.arange(len(df)).repeat(s.str.len())
df.iloc[i, :-1].assign(**{'Shared Codes': np.concatenate(s.values)})

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
1  21:23  20:55   LX 818          LH 5809
2  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
4  21:28  21:00   LX 770          SN 5102
7  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
7  21:31  21:10  LX 1842          TP 8184
7  21:31  21:10  LX 1842          A3 1985
cs95
  • 379,657
  • 97
  • 704
  • 746
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 1
    It seems `np.repeat` is a necessity no matter what route you choose to take. This is one of a few solutions I felt genuinely good about when I got it. – cs95 Aug 25 '17 at 16:27
4

One possibility using np.repeat and np.hstack:

print(df)

     Dep    Exp    Fl-No                        Shared Codes
0  20:58  20:55   LX 736                   [No shared codes]
1  21:23  20:55   LX 818                    [Dummy, LH 5809]
2  21:27  21:00   JU 375                   [No shared codes]
4  21:28  21:00   LX 770                    [Dummy, SN 5102]
7  21:31  21:10  LX 1842  [Dummy, LH 5880, TP 8184, A3 1985]


x = df.iloc[:, :-1].values.repeat(df['Shared Codes'].apply(len), 0)
y = df['Shared Codes'].apply(pd.Series).stack().values.reshape(-1, 1)

out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)
print(out)

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985
cs95
  • 379,657
  • 97
  • 704
  • 746
  • @cᴏʟᴅsᴘᴇᴇᴅ`df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop('level_3',1)` another similar question link https://stackoverflow.com/questions/45871154/how-to-efficiently-create-a-pivot-table/45871283#45871283 – BENY Aug 25 '17 at 16:39
  • @cᴏʟᴅsᴘᴇᴇᴅ posted ~ :) – BENY Aug 25 '17 at 16:54
  • @BarJacks Still not too late. Please accept an answer. – cs95 Jan 09 '19 at 16:56
4

Pandas >=0.25

df:

    Name    Data
0   Bar [Product, Item, X]
1   Foo [Product, Misc]

Using Explode:

df = df.explode('Data')

df:

    Name    Data
0   Bar Product
0   Bar Item
0   Bar X
1   Foo Product
1   Foo Misc
Pygirl
  • 12,969
  • 5
  • 30
  • 43
2

Ok, I will post it again for more information and other genius solutions please check link1 and link2

df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop('level_3‌​',1)

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985

Also, using pd.wide_to_long However, personally do not recommended cause overkill.

df1=df['Shared Codes'].apply(pd.Series)
df1.columns='sur'+df1.columns.astype(str)
df=pd.concat([df,df1],axis=1)
pd.wide_to_long(df,['sur'],['Dep','Exp','Fl-No'],'lol').reset_index().drop(['lol','Shared Codes'],axis=1).dropna()

     Dep    Exp    Fl-No     Shared Codes
0  20:58  20:55   LX 736  No shared codes
1  21:23  20:55   LX 818            Dummy
2  21:23  20:55   LX 818          LH 5809
3  21:27  21:00   JU 375  No shared codes
4  21:28  21:00   LX 770            Dummy
5  21:28  21:00   LX 770          SN 5102
6  21:31  21:10  LX 1842            Dummy
7  21:31  21:10  LX 1842          LH 5880
8  21:31  21:10  LX 1842          TP 8184
9  21:31  21:10  LX 1842          A3 1985
BENY
  • 317,841
  • 20
  • 164
  • 234