pandas dataframe - ungroup concatenated column

Question

I am trying to ungroup a concatenated column in a dataframe. In particular, I am trying to convert

    a  b         c
i0  1  a     k1;k2
i1  2  b        k3
i2  3  c  k4;k5;k6
i3  4  d        k7

into

    a  b   c
i0  1  a  k1
i0  1  a  k2
i1  2  b  k3
i2  3  c  k4
i2  3  c  k5
i2  3  c  k6
i3  4  d  k7

I managed to do this using the code

import pandas as pd
data = pd.DataFrame({'a':[1,2,3,4],'b':list('abcd'),'c':['k1;k2','k3','k4;k5;k6','k7']},
                    index=['i'+str(i) for i in range(4)])
tmp = data['c'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
tmp.name = 'c'
data.drop('c',axis='columns',inplace=True)
data = data.join(tmp)

but it seems an incredibly convoluted way of doing something that is so simple. Is there a better way to do this using pandas?

Quang Hoang · Answer 1 · 2019-06-25T19:31:04.010

0

Here's an answer that is not in the linked (unnest) question:

(df.reset_index()
   .set_index(['index','a','b'])
   .c.str
   .split(';',expand=True)
   .stack()
   .reset_index(level=-1,drop=True)
   .reset_index(level=(1,2))
)

Output:

       a  b   0
index          
i0     1  a  k1
i0     1  a  k2
i1     2  b  k3
i2     3  c  k4
i2     3  c  k5
i2     3  c  k6
i3     4  d  k7

edited Jun 25 '19 at 19:31

answered Jun 25 '19 at 19:27

Quang Hoang

146,074
10
56
74

Are you talking about this? https://stackoverflow.com/a/50731254/4909087 – cs95 Jun 25 '19 at 19:29
hah, yes. I always like this better than unnesting anyway. – Quang Hoang Jun 25 '19 at 19:30
since this is string type , so you can expan by using str.split , but most of time that is list object within the columns that is why I did not included this in unnesting :-) – BENY Jun 25 '19 at 19:34
But yes, I'm not happy with the presentation of options (there are too many answers, it is difficult to tell what is the best one to use), nor am I happy with the fact that only vertical unnesting has been covered, not horizontal cc @WeNYoBen – cs95 Jun 25 '19 at 19:35
1

@cs95 horizontal should be easy by using pd.DataFrame(df.c.tolist(),index=df.index), then join :-) – BENY Jun 25 '19 at 19:37
Thanks for the answer. It works nicely, but unfortunately this answer is also verbose (like mine) and in addition it explicitly refers to columns 'a' and 'b' that is t be avoided in case of larger dataframes (in my view at least). It should be somehow possible to do this consisely with pandas! – karpan Jun 25 '19 at 19:37
@WeNYoBen lol add it to your answer man, then I can use your dupe for hammering those type of questions too. – cs95 Jun 25 '19 at 19:39
@cs95 let me update lol – BENY Jun 25 '19 at 19:39

pandas dataframe - ungroup concatenated column

1 Answers1