0

I am trying to ungroup a concatenated column in a dataframe. In particular, I am trying to convert

    a  b         c
i0  1  a     k1;k2
i1  2  b        k3
i2  3  c  k4;k5;k6
i3  4  d        k7

into

    a  b   c
i0  1  a  k1
i0  1  a  k2
i1  2  b  k3
i2  3  c  k4
i2  3  c  k5
i2  3  c  k6
i3  4  d  k7

I managed to do this using the code

import pandas as pd
data = pd.DataFrame({'a':[1,2,3,4],'b':list('abcd'),'c':['k1;k2','k3','k4;k5;k6','k7']},
                    index=['i'+str(i) for i in range(4)])
tmp = data['c'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
tmp.name = 'c'
data.drop('c',axis='columns',inplace=True)
data = data.join(tmp)

but it seems an incredibly convoluted way of doing something that is so simple. Is there a better way to do this using pandas?

karpan
  • 421
  • 1
  • 5
  • 13

1 Answers1

0

Here's an answer that is not in the linked (unnest) question:

(df.reset_index()
   .set_index(['index','a','b'])
   .c.str
   .split(';',expand=True)
   .stack()
   .reset_index(level=-1,drop=True)
   .reset_index(level=(1,2))
)

Output:

       a  b   0
index          
i0     1  a  k1
i0     1  a  k2
i1     2  b  k3
i2     3  c  k4
i2     3  c  k5
i2     3  c  k6
i3     4  d  k7
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Are you talking about this? https://stackoverflow.com/a/50731254/4909087 – cs95 Jun 25 '19 at 19:29
  • hah, yes. I always like this better than unnesting anyway. – Quang Hoang Jun 25 '19 at 19:30
  • since this is string type , so you can expan by using str.split , but most of time that is list object within the columns that is why I did not included this in unnesting :-) – BENY Jun 25 '19 at 19:34
  • But yes, I'm not happy with the presentation of options (there are too many answers, it is difficult to tell what is the best one to use), nor am I happy with the fact that only vertical unnesting has been covered, not horizontal cc @WeNYoBen – cs95 Jun 25 '19 at 19:35
  • 1
    @cs95 horizontal should be easy by using pd.DataFrame(df.c.tolist(),index=df.index), then join :-) – BENY Jun 25 '19 at 19:37
  • Thanks for the answer. It works nicely, but unfortunately this answer is also verbose (like mine) and in addition it explicitly refers to columns 'a' and 'b' that is t be avoided in case of larger dataframes (in my view at least). It should be somehow possible to do this consisely with pandas! – karpan Jun 25 '19 at 19:37
  • @WeNYoBen lol add it to your answer man, then I can use your dupe for hammering those type of questions too. – cs95 Jun 25 '19 at 19:39
  • @cs95 let me update lol – BENY Jun 25 '19 at 19:39