86

I want to split the following dataframe based on column ZZ

df = 
        N0_YLDF  ZZ        MAT
    0  6.286333   2  11.669069
    1  6.317000   6  11.669069
    2  6.324889   6  11.516454
    3  6.320667   5  11.516454
    4  6.325556   5  11.516454
    5  6.359000   6  11.516454
    6  6.359000   6  11.516454
    7  6.361111   7  11.516454
    8  6.360778   7  11.516454
    9  6.361111   6  11.516454

As output, I want a new DataFrame with the N0_YLDF column split into 4, one new column for each unique value of ZZ. How do I go about this? I can do groupby, but do not know what to do with the grouped object.

Leo K
user308827

4 Answers

172
gb = df.groupby('ZZ')    
[gb.get_group(x) for x in gb.groups]
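
As a rough, self-contained sketch of the two lines above, the snippet below rebuilds the question's sample data and shows what `gb.groups` and `get_group` give back. The variable names `six` and `frames` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")

# gb.groups maps each distinct ZZ value to the row labels in that group.
group_keys = sorted(gb.groups)

# get_group returns the sub-DataFrame for a single key.
six = gb.get_group(6)

# One sub-DataFrame per distinct ZZ value.
frames = [gb.get_group(x) for x in gb.groups]
```

Each element of `frames` keeps all original columns and the original row labels.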
qwwqwwq
39

Another alternative: iterating over the groupby object yields (key, frame) tuples, so we can simply use a list comprehension to retrieve the second value (the frame).

dfs = [x for _, x in df.groupby('ZZ')]
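
To make the tuple structure visible, here is a small sketch on a hypothetical four-row frame (`small` is a cut-down stand-in for the question's data, not the original `df`). It keeps the keys alongside the frames.

```python
import pandas as pd

# Hypothetical cut-down frame, used only for illustration.
small = pd.DataFrame({
    "N0_YLDF": [6.29, 6.32, 6.32, 6.36],
    "ZZ": [2, 6, 5, 6],
})

# Each iteration step yields a (key, sub-frame) tuple; with the default
# sort=True the keys come out in sorted order.
pairs = list(small.groupby("ZZ"))
keys = [key for key, _ in pairs]
dfs = [frame for _, frame in pairs]
```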
Anton vBR
  • would this one liner work if I'm looking to make specific aggregations to every data frame? – DataPlug Mar 16 '22 at 23:03
  • This one-liner simply stores the dataframes in a list. What you do next is up to you. Maybe have a look at ALollz's answer to access keys. – Anton vBR Mar 17 '22 at 10:25
12

In R there is a dataframe method called split. This is for all the R users out there:

def split(df, group):
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]
Jeff Mandell
  • shouldn't you put it all into a series? ending with `pd.Series(...)` – Adam May 23 '17 at 19:47
  • 1
    This is amazing. Is there an easy way to get the key that identifies the group, so I can return a list of tuples, like `[(key, gb.get_group(x)) for x in gb.groups]`? – rsmith54 Aug 22 '17 at 19:47
  • I found this, which makes this easy: https://stackoverflow.com/questions/42513049/get-all-keys-from-groupby-object-in-pandas – rsmith54 Aug 22 '17 at 19:59
  • 3
    Just to provide an answer to the comment (which is explained in more detail in the link): `[(key, gb.get_group(key)) for key in gb.groups]` – de1 Nov 22 '17 at 17:24
  • The same solution but with iterators:
        def split(df, group):
            gb = df.groupby(group)
            for g in gb.groups:
                yield gb.get_group(g)
    – Jonatas Eduardo Oct 19 '21 at 14:04
7

Store them in a dict, which allows you access to the group DataFrames based on the group keys.

d = dict(tuple(df.groupby('ZZ')))
d[6]

#    N0_YLDF  ZZ        MAT
#1  6.317000   6  11.669069
#2  6.324889   6  11.516454
#5  6.359000   6  11.516454
#6  6.359000   6  11.516454
#9  6.361111   6  11.516454

If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can build the dict from a generator expression that selects that column.

d = dict((idx, gp['N0_YLDF']) for idx, gp in df.groupby('ZZ'))
d[6]
#1    6.317000
#2    6.324889
#5    6.359000
#6    6.359000
#9    6.361111
#Name: N0_YLDF, dtype: float64
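
The question asks for a single DataFrame with one N0_YLDF column per ZZ value. One possible way to get there from a dict like this is `pd.concat` along the columns, sketched below. It assumes that missing values (NaN) are acceptable wherever a row belongs to a different group; the name `wide` is only illustrative.

```python
import pandas as pd

# The two relevant columns of the question's sample data.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
})

# One N0_YLDF Series per ZZ value, keyed by that value.
d = {key: group["N0_YLDF"] for key, group in df.groupby("ZZ")}

# Concatenating along axis=1 uses the dict keys as column labels and
# aligns on the original row labels; cells with no value become NaN.
wide = pd.concat(d, axis=1).sort_index()
```

Here `wide` has one row per original row and one column per distinct ZZ value, with each row populated only in the column of its own group.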
ALollz