52

I had a DataFrame, grouped it by FIPS, and summed the groups. That worked fine.

kl = ks.groupby('FIPS')

kl.aggregate(np.sum)

I just want a normal DataFrame back, but what I have is a pandas.core.groupby.DataFrameGroupBy object.

cs95
user1246428
  • 14
    The question title indicates that the question is about how to generally convert a groupby object back to a data frame, yet the question and the accepted answer are only about one special case (sum aggregation). Both the question and the accepted answer would be a lot more helpful if they were about how to generally convert a groupby object to a data frame, without performing any numeric processing on it. – Alex Nov 07 '19 at 10:03
  • To get a single group as a DataFrame, use something like `ks.groupby('FIPS').get_group("whatever group value you have")`. – mahmoh May 27 '20 at 14:22
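
As a rough illustration of the `get_group` suggestion in the comment above, here is a minimal sketch on made-up data. The frame, its column names, and the key `"foo"` are assumptions for the example, not the asker's data.

```python
import pandas as pd

# Toy stand-in for the asker's frame; column names are illustrative only.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1, 2, 3, 4, 5]})

grouped = df.groupby("A")

# get_group returns the rows that belong to one key as an ordinary DataFrame.
foo_rows = grouped.get_group("foo")
print(type(foo_rows).__name__)
print(foo_rows)
```

This pulls out one group at a time, so it does not rebuild the whole frame on its own.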

4 Answers

28
 df_g.apply(lambda x: x) 

will return the original dataframe.
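
A minimal sketch of the identity `apply` on toy data (the frame below is an assumption, not the asker's data). `group_keys=False` is passed so the group labels are not added to the index. Depending on the pandas version, the call may warn about the grouping column or leave it out of the result, so the sketch only inspects the non-grouping column.

```python
import pandas as pd

# Toy data; column names are illustrative only.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1, 2, 3, 4, 5]})

# group_keys=False keeps the original row labels instead of prepending the keys.
df_g = df.groupby("A", group_keys=False)

# Applying the identity function to each group and recombining yields a DataFrame.
restored = df_g.apply(lambda x: x)

# sort_index() guards against any version-specific row ordering.
restored = restored.sort_index()
print(type(restored).__name__)
print(restored["C"].tolist())
```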

Tengfei Li
  • 17
    But why is this needed? – cs95 Jan 22 '19 at 03:27
  • This still returns a DataFrameGroupBy. – hungryMind May 10 '20 at 12:41
  • @cs95 This is equivalent to `pd.DataFrame(grouped.groups)`. The `GroupBy.apply` function applies func to every group and combines the results into a `DataFrame`. – C.K. Aug 20 '20 at 07:14
  • 2
    @C.K. I understand that, thank you. However, my point was more about why we need this method to return the original DataFrame if df_g itself is the original DataFrame? If it's a question of what apply does and how to apply a function to every group, that's a discussion for another post. 2c – cs95 Aug 20 '20 at 08:07
  • 1
    @cs95 Yeap, you're right. I vote for your comment the first time I saw this answer, cause I thought there must be an easier way like `grouped.to_df()`. However, after I checked the API of the `GroupBy` object, I found there wasn't such a function, so I came back to tell everyone this is the easiest way to do that. lol. – C.K. Aug 20 '20 at 10:13
  • In answer to @cs95, I can only speak to why I sought out this question: This was necessary for me to find how a grouping changed the indices, or after grouping to visualize what has been condensed. Often times this comes up for me due to a heavily nested multiindex or when wanting to perform a group on a grouped df. I suppose this is a shortcut for slicing, but as a new user to multiindex slicing, it has been necessary to find my way. – double0darbo Jul 27 '21 at 15:37
  • I see that today (pd '1.5.3'), one should first add either 'group_keys=True' or 'group_keys=False' as an argument to groupby, before trying above. It is still the right answer IMHO. – Oren Apr 14 '23 at 17:48
24

The result of kl.aggregate(np.sum) is a normal DataFrame; you just have to assign it to a variable to use it further. With some random data:

>>> import numpy as np
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                         'foo', 'bar', 'foo', 'foo'],
...                  'B' : ['one', 'one', 'two', 'three',
...                         'two', 'two', 'one', 'three'],
...                  'C' : randn(8), 'D' : randn(8)})
>>> grouped = df.groupby('A')
>>> grouped
<pandas.core.groupby.DataFrameGroupBy object at 0x04E2F630>
>>> test = grouped.aggregate(np.sum)
>>> test
            C         D
A                      
bar -1.852376  2.204224
foo -3.398196 -0.045082
joris
  • 2
    Actually, many of DataFrameGroupBy object methods such as (apply, transform, aggregate, head, first, last) return a DataFrame object. I used the method `filter` in [one](https://kenandeen.wordpress.com/2015/06/20/unisex-names-data-analysis-use-case/) of my blog posts. – Ken D Jun 20 '15 at 06:29
  • 3
    It's not a completely normal DataFrame. For example, if you try to call the .info() method on a GroupBy object, you get `AttributeError: Cannot access callable attribute 'info' of 'DataFrameGroupBy' objects, try using the 'apply' method.` – Adrian Keister Sep 10 '18 at 17:37
  • 3
    call .reset_index() to convert the grouped indices. – hungryMind May 10 '20 at 12:48
  • +1 @hungryMind - *that* is the answer. Re Joris answer - it may be a "dataframe" but it's not normal - you can see it has different column grouping of A vs C and D, which causes plots etc to fail when using as a normal dataframe. It needs collapsing with .reset_index() to make it proper! – TickboxPhil Jun 26 '21 at 14:20
  • kl.count() returns a DataFrame – vkt Mar 09 '22 at 15:51
  • There is what appears to be an undocumented property, `.obj`, which has the original object with grouped transformations applied. See https://stackoverflow.com/a/66879388/459863 A feature request with Pandas was also filed which remains open as of this writing: https://github.com/pandas-dev/pandas/issues/43902 – Wolfram Arnold Apr 27 '23 at 17:29
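
The `.reset_index()` suggestion from the comments can be sketched as follows, on toy data that is an assumption and not the asker's frame. The aggregated result is already a DataFrame, but the group key sits in the index. `reset_index()` moves it back into a regular column, and passing `as_index=False` to `groupby` is another way to get the same flat layout.

```python
import pandas as pd

# Toy data; column names are illustrative only.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1, 2, 3, 4, 5],
                   "D": [10, 20, 30, 40, 50]})

# Aggregating returns a DataFrame indexed by the group key "A".
summed = df.groupby("A")[["C", "D"]].sum()

# reset_index() turns the key back into an ordinary column.
flat = summed.reset_index()

# Alternative: ask groupby not to use the key as the index in the first place.
flat_alt = df.groupby("A", as_index=False)[["C", "D"]].sum()

print(flat)
print(flat_alt)
```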
1

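Use pd.concat, where groups is the GroupBy object (iterating over it yields (key, sub-frame) pairs, so x[1] is each group's sub-frame):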

   pd.concat(map(lambda x: x[1], groups))

Or, to also restore the original index order:

   pd.concat(map(lambda x: x[1], groups)).sort_index()
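
A self-contained sketch of the same idea on toy data (the frame is an assumption for illustration). It uses a list comprehension in place of `map`, which does the same thing here.

```python
import pandas as pd

# Toy data; column names are illustrative only.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1, 2, 3, 4, 5]})

groups = df.groupby("A")

# Iterating a GroupBy yields (key, sub-frame) pairs; keep only the sub-frames.
pieces = [frame for _, frame in groups]

# Concatenating stacks the rows group by group; sort_index() restores row order.
rebuilt = pd.concat(pieces).sort_index()
print(rebuilt)
```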
Rogers
0

You can assign the result of calling .head(n) on the groupby to a variable, where n is the number of rows to keep per group.

Ex: df2 = grouped.head(100)

Now you have a pandas DataFrame, df2, containing up to the first 100 rows of each group.
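
A small sketch of what `head` does on a GroupBy, using toy data that is an assumption for illustration. It keeps at most `n` rows per group, so the full data comes back only when `n` is at least the size of the largest group.

```python
import pandas as pd

# Toy data; column names are illustrative only.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1, 2, 3, 4, 5]})

grouped = df.groupby("A")

# At most two rows per group: the third "foo" row is dropped.
first_two = grouped.head(2)

# With n larger than every group, all rows are kept.
everything = grouped.head(100)

print(first_two)
print(len(everything) == len(df))
```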

andy