
Question

Two existing questions look similar to this one but ask something different: here and here. Both call a GroupBy method, such as count() or aggregate(), which I know returns a DataFrame. What I'm asking is how to convert the GroupBy object itself (class pandas.core.groupby.DataFrameGroupBy) into a DataFrame. I'll illustrate below.

Example

Construct an example DataFrame as follows.

import numpy
import pandas

# Build one row per (name, take) pair with a random score and ping.
data_list = []
for name in ["sasha", "asa"]:
    for take in ["one", "two"]:
        row = {"name": name, "take": take, "score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}
        data_list.append(row)
data = pandas.DataFrame(data_list)

The above DataFrame should look like the following (with different numbers, since the values are random).

    name  ping     score take
0  sasha    72  0.923263  one
1  sasha    14  0.724720  two
2    asa    76  0.774320  one
3    asa    71  0.128721  two

What I want to do is to group by the columns "name" and "take" (in that order), so that I can get a DataFrame indexed by the multiindex constructed from the columns "name" and "take", like below.

               score  ping
name  take
sasha one   0.923263    72
      two   0.724720    14
asa   one   0.774320    76
      two   0.128721    71

How do I achieve that? If I do grouped = data.groupby(["name", "take"]), then grouped is a pandas.core.groupby.DataFrameGroupBy instance. What is the correct way of doing this?

Alex
Ray

1 Answer


You need set_index:

data = data.set_index(['name','take'])
print(data)
            ping     score
name  take                
sasha one     46  0.509177
      two     77  0.828984
asa   one     51  0.637451
      two     51  0.658616
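
As a rough sketch of the same idea, the snippet below rebuilds the example data and produces the layout asked for in the question. The seeded generator `rng` and the variable names `indexed` and `grouped` are assumptions made here for illustration, not part of the original post. It also shows that, because every (name, take) pair occurs only once in this data, a groupby followed by first() should give the same values as set_index, although it does more work than needed.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, so the example is repeatable).
rng = np.random.default_rng(0)
rows = [
    {"name": name, "take": take, "score": rng.random(), "ping": int(rng.integers(10, 100))}
    for name in ["sasha", "asa"]
    for take in ["one", "two"]
]
data = pd.DataFrame(rows)

# set_index moves "name" and "take" into a MultiIndex;
# selecting the remaining columns explicitly fixes their order.
indexed = data.set_index(["name", "take"])[["score", "ping"]]

# Each (name, take) pair is unique here, so first() just returns the single
# row of each group. sort=False keeps groups in order of first appearance.
grouped = data.groupby(["name", "take"], sort=False).first()[["score", "ping"]]

print(indexed)
print(grouped)
```

If the key pairs were not unique, the two results would differ: set_index would keep every row, while first() would keep one row per group.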
jezrael
  • Oooooh!!!! Riiiiiight!!!!! OK I'll accept this answer in 9 minutes when Stack Overflow lets me. Thank you. – Ray Oct 25 '16 at 09:42
  • When I attempt to use this answer, I get an `AttributeError`. "Cannot access callable attribute 'set_index' of 'DataFrameGroupBy' objects, try using the 'apply' method" – Nate Feb 14 '17 at 01:47
  • @Nate - It seems `data` is not `DataFrame`, but output of groupby - so need `g = df.groupby('col')` and then `g.apply(lambda x: x['col1'].set_index())` – jezrael Feb 14 '17 at 06:17
  • This doesn't answer the actual question: how to convert DataFrameGroupBy to DataFrame. – James Hirschorn May 15 '18 at 05:06
  • @JamesHirschorn - You are right, title is wrong. Better should be how convert MultiIndex to columns in DataFrame. – jezrael May 15 '18 at 05:26
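
To make the point raised in the comments concrete, here is a small sketch of what happens when set_index is called on the GroupBy object rather than on the DataFrame, plus one possible way to get a plain DataFrame back from a GroupBy by concatenating its groups. The values are illustrative (rounded from the question's table), and the names `rebuilt` and `restored` are assumptions introduced here. This is one option among several, not necessarily what the answerer had in mind.

```python
import pandas as pd

# Illustrative data, rounded from the table in the question.
data = pd.DataFrame({
    "name": ["sasha", "sasha", "asa", "asa"],
    "take": ["one", "two", "one", "two"],
    "score": [0.92, 0.72, 0.77, 0.13],
    "ping": [72, 14, 76, 71],
})
grouped = data.groupby(["name", "take"], sort=False)

# set_index is a DataFrame method, so calling it on the GroupBy fails.
try:
    grouped.set_index(["name", "take"])
except AttributeError as exc:
    print("AttributeError:", exc)

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs;
# concatenating the sub-frames gives back an ordinary DataFrame.
rebuilt = pd.concat([group for _, group in grouped])

# From there, set_index works as in the answer.
restored = rebuilt.set_index(["name", "take"])
print(restored)
```

If the original DataFrame is still in scope, calling set_index on it directly is simpler; the concatenation route is only useful when the GroupBy is all that is available.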