Pandas: GroupBy to DataFrame

Question

There is a very popular S.O. question regarding groupby to dataframe see here. Unfortunately, I do not think this particular use case is the most useful.

Suppose you have what could be a hierarchical dataset in a flattened form:

e.g.

     key    val 
0    'a'    2
1    'a'    1
2    'b'    3
3    'b'    4

what I wish to do is convert that dataframe to this structure

    'a'  'b'
0    2    3
1    1    4

I thought this would be as simple as

pd.DataFrame(df.groupby('key').groups)

but it is not.

So how can I make this transformation?

What if there are two ('a', 2) pairs? What should be the expected result? — Tai, Jan 14 '18 at 19:41
Thanks. Updated my answer :P I suggested you update your example to help people understand your problem. You can take data from my answer. — Tai, Jan 14 '18 at 20:04

score 6 · Accepted Answer · answered Jan 14 '18 at 17:22

6

df.assign(index=df.groupby('key').cumcount()).pivot('index','key','val')
Out[369]: 
key    'a'  'b'
index          
0        2    3
1        1    4

answered Jan 14 '18 at 17:22

BENY

317,841
20
164
234

Could you please explain how this works, as well as why you chose to make this hierarchical by assigning a value to index? – SumNeuron Jan 14 '18 at 18:16
@SumNeuron , when you want ot pivot you need three para, index, column , value, In your example df, you do not have index, the cumcount is to get then index , then we can pivot your sample df to expected output – BENY Jan 14 '18 at 18:19
is there a way to drop the index label? – SumNeuron Jan 14 '18 at 18:35
@SumNeuron adding rename_axis(None,axis=1) https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename_axis.html – BENY Jan 14 '18 at 19:12
@Wen that ends up dropping the label `key` and keeping `index` – SumNeuron Jan 15 '18 at 07:55
@Wen also, is there a way to do this with a list of values instead of just 'val' ? – SumNeuron Jan 15 '18 at 12:05
@SumNeuron rename_axis(None,axis=1) and rename_axis(None,axis=0), you need to do it twice . For your extra question , can you show me the data ? Or maybe we can open a new question ? – BENY Jan 15 '18 at 15:25
@Wen the new question is here https://stackoverflow.com/questions/48265340/pandas-doing-quantile-normalization-on-a-subset-of-a-flattened-dataset-with-rep It uses a lot from what I have learned from your answer and another question :) – SumNeuron Jan 15 '18 at 15:31

score 3 · Answer 2 · answered Jan 14 '18 at 17:14

3

what about the following approach?

In [134]: pd.DataFrame(df.set_index('val').groupby('key').groups)
Out[134]:
   a  b
0  2  3
1  1  4

answered Jan 14 '18 at 17:14

MaxU - stand with Ukraine

205,989
36
386
419

Tai · Answer 3 · 2018-01-14T20:14:37.027

Think this should work. Note the example is different from OP's. There are duplicates in the example.

df = pd.DataFrame({'key': {0: "'a'", 1: "'a'", 2: "'b'", 3: "'b'", 4: "'a'"}, 
                   'val': {0: 2, 1: 1, 2: 3, 3: 4, 4: 2}})


df_wanted = pd.DataFrame.from_dict(
                df.groupby("key")["val"].apply(list).to_dict(), orient='index'
            ).transpose()


    'a'     'b'
0   2.0     3.0
1   1.0     4.0
2   2.0     NaN

df.groupby("key")["val"].apply(list).to_dict() creates a dictionary {"'a'": [2, 1, 2], "'b'": [3, 4]}. Then, we transfer the dictionary to a DataFrame object.

We use DataFrame.from_dict function. Because the dictionary contains different lengths, we need to pass in an extra argument orient='index' and then do transpose() in the end.

Reference

Creating dataframe from a dictionary where entries have different lengths

score 0 · Answer 4 · answered Jan 14 '18 at 17:11

0

I'm new to Pandas but this seems to work:

gb = df.groupby('key')
k = 'val'
pd.DataFrame(
    [gb.get_group(x)[k].tolist() for x in gb.groups], 
    index=[x for x in gb.groups]
).transpose()

answered Jan 14 '18 at 17:11

SumNeuron

4,850
5
39
107

score 0 · Answer 5 · answered Jan 14 '18 at 19:08

0

Let's use set_index and unstack with cumcount:

df.set_index([df.groupby('key').cumcount(),'key'])['val']\
  .unstack().rename_axis(None,1)

Output:

   'a'  'b'
0    2    3
1    1    4

answered Jan 14 '18 at 19:08

Scott Boston

147,308
15
139
187

Pandas: GroupBy to DataFrame

5 Answers5

Linked