How to do group by in two levels on python pandas an count values?

Question

I have the following dataframe

import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['AK', 'Ram', 'Ram', 'Singh', 'Murugan', 'Kishore', 'AK']
df['Email'] = ['AK@gmail.com', 'a@djgbj.com', 'a@djgbj.com', '3454@ghhg.io', 'dgg@qw.cc', 'dgdg@dg.com', 'AK@gmail.com']
df['Cat'] = ['ab1', 'ab2', 'ab1', 'ab2', 'ab1', 'ab2', 'ab1']
df['Id'] = ['abc1', 'abc2', 'abc3', 'abc4', 'abc5', 'abc6', 'abc7']

For the following code

dfs=df.groupby(['Email', 'Cat'])['Email'].count().reset_index(name='Number')

It gives:

      Email         Cat Number
0   3454@ghhg.io    ab2 1
1   AK@gmail.com    ab1 2
2   a@djgbj.com     ab1 1
3   a@djgbj.com     ab2 1
4   dgdg@dg.com     ab2 1
5   dgg@qw.cc       ab1 1

How to group by on dfs to get the following output?

Cat Number Count
ab1 1      3
ab1 2      1
ab2 1      3

Multiple solutions, e.g. see [Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series](https://stackoverflow.com/questions/50328246/pandas-groupby-size-vs-series-value-counts-vs-collections-counter-with-multiple) — jpp, Nov 12 '18 at 11:12
I think you want `c = df.groupby(['Email', 'Cat'])['Email'].cumcount(); df.groupby(['Cat', c]).Cat.count()` if not then your question has been phrased poorly. — cs95, Nov 12 '18 at 17:07

Space Impact · Accepted Answer · 2018-11-12T11:16:18.067

4

Use groupby+size and reset_index:

df1 = dfs.groupby(['Cat','Number']).size().reset_index(name='Count')

Or:

df1 = dfs.groupby(['Cat','Number'])['Email'].value_counts().reset_index(name='Count')

print(df1)
   Cat  Number  Count
0  ab1       1      2
1  ab1       2      1
2  ab2       1      3

edited Nov 12 '18 at 11:16

answered Nov 12 '18 at 11:12

Space Impact

13,085
23
48

The weird thing is that the solution is basically the same as part of OP's code, just done once more. – John Zwinck Nov 12 '18 at 11:19
@coldspeed Not sure, but from `dfs` I did what I can, need to check with OP to clear the confusion. – Space Impact Nov 12 '18 at 11:28
@JohnZwinck Yes, but that is how OP's output. – Space Impact Nov 12 '18 at 11:30
@SandeepKadapa Not sure what the confusion is, they've indicated their expected output at the end of the post. – cs95 Nov 12 '18 at 12:04
@coldspeed OP said he wants to use groupby on `dfs` (which is already a groupby output from `df`) to get his desired result. So, I gave a possible answer using `dfs`. Now, coming to the solution it is not exactly as OP wanted, but that is the only possible solution using `dfs`. So, let the OP decide what is right! – Space Impact Nov 12 '18 at 13:52

Karn Kumar · Answer 2 · 2018-11-12T11:23:10.050

1

Simply:

dfs.groupby(['Cat', 'Number']).count()

reproduced the below, which works..

>>> dfs.groupby(['Cat', 'Number']).count()
            Email
Cat Number
ab1 1           2
    2           1
ab2 1           3

OR

>>> dfs.groupby(['Cat', 'Number'])['Email'].count()
Cat  Number
ab1  1         2
     2         1
ab2  1         3
Name: Email, dtype: int64

edited Nov 12 '18 at 11:23

answered Nov 12 '18 at 11:12

Karn Kumar

8,518
3
27
53

This won't produce OP's expected output. – cs95 Nov 12 '18 at 11:16
@coldspeed, it works correctly , see the POST , OP mentioned `dfs=df.groupby(['Email', 'Cat'])['Email'].count().reset_index(name='Number')` – Karn Kumar Nov 12 '18 at 11:24
Regardless of what they mentioned, it isn't their "expected output". – cs95 Nov 12 '18 at 11:53

How to do group by in two levels on python pandas an count values?

2 Answers2