I have data of the following form:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)

#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a
# 6      4   NaN

Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value.

I'm currently doing this in the following (clunky and inefficient) way:

param = []
for _, group in df[df.param.notnull()].groupby('group'):
    param.append(group.param.unique()[0])
print(pd.DataFrame({'param': param}).param.value_counts())

# a    2
# b    1

I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.

user1684046

5 Answers

I think you can use SeriesGroupBy.nunique:

print (df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64
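
If you need the counts as a DataFrame with named columns rather than a Series, you can chain on reset_index (a small sketch; the column name `count` is my own choice):

print (df.groupby('param')['group'].nunique().reset_index(name='count'))
  param  count
0     a      2
1     b      1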

Another solution: take the unique values per group with unique, build a new DataFrame with DataFrame.from_records, reshape it to a Series with stack, and finally apply value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a    2
b    1
dtype: int64
jezrael
  • I tested it with `df = pd.DataFrame({ 'group': [1, 1, 2, 3, 3, 3, 4], 'param': ['a', 'c', 'b', np.nan, 'c', 'a', np.nan] })`, but your code returns different output because it uses only the first unique element of the list in each `group`. My code returns all unique values. Please check whether this is what you need. Thank you. – jezrael Jan 01 '17 at 11:43
  • How do we get the column names? – dondapati Jun 07 '18 at 06:01
  • 2
    @dondapati - add `.reset_index()` – jezrael Jun 07 '18 at 06:02
  • Note that this solution only produces a Series, not a DataFrame. Using @datapug's solution creates a DataFrame. – Kane Chew Jun 07 '23 at 14:49

This is just an add-on to the accepted solution, in case you want to compute other aggregate functions in addition to the number of unique values:

df.groupby(['group']).agg(['min', 'max', 'count', 'nunique'])
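
A minimal sketch of the same idea with named aggregation (available since pandas 0.25), which lets you choose the output column names (the names below are my own):

df.groupby('group').agg(
    n_unique=('param', 'nunique'),  # distinct non-null values per group
    n_rows=('param', 'count'),      # non-null rows per group
)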
datapug

The above answers work too, but if you want to add a column with the unique counts to your existing DataFrame, you can do that using transform:

df['distinct_count'] = df.groupby(['param'])['group'].transform('nunique')

output:

   group param  distinct_count
0      1     a             2.0
1      1     a             2.0
2      2     b             1.0
3      3   NaN             NaN
4      3     a             2.0
5      3     a             2.0
6      4   NaN             NaN

(The new column is float because the NaN rows force the column to float dtype.) To check the group counts, as highlighted by @jezrael:

print (df.groupby('param')['group'].nunique())
param
a    2
b    1
Name: group, dtype: int64
Anu

I know it has been a while since this was posted, but I think this will help too. I wanted to count unique values and then filter the groups by the number of these unique values; this is how I did the counting (a sketch of the filtering step follows the snippet):

df.groupby('group').agg(['min','max','count','nunique']).reset_index(drop=False)
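
The filtering itself isn't shown above; a minimal sketch of that step, assuming you want to keep only the groups with at least one distinct non-null param:

counts = df.groupby('group')['param'].nunique()
keep = counts[counts >= 1].index  # groups 1, 2 and 3 in the question's example
print (df[df['group'].isin(keep)])
   group param
0      1     a
1      1     a
2      2     b
3      3   NaN
4      3     a
5      3     a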
nir

This way is faster and more convenient:

df.groupby('param').agg({'group':lambda x: len(pd.unique(x))})
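
For reference, the lambda is equivalent here to the built-in 'nunique' alias, which avoids calling a Python function once per group (note that nunique drops NaN by default while pd.unique keeps it, but the group column contains no NaN):

df.groupby('param').agg({'group': 'nunique'})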
  • Faster than what? Can you show us the code and data that you are using to compare the timings between alternative methods? – Sycorax Dec 07 '22 at 21:49