How can I count the unique values in a Pandas Dataframe?

Question

I have a pandas DataFrame, Y, that looks like this:

       0  1  2  3
0      1  1  0  0
1      0  0  0  0
2      1  1  1  0
3      1  1  0  0
4      1  1  0  0
5      1  1  0  0
6      1  0  0  0
7      1  1  1  0
8      1  0  0  0
...   .. .. .. ..
14989  1  1  1  1
14990  1  1  1  0
14991  1  1  1  1
14992  1  1  1  0

[14993 rows x 4 columns]

There are 5 unique rows (combinations of the four column values) in total.

For each unique row, I want to count how many times it appears in Y.
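To make the question reproducible, here is a minimal sketch that rebuilds Y from only the 13 rows visible in the printout above. The elided middle rows ("...") are not included, so any counts computed from this frame will differ from those of the full 14,993-row DataFrame.

```python
import pandas as pd

# Only the 13 rows visible in the printout; the elided rows ("...") are omitted
rows = {
    0: [1, 1, 0, 0], 1: [0, 0, 0, 0], 2: [1, 1, 1, 0], 3: [1, 1, 0, 0],
    4: [1, 1, 0, 0], 5: [1, 1, 0, 0], 6: [1, 0, 0, 0], 7: [1, 1, 1, 0],
    8: [1, 0, 0, 0], 14989: [1, 1, 1, 1], 14990: [1, 1, 1, 0],
    14991: [1, 1, 1, 1], 14992: [1, 1, 1, 0],
}

# Dict keys become the index, each list becomes a row with columns 0..3
Y = pd.DataFrame.from_dict(rows, orient="index")
print(Y)
```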

Erfan · Answer 1 · 2019-03-10T21:06:25.657

We can use .groupby on all columns to get the unique row combinations, then call .size() to count how many rows fall into each group.

# Group by all columns and count the rows in each group
df_group = df.groupby(list(df.columns)).size().reset_index()

# reset_index labels the count column 0, so rename it to 'count'
df_group.rename({0:'count'}, inplace=True, axis=1)

Output

   0  1  2  3  count
0  0  0  0  0      1
1  1  0  0  0      2
2  1  1  0  0      4
3  1  1  1  0      4
4  1  1  1  1      2

Note

I copied the example DataFrame you provided, which looks like this:

print(df)
       0  1  2  3
0      1  1  0  0
1      0  0  0  0
2      1  1  1  0
3      1  1  0  0
4      1  1  0  0
5      1  1  0  0
6      1  0  0  0
7      1  1  1  0
8      1  0  0  0
14989  1  1  1  1
14990  1  1  1  0
14991  1  1  1  1
14992  1  1  1  0
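A slightly shorter variant of the same groupby idea: Series.reset_index accepts a name argument, so the count column can be named directly instead of being renamed afterwards. This sketch assumes df holds only the 13 sample rows printed in the note above.

```python
import pandas as pd

# The 13 visible sample rows from the question (elided rows omitted)
df = pd.DataFrame(
    [[1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
     [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1],
     [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0]],
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 14989, 14990, 14991, 14992],
)

# Group on every column, count rows per group, and name the count column
df_group = df.groupby(list(df.columns)).size().reset_index(name="count")
print(df_group)
```

On these rows it produces the same five groups and counts as the output shown above.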

BENY · Accepted Answer · 2019-03-10T22:17:44.830


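Let's use np.unique with axis=0 (so whole rows are compared) and return_counts=True. The output below corresponds to the first nine sample rows (indices 0 to 8), which is why it shows four unique rows rather than five: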

import numpy as np

c, v = np.unique(df.values, axis=0, return_counts=True)
c
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0]], dtype=int64)
v
array([1, 2, 4, 2], dtype=int64)
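The two arrays can be paired back into a labelled table. A minimal sketch, this time assuming all 13 visible sample rows (so there are five unique rows rather than four):

```python
import numpy as np
import pandas as pd

# The 13 visible sample rows from the question (elided rows omitted)
df = pd.DataFrame(
    [[1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
     [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1],
     [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0]],
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 14989, 14990, 14991, 14992],
)

# axis=0 compares whole rows; the unique rows come back sorted
c, v = np.unique(df.to_numpy(), axis=0, return_counts=True)

# Pair each unique row with its count in one table
counts = pd.DataFrame(c, columns=df.columns)
counts["count"] = v
print(counts)
```

Because np.unique returns the rows sorted, the order matches the groupby result from the first answer.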

edited Mar 10 '19 at 22:17

answered Mar 10 '19 at 22:08


This works to get the unique rows, but I think the desired result is to count occurrences of each unique row in the original DataFrame. – Peter Leimbigler Mar 10 '19 at 22:11
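Counting occurrences of each unique row is also available as a single call on newer pandas: DataFrame.value_counts was added in pandas 1.1, after this thread. A sketch assuming the same 13 visible sample rows:

```python
import pandas as pd

# The 13 visible sample rows from the question (elided rows omitted)
df = pd.DataFrame(
    [[1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
     [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1],
     [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0]],
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 14989, 14990, 14991, 14992],
)

# One count per unique row, indexed by the row values,
# sorted by count (largest first) by default
counts = df.value_counts()
print(counts)

# Sort by the row values instead, matching the groupby / np.unique order
print(counts.sort_index())
```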

score 1 · Answer 3 · answered Mar 11 '19 at 01:17

I made a sample DataFrame for you:


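    import itertools
    import random

    import pandas as pd

    # All 16 possible rows of four 0/1 values
    iter_list = list(itertools.product([0, 1], [0, 1], [0, 1], [0, 1]))
    # Draw 1000 random rows with replacement
    sum_list = [random.choice(iter_list) for _ in range(1000)]

    target_df = pd.DataFrame(sum_list)
    # Expose the index as a column, then count its entries within each unique row
    (target_df.reset_index()
     .groupby(list(target_df.columns))
     .count()
     .rename(columns={'index': 'count'})
     .reset_index())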
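Because random.choice draws differently on each run, here is a seeded variant of the same sample (the seed value 0 is an arbitrary choice) with a quick sanity check: the counts must add up to the 1000 sampled rows, and there can be at most 16 distinct rows.

```python
import itertools
import random

import pandas as pd

random.seed(0)  # arbitrary seed so the sample is reproducible

# All 16 four-bit rows, then 1000 random draws with replacement
iter_list = list(itertools.product([0, 1], repeat=4))
target_df = pd.DataFrame([random.choice(iter_list) for _ in range(1000)])

# Same counting approach as above, kept in a variable for inspection
result = (target_df.reset_index()
          .groupby(list(target_df.columns))
          .count()
          .rename(columns={"index": "count"})
          .reset_index())

# Every sampled row lands in exactly one group
assert result["count"].sum() == len(target_df) == 1000
assert len(result) <= 16
print(result)
```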
