
I have a pandas DataFrame that looks like Y =

       0  1  2  3
0      1  1  0  0
1      0  0  0  0
2      1  1  1  0
3      1  1  0  0
4      1  1  0  0
5      1  1  0  0
6      1  0  0  0
7      1  1  1  0
8      1  0  0  0
...   .. .. .. ..
14989  1  1  1  1
14990  1  1  1  0
14991  1  1  1  1
14992  1  1  1  0

[14993 rows x 4 columns]

There are a total of 5 unique rows:

1  1  0  0
0  0  0  0
1  1  1  0
1  0  0  0
1  1  1  1

For each unique row, I want to count how many times it appears in the Y DataFrame.

Shamoon

3 Answers


We can use .groupby on all columns to get the unique row combinations, then call .size() to count how many rows fall into each group.

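# Group by all columns so that each group is one unique row, then count the rows per group.
# Passing name='count' names the count column directly. A bare reset_index() would label it 0,
# which collides with an existing column labelled 0 when the column labels are integers.
df_group = df.groupby(list(df.columns)).size().reset_index(name='count')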

Output

   0  1  2  3  count
0  0  0  0  0      1
1  1  0  0  0      2
2  1  1  0  0      4
3  1  1  1  0      4
4  1  1  1  1      2
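As an alternative sketch, assuming pandas 1.1 or newer, `DataFrame.value_counts` gives the same per-row counts without an explicit groupby. The `rows` list below is just the 13 rows visible in the question's printout (the elided rows are not included), and the names `rows` and `counts` are only for illustration.

```python
import pandas as pd

# The 13 rows visible in the question's printout; the elided rows are left out.
rows = [
    [1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0],
]
df = pd.DataFrame(rows)

# value_counts() on a DataFrame counts each distinct row.
# sort=False skips the default ordering by descending count.
counts = df.value_counts(sort=False).reset_index(name="count")
print(counts)
```

On this sample the counts should match the Output table above.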

Note

I copied the example DataFrame you provided, which looks like this:

print(df)
       0  1  2  3
0      1  1  0  0
1      0  0  0  0
2      1  1  1  0
3      1  1  0  0
4      1  1  0  0
5      1  1  0  0
6      1  0  0  0
7      1  1  1  0
8      1  0  0  0
14989  1  1  1  1
14990  1  1  1  0
14991  1  1  1  1
14992  1  1  1  0
Erfan

Let us use np.unique

import numpy as np

# c holds the unique rows, v holds how many times each one occurs
c, v = np.unique(df.values, axis=0, return_counts=True)
c
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0]], dtype=int64)
v
array([1, 2, 4, 2], dtype=int64)
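To read the two arrays together, here is a minimal sketch that puts the unique rows and their counts into one DataFrame. It assumes the 13 rows visible in the question's printout as input, and the names `rows`, `unique_rows` and `result` are only for illustration.

```python
import numpy as np
import pandas as pd

# The 13 rows visible in the question's printout; the elided rows are left out.
rows = [
    [1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0],
]
df = pd.DataFrame(rows)

# axis=0 treats each row as one item; return_counts=True also returns
# how many times each unique row occurs.
unique_rows, counts = np.unique(df.to_numpy(), axis=0, return_counts=True)

# Pair each unique row with its count in a single table.
result = pd.DataFrame(unique_rows, columns=df.columns).assign(count=counts)
print(result)
```

The counts always sum to the number of rows in `df`, which is a quick sanity check on the result.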
BENY
  • This works to get the unique rows, but I think the desired result is to count occurrences of each unique row in the original DataFrame. – Peter Leimbigler Mar 10 '19 at 22:11

I made a sample for you.


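    import itertools
    import random

    import pandas as pd

    # All 16 possible 0/1 rows of length 4
    iter_list = list(itertools.product([0, 1], [0, 1], [0, 1], [0, 1]))

    # Draw 1000 rows at random to build a sample DataFrame
    sum_list = []
    for i in range(1000):
        sum_list.append(random.choice(iter_list))

    target_df = pd.DataFrame(sum_list)

    # Count the rows in each group of identical rows
    target_df.reset_index().groupby(list(target_df.columns)).count().rename(columns={'index': 'count'}).reset_index()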
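If you want to double-check the grouped counts on a random sample like this, one possible sketch is to tally the same rows with `collections.Counter` and compare. The seed value and the names `patterns`, `sample`, `grouped` and `expected` are arbitrary choices for illustration.

```python
import itertools
import random
from collections import Counter

import pandas as pd

random.seed(0)  # arbitrary seed, only so the sample is repeatable

# All 16 possible 0/1 rows of length 4, then 1000 random draws from them.
patterns = list(itertools.product([0, 1], repeat=4))
sample = [random.choice(patterns) for _ in range(1000)]
target_df = pd.DataFrame(sample)

# Count identical rows with pandas.
grouped = target_df.groupby(list(target_df.columns)).size()

# Independent tally of the same row tuples in plain Python.
expected = Counter(sample)

print(grouped.to_dict() == dict(expected))
```

Both tallies are keyed by the row as a tuple, so they can be compared directly as dictionaries.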

mtgarden