
I have a large pandas DataFrame and I would like to count the occurrences of each unique value in it. I tried the following, but it takes too much time and memory. How can I do it in a pythonic way?

import numpy as np
import pandas as pd

pack = []
# Iterating row by row is slow for large frames
for index, row in packets.iterrows():
    pack.extend(row.dropna().values.tolist())

unique, count = np.unique(pack, return_counts=True)
counts = np.asarray((unique, count))
user3806649

1 Answer


It seems like you want to compute value counts across all columns. You can flatten the frame to a Series, drop NaNs, and call value_counts. Here's a sample -

df

     a    b
0  1.0  NaN
1  1.0  NaN
2  3.0  3.0
3  NaN  4.0
4  5.0  NaN
5  NaN  4.0
6  NaN  5.0
pd.Series(df.values.ravel()).dropna().value_counts()

5.0    2
4.0    2
3.0    2
1.0    2
dtype: int64

Another method is with np.unique -

u, c = np.unique(pd.Series(df.values.ravel()).dropna().values, return_counts=True)
pd.Series(c, index=u)

1.0    2
3.0    2
4.0    2
5.0    2
dtype: int64

Note that value_counts sorts the result in descending order of counts, while np.unique returns values sorted in ascending order.
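For completeness, here is a self-contained sketch (assuming pandas and NumPy are installed) that builds the sample frame from above and checks that both approaches produce the same counts once aligned by value -

```python
import numpy as np
import pandas as pd

# Sample frame from the answer above
df = pd.DataFrame({'a': [1.0, 1.0, 3.0, np.nan, 5.0, np.nan, np.nan],
                   'b': [np.nan, np.nan, 3.0, 4.0, np.nan, 4.0, 5.0]})

# Method 1: flatten, drop NaNs, value_counts (sorted by count, descending)
vc = pd.Series(df.values.ravel()).dropna().value_counts()

# Method 2: np.unique (values come back sorted ascending)
u, c = np.unique(pd.Series(df.values.ravel()).dropna().values, return_counts=True)
uc = pd.Series(c, index=u)

# Both give identical counts after sorting by index
assert vc.sort_index().equals(uc.sort_index())
```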

cs95