What is the difference between groupby("x").count() and groupby("x").size() in pandas?
Does size just exclude NaN?
size includes NaN values, count does not:
In [46]:
df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2], 'b': [1, 2, 3, 4, np.nan, 4], 'c': np.random.randn(6)})
df
Out[46]:
a b c
0 0 1 1.067627
1 0 2 0.554691
2 1 3 0.458084
3 2 4 0.426635
4 2 NaN -2.238091
5 2 4 1.256943
In [48]:
print(df.groupby(['a'])['b'].count())
print(df.groupby(['a'])['b'].size())
a
0 2
1 1
2 2
Name: b, dtype: int64
a
0 2
1 1
2 3
dtype: int64
The other answers have pointed out the difference; however, it is not completely accurate to say "size counts NaNs while count does not". While size does indeed count NaNs, this is actually a consequence of the fact that size returns the size (or the length) of the object it is called on. Naturally, this also includes rows/values which are NaN.
So, to summarize, size returns the size of the Series/DataFrame [1],
df = pd.DataFrame({'A': ['x', 'y', np.nan, 'z']})
df
A
0 x
1 y
2 NaN
3 z
df.A.size
# 4
...while count counts the non-NaN values:
df.A.count()
# 3
Notice that size is an attribute (it gives the same result as len(df) or len(df.A)), whereas count is a method.
[1] DataFrame.size is also an attribute and returns the number of elements in the DataFrame (rows x columns).
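For instance, a minimal sketch of that footnote (df2 and its second column are illustrative additions, not part of the original answer):
import numpy as np
import pandas as pd

# df2 is made up here just to illustrate that size = rows x columns.
df2 = pd.DataFrame({'A': ['x', 'y', np.nan, 'z'],
                    'B': [1, 2, 3, np.nan]})

df2.shape    # (4, 2)
df2.size     # 8 -> 4 rows x 2 columns, NaNs included
df2.count()  # non-NaN values per column: A -> 3, B -> 3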
GroupBy - Output Structure
Besides the basic difference, there's also a difference in the structure of the generated output when calling GroupBy.size() versus GroupBy.count().
df = pd.DataFrame({
'A': list('aaabbccc'),
'B': ['x', 'x', np.nan, np.nan,
np.nan, np.nan, 'x', 'x']
})
df
A B
0 a x
1 a x
2 a NaN
3 b NaN
4 b NaN
5 c NaN
6 c x
7 c x
Consider,
df.groupby('A').size()
A
a 3
b 2
c 3
dtype: int64
Versus,
df.groupby('A').count()
B
A
a 2
b 0
c 2
GroupBy.count returns a DataFrame when you call count on all the columns, while GroupBy.size returns a Series.
The reason is that size is the same for all columns, so only a single result is returned. Meanwhile, count is called for each column separately, since the result depends on how many NaNs each column has.
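A quick way to confirm the output types (a small sketch reusing the df defined just above):
type(df.groupby('A').count())  # <class 'pandas.core.frame.DataFrame'>
type(df.groupby('A').size())   # <class 'pandas.core.series.Series'>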
pivot_table
Another example is how pivot_table
treats this data. Suppose we would like to compute the cross tabulation of
df
A B
0 0 1
1 0 1
2 1 2
3 0 2
4 0 0
pd.crosstab(df.A, df.B)  # the result we expect, but we want to compute it with `pivot_table`
B 0 1 2
A
0 1 2 1
1 0 0 1
With pivot_table, you can pass 'size' as the aggfunc:
df.pivot_table(index='A', columns='B', aggfunc='size', fill_value=0)
B 0 1 2
A
0 1 2 1
1 0 0 1
But count does not work; an empty DataFrame is returned:
df.pivot_table(index='A', columns='B', aggfunc='count')
Empty DataFrame
Columns: []
Index: [0, 1]
I believe the reason for this is that 'count' must be done on the series that is passed to the values argument, and when nothing is passed, pandas decides to make no assumptions.
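To make that concrete, here is a hedged sketch: the dummy helper column is something I am adding purely for illustration, so that 'count' has a values series to work on. With it, count reproduces the crosstab result above:
df.assign(dummy=1).pivot_table(index='A', columns='B', values='dummy',
                               aggfunc='count', fill_value=0)
B  0  1  2
A
0  1  2  1
1  0  0  1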
Just to add a little bit to @Edchum's answer, even if the data has no NA values, the result of count() is more verbose. Using the example from before:
grouped = df.groupby('a')
grouped.count()
Out[197]:
b c
a
0 2 2
1 1 1
2 2 3
grouped.size()
Out[198]:
a
0 2
1 1
2 3
dtype: int64
When we are dealing with a plain DataFrame, the only difference is the handling of NaN values: count does not include NaN values when counting rows.
But when we use these functions with groupby, count() has to be applied to a specific column to get the per-group counts, whereas size() needs no such column selection, as sketched below.
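A small sketch of that point, reusing the very first df above (columns a, b, c):
df.groupby('a').size()        # rows per group; no column has to be selected
df.groupby('a')['b'].count()  # a column must be selected; counts non-NaN 'b' values per group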
In addition to all the above answers, I would like to point out one more difference which I find significant.
You can loosely compare a pandas DataFrame's size and count with a Java Vector's capacity and size. When we create a vector, some predefined amount of memory is allocated to it, and when we get close to the maximum number of elements it can hold, more memory is allocated to accommodate further additions. Similarly, in a DataFrame, as we add elements, the memory allocated to it grows.
The size attribute gives the total number of cells in the DataFrame (rows x columns), whereas count gives the number of elements (non-NaN values) that are actually present in the DataFrame. For example,
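(The original example is not reproduced here; the following is a minimal sketch with made-up values that matches the description below.)
import numpy as np
import pandas as pd

# Made-up values, chosen only so the numbers match the description that follows.
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

df.size      # 6 -> 3 rows x 2 columns, NaNs included
df.count()   # non-NaN values per column: A -> 2, B -> 2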
You can see that even though there are 3 rows in the DataFrame, its size is 6.
This answer covers the size and count difference with respect to a DataFrame, not a pandas Series. I have not checked what happens with a Series.