Is function any() in pandas.groupby short-circuited?

Asked Jul 17 '21 at 21:15

Active Jul 17 '21 at 21:20


While trying to answer another question, I noticed that any(), when applied within groupby(), is equally slow regardless of the content of the dataframe. For example, it takes the same time to inspect a column of all Trues as a column of all Falses. The same is true of all(). This observation contradicts the assumption that any() is short-circuited.

import pandas as pd
import numpy as np
from timeit import timeit

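# One million rows split at random into two groups; every value in 'data' is True
df = pd.DataFrame({'id': np.random.randint(0, 2, 1000000), 'data': True})
timeit('df.groupby("id").any()', globals=globals(), number=100)
# 1.0371657210052945 (total seconds for 100 runs)

# Same frame, but every value in 'data' is now False
df['data'] = False
timeit('df.groupby("id").any()', globals=globals(), number=100)
# 1.0135124520165846 (total seconds for 100 runs)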

Could anyone clarify whether these two functions are short-circuited in pandas?
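For comparison, here is a minimal sketch of the behaviour I was expecting, using Python's built-in any() rather than anything from pandas. The helper `counting` and the `seen_*` counters are hypothetical names made up for this illustration; the snippet only shows what short-circuiting looks like on a plain iterable and says nothing about how groupby is implemented internally.

```python
def counting(values, seen):
    """Yield each value, recording in seen[0] how many items were consumed."""
    for v in values:
        seen[0] += 1
        yield v


n = 1_000_000

# All True: the built-in any() returns as soon as it sees the first truthy item.
seen_true = [0]
result_true = any(counting([True] * n, seen_true))

# All False: any() has to consume every item before it can return False.
seen_false = [0]
result_false = any(counting([False] * n, seen_false))

print(result_true, seen_true[0])    # True 1
print(result_false, seen_false[0])  # False 1000000
```

If groupby's any() behaved the same way, I would expect the all-True column to be noticeably faster than the all-False one, which is not what the timings show.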


DYZ


`df.groupby("id").agg(np.any)` gives me identical timing to above. So it might be a question about `np.any` as well. – Henry Ecker Jul 17 '21 at 21:17
Perhaps duplicate of [Why “numpy.any” has no short-circuit mechanism?](https://stackoverflow.com/q/45771554/15497888) If it is, that explains why `groupby.max` is so fast in the previous question as well... – Henry Ecker Jul 17 '21 at 21:20
