While trying to answer another question, I noticed that any(), when applied within groupby(), is equally slow regardless of the content of the dataframe. For example, it takes the same time to inspect a column of Trues as a column of Falses. The same is true of all(). This observation contradicts the assumption that any() is short-circuited.
import pandas as pd
import numpy as np
from timeit import timeit
df = pd.DataFrame({'id': np.random.randint(0, 2, 1000000), 'data': True})
timeit('df.groupby("id").any()', globals=globals(), number=100)
# 1.0371657210052945
df['data'] = False
timeit('df.groupby("id").any()', globals=globals(), number=100)
# 1.0135124520165846
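For comparison, this is the short-circuit behaviour I had in mind, sketched with Python's built-in any() rather than anything from pandas. The helper name count_inspected is my own, made up for illustration; it only counts how many elements any() pulls from a generator before it returns.

```python
def count_inspected(values):
    """Return (result of any(), number of elements any() consumed)."""
    inspected = 0

    def tracked():
        # Yield each value while counting how many were actually requested.
        nonlocal inspected
        for v in values:
            inspected += 1
            yield v

    # any() runs first, so `inspected` is read only after it has finished.
    return any(tracked()), inspected


all_true = count_inspected([True] * 1_000_000)
all_false = count_inspected([False] * 1_000_000)

print(all_true)   # (True, 1): stops at the first True
print(all_false)  # (False, 1000000): has to scan every element
```

If the grouped any() short-circuited in the same way, I would expect the all-True column to be noticeably faster than the all-False one, which is not what the timings above show.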
Could anyone clarify if the two mentioned functions are short-circuited in Pandas?