I find it odd that you're getting better performance using Counter. Here's my test result (n=10000):
Using Series.mode on Series with nan: 52.41649858
Using Series.mode on Series without nan: 17.186453438
Using Counter on Series with nan: 269.33117825500005
Using Counter on Series without nan: 134.207576572
#-----------------------------------------------------#
              Series.mode    Counter
              -----------    -------
With nan      52.42s         269.33s
Without nan   17.19s         134.21s
Test code:
import timeit

setup = '''
import pandas as pd
from collections import Counter

def get_most_common(srs):
    # dropna=False makes mode() count NaN as a value too
    return srs.mode(dropna=False)[0]

def get_most_common_counter(srs):
    x = list(srs)
    my_counter = Counter(x)
    return my_counter.most_common(1)[0][0]

df = pd.read_csv(r'large.data')
'''

# setup runs once per timeit call; the statement itself runs number=10000 times
print(f"""Using Series.mode on Series with nan: {timeit.timeit('get_most_common(df["has_nan"])', setup=setup, number=10000)}""")
print(f"""Using Series.mode on Series without nan: {timeit.timeit('get_most_common(df["no_nan"])', setup=setup, number=10000)}""")
print(f"""Using Counter on Series with nan: {timeit.timeit('get_most_common_counter(df["has_nan"])', setup=setup, number=10000)}""")
print(f"""Using Counter on Series without nan: {timeit.timeit('get_most_common_counter(df["no_nan"])', setup=setup, number=10000)}""")
large.data is a DataFrame of 50,000 rows × 2 columns (has_nan and no_nan) of random 2-digit strings from 0 to 99, where the mode of has_nan is nan, with 551 occurrences.
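Since large.data itself isn't shared, here is a hypothetical stdlib sketch for building a comparable file. The function names, the seed, and the choice to spread the non-NaN values evenly (so that NaN stays the mode with 551 occurrences, as described) are assumptions, not the original generator. Note that pd.read_csv without a dtype argument will infer these columns as numbers (float64 for has_nan because of the empty cells, int64 for no_nan), not strings.

```python
import csv
import random


def make_rows(n_rows=50_000, n_nan=551, seed=0):
    """Build (has_nan, no_nan) rows resembling the description of large.data.

    Hypothetical stand-in: the real file is not available.
    """
    rng = random.Random(seed)
    # has_nan: n_nan empty cells (pd.read_csv reads them as NaN) plus the
    # 2-digit strings "00".."99" spread evenly, so no value reaches n_nan.
    has_nan = [""] * n_nan + [f"{i % 100:02d}" for i in range(n_rows - n_nan)]
    rng.shuffle(has_nan)
    # no_nan: uniformly random 2-digit strings.
    no_nan = [f"{rng.randrange(100):02d}" for _ in range(n_rows)]
    return list(zip(has_nan, no_nan))


def write_large_data(path, rows):
    """Write the rows as a two-column CSV with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["has_nan", "no_nan"])
        writer.writerows(rows)


# Usage (hypothetical path):
# write_large_data("large.data", make_rows())
```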
If anything, your if np.nan not in my_counter.keys() condition will always be triggered, because np.nan is never found in my_counter.keys(). So in actuality you never used pd.Series.mode; it was always using Counter.
As mentioned in the other question, your pandas object already holds its own copies of np.nan within the Series/DataFrame. Since in checks for identity first and then for equality, and NaN never compares equal to anything (not even itself), the in condition will never be fulfilled. Give it a try:
import numpy as np
import pandas as pd
np.nan in pd.Series([np.nan, 1, 2]).to_list()
# False
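The False above follows from how Python's in works on a list: the language reference defines x in y for lists as any(x is e or x == e for e in y). np.nan only passes the identity test against itself, and a NaN copy never passes the equality test. A minimal stdlib sketch, using math.nan as a stand-in for np.nan (both are plain Python floats); the helper name contains is hypothetical:

```python
import math


def contains(lst, x):
    # Hypothetical helper: spells out the rule the language reference gives
    # for `x in lst` on a list: identity first, then equality.
    return any(x is e or x == e for e in lst)


same_object = [math.nan, 1, 2]     # holds the very same NaN object
fresh_copy = [float("nan"), 1, 2]  # holds a different NaN object

print(math.nan in same_object)         # True: identity check succeeds
print(math.nan in fresh_copy)          # False: not identical, and NaN != NaN
print(contains(fresh_copy, math.nan))  # False, same rule spelled out
```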
Remove the if/else entirely, stick with one method, and then compare the performance. As mentioned in your other question, a pandas method will almost always be a better approach than an external module or method. If you are still observing otherwise, please update your question.