Consider this dictionary of pandas Series. The indices of all the series are integers and may overlap, but they certainly do not coincide. I noticed that pd.concat seems slow when combining along axis=1 when the indices are large, there is a lot of non-overlap, and there are many items to concatenate. That prompted me to concatenate along axis=0 instead and subsequently call unstack(). I end up with exactly the same result, but unstacking is quicker.
Does anyone have a handle on why this would be the case?
I get that concatenating series on top of one another should be quick, but I would have guessed that the unstacking process would be nearly identical to pd.concat(axis=1).
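import numpy as np
import pandas as pd

# 100 series of ones, each indexed by up to 1000 unique random integers in [1000, 10000)
dict_of_series = {
    's%s' % i: pd.Series(
        1, np.unique(np.random.randint(1000, 10000, size=1000))
    ) for i in range(100)
}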
%%timeit
pd.concat(dict_of_series, axis=0).unstack(0)
10 loops, best of 3: 29.6 ms per loop
%%timeit
pd.concat(dict_of_series, axis=1)
10 loops, best of 3: 43.1 ms per loop
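
For what it's worth, here is a minimal sketch of how the "same result" claim could be checked outside of IPython. The helper names make_series_dict and same_frame are mine, purely for illustration, and the seed is arbitrary. I sort both axes before comparing because I am not assuming the two routes return rows and columns in the same order.

```python
import numpy as np
import pandas as pd


def make_series_dict(n_series=100, size=1000, seed=0):
    """Build a dict of Series like the one above, seeded for reproducibility."""
    rng = np.random.default_rng(seed)
    return {
        "s%s" % i: pd.Series(
            1, index=np.unique(rng.integers(1000, 10000, size=size))
        )
        for i in range(n_series)
    }


def same_frame(left, right):
    """Compare two frames after sorting both axes, treating NaN as equal to NaN."""
    left = left.sort_index(axis=0).sort_index(axis=1)
    right = right.sort_index(axis=0).sort_index(axis=1)
    return (
        left.index.equals(right.index)
        and left.columns.equals(right.columns)
        and np.array_equal(
            left.to_numpy(dtype=float),
            right.to_numpy(dtype=float),
            equal_nan=True,
        )
    )


dict_of_series = make_series_dict()

# Route 1: stack everything vertically, then pivot the dict keys into columns.
stacked = pd.concat(dict_of_series, axis=0).unstack(0)

# Route 2: align all series side by side directly.
side_by_side = pd.concat(dict_of_series, axis=1)

print(same_frame(stacked, side_by_side))
```

This only checks that the two results hold the same labels and values; it says nothing about why one route is faster than the other.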