An object dtype series consists of pointers to arbitrary Python objects. Think of object dtype in the same way as you might a Python list. For example, the Python list ['foo', 1] does not store its values in a contiguous memory block; it stores references to objects held elsewhere.
Just as you can't attach a specific data type to a list, even if all its elements are of the same type, a Pandas object series can hold pointers to values of any number of types.
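A minimal sketch of the list analogy, assuming only that pandas is installed; the variable names are illustrative. A series built from mixed values gets object dtype, and each element keeps its own Python type:

```python
import pandas as pd

# A series built from mixed values falls back to object dtype.
mixed = pd.Series(['foo', 1])
print(mixed.dtype)

# Each element is still an ordinary Python object with its own type.
element_types = [type(x).__name__ for x in mixed]
print(element_types)
```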
In general, Pandas changes the dtype to accommodate values. Adding a float value to an integer series turns the whole series to float. Adding a string to a numeric series forces the series to object. You can even force a numeric series to have object dtype, though this is not recommended:
import pandas as pd
s = pd.Series(list(range(100000)), dtype=object)
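The upcasting described above can be sketched as follows. This is an illustration rather than the only way to trigger it: pd.concat is used here as one convenient way to add values, and the variable names are made up for the example.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
print(ints.dtype)        # an integer dtype

# Adding a float value upcasts the whole series to float.
with_float = pd.concat([ints, pd.Series([0.5])], ignore_index=True)
print(with_float.dtype)  # a float dtype

# Adding a string forces the whole series to object.
with_str = pd.concat([ints, pd.Series(['a'], dtype=object)], ignore_index=True)
print(with_str.dtype)    # object
```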
The main benefit of Pandas, i.e. vectorised computations, is lost as soon as you start using object series, because operations then have to work on each Python object individually. These should be avoided where possible. You can, for example, use pd.Categorical to factorise categories if applicable.
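As a rough sketch of the pd.Categorical suggestion, with made-up sample data: the repeated strings are replaced by small integer codes that index into a list of unique categories.

```python
import pandas as pd

# Illustrative data: a few repeated string labels.
colours = pd.Categorical(['red', 'green', 'red', 'blue'])

# Unique categories are stored once, sorted.
categories = colours.categories.tolist()
print(categories)  # ['blue', 'green', 'red']

# Each value is stored as an integer code pointing into the categories.
codes = colours.codes.tolist()
print(codes)       # [2, 1, 2, 0]
```

Whether this helps depends on the data: it suits columns with relatively few distinct values repeated many times.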
Here's a trivial example demonstrating the performance drop:
t = pd.Series(list(range(100000)))
%timeit s*10 # 7.31 ms (object dtype)
%timeit t*10 # 366 µs (integer dtype)
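%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard library timeit module. The repeat count of 20 is an arbitrary choice for this sketch, and the timings will vary by machine and Pandas version:

```python
import timeit

import pandas as pd

s = pd.Series(list(range(100000)), dtype=object)  # object dtype
t = pd.Series(list(range(100000)))                # integer dtype

runs = 20  # arbitrary repeat count for this sketch
obj_time = timeit.timeit(lambda: s * 10, number=runs)
int_time = timeit.timeit(lambda: t * 10, number=runs)

# Report the average time per multiplication in milliseconds.
print(f"object:  {obj_time / runs * 1e3:.3f} ms per loop")
print(f"integer: {int_time / runs * 1e3:.3f} ms per loop")
```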
Related: Strings in a DataFrame, but dtype is object