3

As far as I know numpys ndarrays element must be of same type and pandas series uses ndarray to hold values. But it seems like I am able to append an integer to a Series that holds string.

Sample code I have..

import pandas as pd

sr = pd.Series(['foo'])
sr = sr.append(pd.Series([1], index=[1]))
print(type(sr.values))
print(sr.values.dtype)
print(type(sr.iloc[0]))
print(type(sr.iloc[1]))

and the output:

<class 'numpy.ndarray'>
object
<class 'str'>
<class 'int'>

If the ndarrays type is object, how come int is returned for the item at index loc 1?

Koray Tugay
  • 22,894
  • 45
  • 188
  • 319
  • In an object dtype array, the elements retain their Python type, str, int, etc. As with a list, such an array contains pointers to objects stored elsewhere in memory. Don't confuse `dtype` and `type`. – hpaulj Sep 01 '18 at 16:04

1 Answers1

6

An object dtype series consists of pointers to arbitrary Python objects. Think of object dtype in the same way as you might a Python list. For example, the Python list ['foo', 1] does not store values in a contiguous memory block.

In the same way you can't attach a specific data type to list, even if all elements are of the same type, a Pandas object series contains pointers to any number of types.

In general, Pandas dtype changes to accommodate values. So adding a float value to an integer series will turn the whole series to float. Adding a string to a numeric series will force the series to object. You can even force a numeric series to have object dtype, though this is not recommended:

s = pd.Series(list(range(100000)), dtype=object)

The main benefit of Pandas, i.e. vectorised computations, is lost as soon as you start using object series. These should be avoided where possible. You can, for example, use pd.Categorical to factorise categories if applicable.

Here's a trivial example demonstrating the performance drop:

t = pd.Series(list(range(100000)))

%timeit s*10  # 7.31 ms
%timeit t*10  # 366 µs

Related: Strings in a DataFrame, but dtype is object

jpp
  • 159,742
  • 34
  • 281
  • 339
  • `In general, Pandas dtype changes to accommodate values. ` You mean pandas changes dtype of the ndarray? – Koray Tugay Sep 01 '18 at 16:32
  • @KorayTugay, Yes. Or creates a new array with another dtype, and attaches this to a dataframe column/series label. – jpp Sep 01 '18 at 16:32