
In my daily work with Pandas, I often have to set the type of ID columns to 'object'. This simple yet puzzling example illustrates the problem:

import pandas as pd

a = pd.DataFrame({'A':[12,32,34,54,65],'B':[122,32,234,54,65],'C':[12,323,34,544,653]},dtype='object')

If I check the types of the columns:

In: a.dtypes

As expected, I get:

Out: A    object
     B    object
     C    object
     dtype: object

However, the type of a single element is surprising to me:

In: type(a.A.values[0])

Out: int

This is a problem when I merge two DataFrames: if the keys are not of the same type, they do not match (123456 does not match '123456').
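A minimal sketch of the merge problem, assuming a recent pandas version; the `left`/`right` frames and the `id` column are hypothetical. Both key columns have dtype object, so pandas merges them without error but finds no matches, and converting both keys with `astype(str)` fixes it:

```python
import pandas as pd

# Hypothetical key tables: the same IDs, stored once as Python ints and once as strings.
left = pd.DataFrame({"id": pd.Series([123456, 654321], dtype="object"), "x": [1, 2]})
right = pd.DataFrame({"id": pd.Series(["123456", "654321"], dtype="object"), "y": [10, 20]})

# Both key columns report dtype object, so pandas merges them without complaint,
# but the int 123456 never equals the str "123456".
mismatch = left.merge(right, on="id")
print(len(mismatch))  # 0

# Converting both key columns to str makes the keys comparable.
left["id"] = left["id"].astype(str)
right["id"] = right["id"].astype(str)
fixed = left.merge(right, on="id")
print(len(fixed))  # 2
```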

After some work, I got the DataFrame to behave the way I expected (for more details, look here) by doing:

b = pd.DataFrame({'A':[12,32,34,54,65],'B':[122,32,234,54,65],'C':[12,323,34,544,653]}).astype(str)

Why is the statement `dtype='object'` not enough to get string elements? Am I missing something?

  • @jpp I disagree that my question is a duplicate. I already knew both pages you suggested, but I am asking why Pandas behaves like this, not how to convert the columns. My last line of code already answers how to convert a column from integer to string – Flavio Zamponi Dec 06 '18 at 13:23
  • There are 2 duplicates marked. The first answers "why", the second answers "how to resolve". Look at the chart [in the first question](https://stackoverflow.com/a/21020411/9209546). Nobody in 5 years has come up with a clearer answer; I don't think they'll be able to today. – jpp Dec 06 '18 at 13:24
  • @jpp: may I kindly ask you to remove the second link (the one about solving the problem)? As I already pointed out, I had already put the solution to my problem in the question. If you think the link can help some readers, I can edit my question to include it. – Flavio Zamponi Dec 11 '18 at 19:42
  • @jpp: Concerning your comment about the first link: look at the answer by Saket Kumar Singh: concise, well-documented and to the point. I would say that in 5 years we did get a clearer answer, at least for me, a physicist without a PhD in CS – Flavio Zamponi Dec 11 '18 at 19:50
  • Yes, great answer, but here on SO we try and collect answers to the same question so that users can *see them side-by-side*. Ideally, he should have posted it on the duplicate post. – jpp Dec 11 '18 at 19:53
  • 'object' is short for "an arbitrary Python type (different from numpy/pandas-specific types)". This might be a string, this might be an int, a decimal or some other funny class you defined yourself. When you use `dtype=object` (or `.astype(object)`), pandas either preserves the Python type (a Python int, in your example) or converts to a Python type (from np.int64 to Python int for example). Mixed dtypes are not actually an issue here because all your elements are integers. – ayhan Dec 11 '18 at 20:37
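The behaviour described in the last comment can be checked directly. This sketch reuses the question's DataFrame `a`; the int64 Series `s` is an illustrative addition, and the outputs assume a recent pandas version:

```python
import pandas as pd

a = pd.DataFrame(
    {"A": [12, 32, 34, 54, 65], "B": [122, 32, 234, 54, 65], "C": [12, 323, 34, 544, 653]},
    dtype="object",
)

# dtype=object only means "arbitrary Python objects"; it does not convert them.
print((a.dtypes == object).all())  # True
print(type(a["A"].iloc[0]))        # <class 'int'>: the Python int is kept as-is

# An int64 column holds NumPy scalars; astype(object) turns them into Python ints.
s = pd.Series([12, 32])
print(type(s.iloc[0]))                 # <class 'numpy.int64'>
print(type(s.astype(object).iloc[0]))  # <class 'int'>

# Only an explicit conversion produces strings.
b = a.astype(str)
print(type(b["A"].iloc[0]))  # <class 'str'>
```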

2 Answers


According to Pandas' documentation:

If a pandas object contains data with multiple dtypes in a single column, the dtype of the column will be chosen to accommodate all of the data types (object is the most general)

Since object accommodates all data types, pandas has no reason to coerce your int values to str.
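To illustrate the quoted rule, here is a small hypothetical Series mixing an int, a string and a float. Pandas picks object for the column but leaves each element as it was (a sketch, assuming a recent pandas version):

```python
import pandas as pd

# One column, three different Python types: only object can hold them all.
mixed = pd.Series([123456, "123456", 2.5])
print(mixed.dtype)  # object

# The elements are stored unchanged; nothing is coerced to str.
print([type(v).__name__ for v in mixed])  # ['int', 'str', 'float']
```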

Also, from the same documentation:

The values attribute on a DataFrame return the lower-common-denominator of the dtypes, meaning the dtype that can accommodate ALL of the types in the resulting homogeneous dtyped NumPy array

This further shows that object is the lowest-common-denominator dtype, which is why it was returned, and that it does not convert your ints to strs.
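A sketch of the lowest-common-denominator behaviour of `.values`, using two small hypothetical frames: int plus float fits in float64, while int plus string only fits in object:

```python
import pandas as pd

# int64 + float64 columns: float64 can hold both, so .values is a float64 array.
numeric = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})
print(numeric.values.dtype)  # float64

# int64 + string columns: only object can hold both, so .values is an object array.
mixed = pd.DataFrame({"i": [1, 2], "s": ["x", "y"]})
print(mixed.values.dtype)  # object
```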


`dict.values()` is a built-in Python method that returns a view of all the values in a given dictionary.

Returns:

a view object (`dict_values`) over all the values in the given dictionary.
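For comparison, a minimal sketch (the dictionary `d` is hypothetical): `dict.values()` is a method returning a `dict_values` view, whereas `a.A.values` in the question is the pandas `Series.values` attribute (no parentheses), which returns a NumPy array whose elements keep their stored Python types:

```python
import pandas as pd

# dict.values() is a method call and returns a view, not a list.
d = {"A": 12, "B": 122}
vals = d.values()
print(type(vals).__name__)  # dict_values

# Series.values is an attribute; for an object column it is a NumPy object array.
a = pd.DataFrame({"A": [12, 32]}, dtype="object")
arr = a.A.values
print(type(arr).__name__)     # ndarray
print(type(arr[0]).__name__)  # int
```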

In your case, `.values` returned the column's values, and you are calling `type()` on one of them. If that element is a string, `type()` reports `str`:

import pandas as pd

a = pd.DataFrame({'A':"dd",'B':[122,32,234,54,65],'C':[12,323,34,544,653]},dtype='object')
c = a.A.values[0]
type(c)

Output: str

Gerardo Zinno
Amar