Cannot convert column type from object to str in a pandas DataFrame

Question

I downloaded a CSV file and read it into a pandas DataFrame. All 4 columns have dtype object, and I want to convert them to str.

The result of dtypes is currently as follows:

Name                      object
Position Title            object
Department                object
Employee Annual Salary    object
dtype: object

I tried to change the type as follows:

path['Employee Annual Salary'] = path['Employee Annual Salary'].astype(str)

but dtypes still returns object. I also tried specifying the column type when reading the CSV:

path = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\ml-1m\\city-of-chicago-salaries.csv",dtype={'Employee Annual Salary':str})

or

path = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\ml-1m\\city-of-chicago-salaries.csv",dtype=str)

but that still does not work. How can I change a column's type from object to str?

Possible duplicate of http://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object – meatballs, Dec 14 '16 at 13:47
That link is helpful. Another problem: how do I remove the '$' from the Employee Annual Salary column and then convert it to float? – tonyibm, Dec 15 '16 at 01:11
I found the reason why replace failed. The correct way is path['Employee Annual Salary'] = path['Employee Annual Salary'].str.replace('$',''). Previously I had not put .str in front of replace. – tonyibm, Dec 15 '16 at 01:19
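The follow-up in the comments (strip the '$', then convert to float) could be sketched roughly as below. The salary values are made-up sample data, not taken from the actual CSV, and regex=False is passed so that '$' is treated as a literal character rather than a regular-expression anchor, regardless of the pandas version's default.

```python
import pandas as pd

# Made-up sample values standing in for the real CSV column.
path = pd.DataFrame({'Employee Annual Salary': ['$88967.00', '$80778.00', '$104736.00']})

# Series.replace only matches whole cell values by default;
# the .str accessor does substring replacement inside each string.
stripped = path['Employee Annual Salary'].str.replace('$', '', regex=False)

# Convert the cleaned strings to a numeric (float) column.
path['Employee Annual Salary'] = pd.to_numeric(stripped)

print(path.dtypes)
```

If some rows might contain values that are not valid numbers, pd.to_numeric(stripped, errors='coerce') would turn those into NaN instead of raising.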

meatballs · Answer 1 · 2016-12-14T14:22:47.093

25

For strings, the column type will always be 'object'. There is no need for you to convert anything; it is already doing what you require.

The types come from numpy, which has a set of numeric data types. Anything else is an object.

You might want to read http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.01-Understanding-Data-Types.ipynb for a fuller explanation.
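As a small illustration of the point above, the sketch below builds a column with dtype object and checks that every value in it is nevertheless an ordinary Python str. The sample names are made up, and dtype=object is passed explicitly only so the example behaves the same across pandas versions.

```python
import pandas as pd

# Made-up sample data; dtype=object is stated explicitly for clarity.
names = pd.Series(['Bla', 'Peter'], dtype=object)

print(names.dtype)      # the column-level dtype
print(names.map(type))  # the Python type of each individual value

# String methods work directly on the object column via .str.
upper = names.str.upper()
print(upper.tolist())
```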

edited Dec 14 '16 at 14:22

answered Dec 14 '16 at 13:56

meatballs

I tried to remove the '$' from the Employee Annual Salary column, but using replace directly does not work. – tonyibm Dec 15 '16 at 00:43

object is actually the dtype used for str, so there is no need to convert it to str. – tonyibm Dec 15 '16 at 01:20
But then there may be an issue when trying to df.join ("ValueError: You are trying to merge on object and int64 columns.") – Mathy Jul 26 '22 at 14:24

score 18 · Answer 2 · edited Nov 05 '20 at 10:13

Actually you can set the type of a column to string. Use .astype('string') rather than .astype(str).

Sample Data Set

df = pd.DataFrame(data={'name': ['Bla',None,'Peter']})

By default, the name column has dtype object.

Single Column Solution

df.name = df.name.astype('string')

It's important to write .astype('string') rather than .astype(str), which didn't work for me: with .astype(str) the column stays as object.

Multi-Column Solution

df = df.astype(dtype={'name': 'string'})

This lets you change multiple columns at once.
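A rough side-by-side of the two casts, using the same sample data as above. Note that the 'string' alias refers to the pandas StringDtype extension type, which was introduced in pandas 1.0, so on older versions (such as 0.25.x) it is not available.

```python
import pandas as pd

df = pd.DataFrame(data={'name': ['Bla', None, 'Peter']})

# Cast with the Python built-in str type.
as_str = df['name'].astype(str)

# Cast with the pandas 'string' alias (StringDtype, pandas >= 1.0).
as_string = df['name'].astype('string')

print(as_str.dtype)
print(as_string.dtype)

# With the 'string' dtype the missing value is kept as a missing value.
print(as_string.isna().sum())
```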

edited Nov 05 '20 at 10:13

Dharman

answered Nov 05 '20 at 10:07

Felix


When I use .astype('string'), I get this error: TypeError: data type 'string' not understood. My pandas version is 0.25.3. – kjsr7 Nov 16 '20 at 05:29

score 7 · Answer 3 · answered Mar 10 '21 at 02:52

Please use:

df = df.convert_dtypes()

It will automatically convert each column to a suitable type, and it should work.
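A minimal sketch of what convert_dtypes does, assuming pandas 1.0 or later (where the method was added). The column names and values are made up for illustration, and the name column is created with dtype=object explicitly so the starting point matches the question.

```python
import pandas as pd

# Made-up sample frame: one object column of strings, one integer column.
df = pd.DataFrame({
    'name': pd.Series(['Bla', None, 'Peter'], dtype=object),
    'visits': [1, 2, 3],
})

# convert_dtypes returns a new DataFrame; the original is left unchanged.
converted = df.convert_dtypes()

print(df.dtypes)
print(converted.dtypes)
```

Because the result is a new object, it has to be assigned back (df = df.convert_dtypes()) for the change to stick.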

Abhijit


What a nice thing to know... – Mathy Jul 26 '22 at 14:23

score 2 · Answer 4 · answered Dec 14 '16 at 13:56

I think the astype worked; you just can't see the result of the change by looking at dtypes. For example,

import pandas
data = [{'Name': 'Schmoe, Joe', 'Position Title': 'Dude', 'Department': 'Zip', 'Employee Annual Salary': 200000.00},
        {'Name': 'Schmoe, Jill', 'Position Title': 'Dudette', 'Department': 'Zam', 'Employee Annual Salary': 300000.00},
        {'Name': 'Schmoe, John', 'Position Title': 'The Man', 'Department': 'Piz', 'Employee Annual Salary': 100000.00},
        {'Name': 'Schmoe, Julie', 'Position Title': 'The Woman', 'Department': 'Maz', 'Employee Annual Salary': 150000.00}]
df = pandas.DataFrame.from_records(data, columns=['Name', 'Position Title', 'Department', 'Employee Annual Salary'] )

Now if I do dtypes on df I see:

In [32]: df.dtypes
Out[32]:
Name                       object
Position Title             object
Department                 object
Employee Annual Salary    float64
dtype: object

Now if I do,

In [33]: df.astype(str)['Employee Annual Salary'].map(lambda x:  type(x))
Out[33]:
0    <type 'str'>
1    <type 'str'>
2    <type 'str'>
3    <type 'str'>
Name: Employee Annual Salary, dtype: object

I see that all of my salary values are now strings, even though the column's dtype shows up as object.

So the bottom line is that I think that you are fine.

The Employee Annual Salary column contains '$' and I want to remove it, but using replace does not work. – tonyibm, Dec 15 '16 at 00:44
object is actually the dtype used for str, so there is no need to convert it to str using astype. – tonyibm, Dec 15 '16 at 01:21

score 0 · Answer 5 · answered Feb 01 '20 at 15:16

I agree with the answers above. You do not need to convert objects to strings. However, if you ever need to convert many columns to another data type (e.g. int), you can use the following code:

object_columns_list = list(df.select_dtypes(include='object').columns)

for object_column in object_columns_list:
    df[object_column] = df[object_column].astype(int)
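One caveat with the loop above: astype(int) raises an error as soon as a column contains a value that cannot be parsed as an integer. A more forgiving variant, sketched here with made-up column names and values, uses pd.to_numeric with errors='coerce', which turns unparseable values into NaN instead of failing.

```python
import pandas as pd

# Made-up sample frame; column 'a' contains one value that is not a number.
df = pd.DataFrame({'a': ['1', '2', 'x'], 'b': ['3', '4', '5']})

# Hypothetical list of columns to convert; adjust to the real column names.
columns_to_convert = ['a', 'b']

for column in columns_to_convert:
    # errors='coerce' replaces unparseable values with NaN.
    df[column] = pd.to_numeric(df[column], errors='coerce')

print(df.dtypes)
```

A column that ends up containing NaN becomes a float column, since the default integer dtype cannot hold missing values.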
