17

I have downloaded a CSV file and read it into a pandas DataFrame. All 4 columns have the object dtype, and I want to convert them to str type.


The result of dtypes is as follows:

Name                      object
Position Title            object
Department                object
Employee Annual Salary    object
dtype: object

I tried to change the type using the following method:

path['Employee Annual Salary'] = path['Employee Annual Salary'].astype(str)

but dtypes still returns object. I also tried to provide the column type when reading the CSV:

path = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\ml-1m\\city-of-chicago-salaries.csv",dtype={'Employee Annual Salary':str})

or

path = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\ml-1m\\city-of-chicago-salaries.csv",dtype=str)

but that still does not work. I want to know how to change the column type from object to str.

tonyibm
  • Possible duplicate of http://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object – meatballs Dec 14 '16 at 13:47
  • That link is helpful for me. Another problem: how do I remove the '$' from the Employee Annual Salary column and then convert it to float type? – tonyibm Dec 15 '16 at 01:11
  • I found the reason why replace failed. The correct way is: path['Employee Annual Salary'] = path['Employee Annual Salary'].str.replace('$',''). I had not added str in front of replace before. – tonyibm Dec 15 '16 at 01:19
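
To make the follow-up in the comments concrete, here is a minimal sketch of stripping the '$' and converting the column to float. The sample values are made up for illustration and are not taken from the actual CSV. Passing regex=False is a precaution: in pandas versions before 2.0 the pattern is interpreted as a regular expression by default, and '$' is the end-of-string anchor there.

```python
import pandas as pd

# Hypothetical sample values in the same "$12345.67" format as the salary column.
df = pd.DataFrame({'Employee Annual Salary': ['$88967.00', '$80778.00']})

# regex=False treats '$' as a literal character, not as a regex anchor.
cleaned = df['Employee Annual Salary'].str.replace('$', '', regex=False)

# to_numeric parses the remaining digit strings into float64 values.
df['Employee Annual Salary'] = pd.to_numeric(cleaned)

print(df.dtypes)
```

If some rows contain values that cannot be parsed, pd.to_numeric(cleaned, errors='coerce') turns them into NaN instead of raising.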

5 Answers

25

For strings, the column dtype is 'object' by default. There is no need for you to convert anything; it is already doing what you require.

The types come from numpy, which has a set of numeric data types. Anything else is an object.

You might want to read http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.01-Understanding-Data-Types.ipynb for a fuller explanation.
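
A small sketch of what this means in practice: the column dtype is object, but each element is an ordinary Python str and the .str accessor works. The sample values are made up, and dtype=object is passed explicitly so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Made-up sample data; dtype=object mirrors what read_csv produced in the question.
s = pd.Series(['Bla', 'Peter'], dtype=object)

print(s.dtype)          # the column-level dtype
print(type(s.iloc[0]))  # the type of an individual element

# String methods are available through the .str accessor on an object column.
print(s.str.upper().tolist())
```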

meatballs
  • I tried to remove the '$' from the Employee Annual Salary column, but using replace directly does not work. – tonyibm Dec 15 '16 at 00:43
  • object is actually used for str, so there is no need to convert it to str type. – tonyibm Dec 15 '16 at 01:20
  • But then there may be an issue when trying to df.join ("ValueError: You are trying to merge on object and int64 columns.") – Mathy Jul 26 '22 at 14:24
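
The merge error mentioned in the last comment usually means the key column holds text on one side and integers on the other. A minimal sketch, assuming two made-up frames and using merge rather than join: casting the key to the same dtype on both sides before merging avoids the error.

```python
import pandas as pd

# Hypothetical frames: 'id' is text on the left and int64 on the right.
left = pd.DataFrame({'id': ['1', '2'], 'name': ['Ann', 'Bob']})
right = pd.DataFrame({'id': [1, 2], 'salary': [100.0, 200.0]})

try:
    left.merge(right, on='id')
except ValueError as exc:
    # pandas refuses to merge keys with mismatched dtypes.
    print('merge failed:', exc)

# Cast the text key to int64 so both key columns share a dtype.
left['id'] = left['id'].astype('int64')
merged = left.merge(right, on='id')
print(merged)
```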
18

Actually you can set the type of a column to string. Use .astype('string') rather than .astype(str).

Sample Data Set

df = pd.DataFrame(data={'name': ['Bla',None,'Peter']})

The column name has the object dtype by default.

Single Column Solution

df.name = df.name.astype('string')

It's important to write .astype('string') rather than .astype(str), which didn't work for me: with .astype(str) the column stays as object.

Multi-Column Solution

df = df.astype(dtype={'name': 'string'})

This allows you to change multiple fields at once.
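
A short side-by-side sketch of the two casts, using the same sample data. Note that the 'string' alias refers to the pandas StringDtype extension type, which was introduced in pandas 1.0, so older versions will not recognise it. dtype=object is set explicitly here as an assumption, to start from the same state as the question.

```python
import pandas as pd

df = pd.DataFrame(data={'name': ['Bla', None, 'Peter']}, dtype=object)

# Cast with the built-in str type.
as_str = df['name'].astype(str)

# Cast with the pandas StringDtype alias (needs pandas >= 1.0).
as_string = df['name'].astype('string')

print(as_str.dtype)
print(as_string.dtype)

# The missing value is still recognised as missing in the 'string' column.
print(as_string.isna().tolist())
```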

Dharman
Felix
  • When I use .astype('string'), I get this error: TypeError: data type 'string' not understood. pandas version: 0.25.3 – kjsr7 Nov 16 '20 at 05:29
7

Please use:

df = df.convert_dtypes()

It will automatically convert the columns to suitable types, and it should work.
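
A minimal sketch of what convert_dtypes does, on made-up data with a text column and a numeric column containing a missing value. DataFrame.convert_dtypes is available from pandas 1.0 onwards; the column names here are only for illustration.

```python
import pandas as pd

# Made-up data: a text column and an integer-like column with a missing value.
df = pd.DataFrame({'name': ['Bla', None, 'Peter'], 'age': [31, None, 45]})
print(df.dtypes)

# convert_dtypes returns a new DataFrame using the nullable extension dtypes.
converted = df.convert_dtypes()
print(converted.dtypes)
```

The original df is left unchanged, which is why the answer assigns the result back to df.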

Abhijit
2

I think that the astype worked; it's just that you can't see the result of the change by viewing dtypes. For example,

import pandas
data = [{'Name': 'Schmoe, Joe', 'Position Title': 'Dude', 'Department': 'Zip', 'Employee Annual Salary': 200000.00},
        {'Name': 'Schmoe, Jill', 'Position Title': 'Dudette', 'Department': 'Zam', 'Employee Annual Salary': 300000.00},
        {'Name': 'Schmoe, John', 'Position Title': 'The Man', 'Department': 'Piz', 'Employee Annual Salary': 100000.00},
        {'Name': 'Schmoe, Julie', 'Position Title': 'The Woman', 'Department': 'Maz', 'Employee Annual Salary': 150000.00}]
df = pandas.DataFrame.from_records(data, columns=['Name', 'Position Title', 'Department', 'Employee Annual Salary'] )

Now if I do dtypes on df I see:

In [32]: df.dtypes
Out[32]:
Name                       object
Position Title             object
Department                 object
Employee Annual Salary    float64
dtype: object

Now if I do,

In [33]: df.astype(str)['Employee Annual Salary'].map(lambda x:  type(x))
Out[33]:
0    <type 'str'>
1    <type 'str'>
2    <type 'str'>
3    <type 'str'>
Name: Employee Annual Salary, dtype: object

I see that all of my salary values are now strings even though the dtype of the column shows up as object.

So the bottom line is that I think that you are fine.

aquil.abdullah
0

I agree with the above-mentioned answers. You do not need to convert objects to string. However, if you ever need to convert a multitude of columns to another datatype (e.g. int), you can use the following code:

object_columns_list = list(df.select_dtypes(include='object').columns)

for object_column in object_columns_list:
    df[object_column] = df[object_column].astype(int)
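
One caveat with the loop above: astype(int) raises as soon as an object column contains a value that is not a number. A hedged variant, assuming made-up columns count and label, is to try pd.to_numeric with errors='coerce' and only keep the result when every value parsed.

```python
import pandas as pd

# Made-up data: 'count' holds numeric strings, 'label' holds real text.
df = pd.DataFrame({'count': ['1', '2', '3'], 'label': ['a', 'b', 'c']}, dtype=object)

for column in df.select_dtypes(include='object').columns:
    # Unparseable values become NaN instead of raising an error.
    converted = pd.to_numeric(df[column], errors='coerce')
    # Only overwrite the column when every value parsed as a number.
    if converted.notna().all():
        df[column] = converted

print(df.dtypes)
```

Columns with genuine missing values would be skipped by this check, so the condition may need adjusting for real data.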
DataBach