169

I use Pandas 'ver 0.12.0' with Python 2.7 and have a dataframe as below:

import pandas as pd

df = pd.DataFrame({'id': [123, 512, 'zhub1', 12354.3, 129, 753, 295, 610],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'shape': ['round', 'triangular', 'triangular', 'triangular', 'square',
                             'triangular', 'round', 'triangular']
                   }, columns=['id', 'colour', 'shape'])

The id Series consists of some integers and strings. Its dtype by default is object. I want to convert all contents of id to strings. I tried astype(str), which produces the output below.

df['id'].astype(str)
0    1
1    5
2    z
3    1
4    1
5    7
6    2
7    6

1) How can I convert all elements of id to strings?

2) I will eventually use id for indexing for dataframes. Would having String indices in a dataframe slow things down, compared to having an integer index?

Zhubarb
  • 11,432
  • 18
  • 75
  • 114
  • 1
    Not sure why you get that output, as `astype` works fine for me, at least in version 0.13.1; maybe 0.12.0 has a bug? In answer to your second point: yes, it is likely to be slower, as string comparison will not be faster than integer comparison, but I would profile this first. It also depends on the size. – EdChum Mar 06 '14 at 17:41
  • you've set the column, right? df['id'] = df['id'].astype(str) – Andy Hayden Mar 06 '14 at 17:51
  • @Andy Hayden, yes I do the assignment, but it is the output that I thought was unexpected. – Zhubarb Mar 06 '14 at 18:48
  • unexpected in what way? – Andy Hayden Mar 06 '14 at 19:26
  • 1
    It only returns the 1st character of each Series element as I put in the question under `df['id'].astype(str)` – Zhubarb Mar 07 '14 at 08:55
  • For anyone wondering why the accepted answer isn't working in later versions of pandas, I have added a new answer to the question to reflect the current documentation. – rocksNwaves Mar 05 '20 at 20:35
  • @Zhubarb, please change the accepted answer – Fons MA Feb 12 '22 at 01:16
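On the second question, the "profile it first" advice from the comments can be sketched roughly as follows. This is only a hypothetical micro-benchmark, not a definitive measurement: the frame size, the sampled labels, and the helper name `time_lookups` are all assumptions made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000
int_labels = np.arange(n)
values = np.random.default_rng(0).random(n)

# Same data, once with an integer index and once with a string index.
by_int = pd.Series(values, index=int_labels)
by_str = pd.Series(values, index=int_labels.astype(str))

# A small sample of labels to look up in each Series.
sample_int = [int(x) for x in int_labels[::1000]]
sample_str = [str(x) for x in sample_int]


def time_lookups(series, labels, repeats=20):
    """Return the seconds taken to look up every label `repeats` times."""
    return timeit.timeit(lambda: [series.loc[k] for k in labels], number=repeats)


print("int index:", time_lookups(by_int, sample_int))
print("str index:", time_lookups(by_str, sample_str))
```

Comparing the two printed timings on your own data gives a better answer than any general rule.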

11 Answers

219

A new answer to reflect current practice: as of now (v1.2.4), neither astype('str') nor astype(str) converts the column to the dedicated string dtype; the result is still reported as object.

As per the documentation, a Series can be converted to the string datatype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
rocksNwaves
  • 1
    When i try this I get `data type "string" not understood` – thentangler Sep 21 '22 at 16:11
  • First solution works technically, but replaces content: it copies the strings of one row into all the others. Second solution doesn't work technically. Third throws the error ```TypeError: Expected an instance of StringDtype, but got the class instead. Try instantiating 'dtype'.``` Will look at specifying the data type when I load the data. – Simone Mar 10 '23 at 13:52
  • The third solution misses brackets. The following worked ```df['col'] = pandas.Series(df['col'], dtype = pd.StringDtype()) ``` – Simone Mar 10 '23 at 14:30
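For reference, here is a minimal sketch of the `astype("string")` route on the `id` values from the question. It assumes a pandas release recent enough to convert non-string values to the string dtype; older 1.0.x releases may reject the numbers instead. The variable name `converted` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 512, "zhub1", 12354.3, 129, 753, 295, 610]})

# astype("string") asks for the dedicated StringDtype extension type
# rather than a plain object column holding Python strings.
converted = df["id"].astype("string")

print(converted.dtype)
print(converted.tolist())
```

Checking `converted.dtype` is a quick way to confirm which of the two string representations you ended up with.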
125

You can convert all elements of id to str using apply:

df.id.apply(str)

0        123
1        512
2      zhub1
3    12354.3
4        129
5        753
6        295
7        610

Edit by OP:

I think the issue was related to the Python version (2.7); this worked:

df['id'].astype(basestring)
0        123
1        512
2      zhub1
3    12354.3
4        129
5        753
6        295
7        610
Name: id, dtype: object
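A small Python 3 sketch of the `apply(str)` approach, assigning the result back and then checking the element types. The shortened `id` list is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 512, "zhub1", 12354.3]})

# apply(str) calls the built-in str() on each element in turn.
df["id"] = df["id"].apply(str)

# Inspect the Python type of each element, independent of the reported dtype.
print(df["id"].map(type).unique())
```

Looking at the element types is useful here because an object column can hold strings and numbers side by side without the dtype telling you so.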
Zhubarb
Amit
70

You must assign the result back, like this:

df['id']= df['id'].astype(str)
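To illustrate why the assignment matters: `astype` returns a new Series and does not modify the column in place. A minimal sketch, where `still_int` and `now_str` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"id": [123, 512, "zhub1"]})

# Calling astype without assigning leaves the original column untouched.
df["id"].astype(str)
still_int = isinstance(df["id"].iloc[0], int)

# Assigning the result back is what actually changes the DataFrame.
df["id"] = df["id"].astype(str)
now_str = isinstance(df["id"].iloc[0], str)

print(still_int, now_str)
```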
Rishil Antony
8

Personally, none of the above worked for me. What did work:

new_str = [str(x) for x in old_obj]
manesioz
  • Ya true the other ones did not change anything. I know those should've worked but something is off and it did not so I guess this can also be a solution – Vengenzz Vicky Sep 15 '22 at 06:44
6

You can use:

df.loc[:,'id'] = df.loc[:, 'id'].astype(str)

This is why they recommend this solution: Pandas doc

TL;DR

To reflect some of the answers:

df['id'] = df['id'].astype("string")

This will break on the given example because it tries to convert to a StringArray, which cannot handle numbers mixed in with the strings.

df['id']= df['id'].astype(str)

For me, this solution raised a warning:

> SettingWithCopyWarning:  
> A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
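That warning usually means the DataFrame being assigned to was itself derived from another frame, for example by filtering. A minimal sketch of one common way around it, taking an explicit copy before assigning; the filtered frame `white` is a made-up example, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [123, 512, "zhub1", 12354.3],
    "colour": ["black", "white", "white", "white"],
})

# Taking an explicit copy makes the subset an independent DataFrame,
# so assigning to one of its columns is unambiguous.
white = df[df["colour"] == "white"].copy()
white.loc[:, "id"] = white.loc[:, "id"].astype(str)

print(white["id"].tolist())
```

The original `df` is left unchanged; only the copied subset gets string ids.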
user3423349
5

There are two possibilities:

Rafael Ortega
3

This worked for me:

df['id'].convert_dtypes()

See the documentation here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
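One caveat worth checking on your own data: `convert_dtypes` infers a dtype from the values, so as far as I can tell it only yields the string dtype when the column already contains nothing but text. A small sketch, with `all_text` and `mixed` as illustrative Series:

```python
import pandas as pd

all_text = pd.Series(["black", "white", "white"])
mixed = pd.Series([123, "zhub1", 12354.3])

# convert_dtypes picks a dtype based on the values it finds; it does not
# turn numbers into strings.
print(all_text.convert_dtypes().dtype)
print(mixed.convert_dtypes().dtype)
```

For a mixed column like `id` in the question, an explicit conversion is still needed first.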

drGabriel
2

Use the pandas string methods, e.g. df['id'].str.cat()

Jimmy Obonyo Abor
2

If you want to do this dynamically for every object column:

df_obj = df.select_dtypes(include='object')
df[df_obj.columns] = df_obj.astype(str)
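A self-contained sketch of the same idea on a small frame, so the effect on object and non-object columns can be seen side by side. The `qty` column is an assumption added only for contrast.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [123, 512, "zhub1", 12354.3],
    "qty": [1, 2, 3, 4],
})

# Pick out the object-dtype columns, then cast only those to str.
df_obj = df.select_dtypes(include="object")
df[df_obj.columns] = df_obj.astype(str)

print(df["id"].tolist())
print(df["qty"].dtype)
```

Numeric columns such as `qty` are not selected and keep their original dtype.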
Ozzy Black
1

Your problem can be solved by converting the column to object first and then using astype to convert it to str:

df['id'] = df['id'].astype(object).astype('str')
ggorlen
shekhar chander
-2

For me, .to_string() worked:

df['id']=df['id'].to_string()