I have a pandas data frame with columns of different data types, and I want to convert several of those columns to string type. I have done this individually for each column, but I want to know if there is a more efficient way.

So at present I am doing something like this:

repair['SCENARIO'] = repair['SCENARIO'].astype(str)

repair['SERVICE_TYPE'] = repair['SERVICE_TYPE'].astype(str)

I want a function that lets me pass multiple columns and converts them all to strings.

jpp
Sayonti

To convert multiple columns to string, pass a list of column names to the command you are already using:

df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.

So one way to convert all columns is to build the list of all column names first:

all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)

Note that converting all columns can also be done directly with `df = df.astype(str)` (see comments).
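
Since the question asks for a function that takes several columns, here is a minimal sketch wrapping the same idea. It assumes a hypothetical helper name, `columns_to_str`, and made-up data. Instead of `df[cols] = ...` it passes a {column: dtype} dictionary to `astype`, which pandas accepts and which leaves unlisted columns with their original dtype.

```python
import pandas as pd


def columns_to_str(df, columns):
    """Return a new DataFrame with the listed columns cast to str.

    Hypothetical helper (not part of pandas); other columns keep their dtype.
    """
    # astype accepts a {column: dtype} mapping, so build one from the list
    return df.astype({col: str for col in columns})


# Made-up data reusing the column names from the question
repair = pd.DataFrame({
    "SCENARIO": [1, 2, 3],
    "SERVICE_TYPE": [10, 20, 30],
    "COST": [99.5, 150.0, 12.25],
})

repair = columns_to_str(repair, ["SCENARIO", "SERVICE_TYPE"])
print(repair.dtypes)

# Converting every column at once, as suggested in the comments
everything = repair.astype(str)
```

Because `astype` returns a new object, the result has to be assigned back (here to `repair`); the original frame is not modified in place.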

sudonym
  • For all columns, how about `df = df.astype(str)`? – jpp Jun 13 '18 at 23:20
  • Yes, that also works, absolutely. I just posted this solution to stick with the concept of lists. – sudonym Jun 13 '18 at 23:21
  • Thanks sudonym... I was actually looking for something like a function that would take columns in a data frame and convert them to string. I should be able to change the column names as required, though the first solution works perfectly and I did implement it. – Sayonti Jun 14 '18 at 00:08
  • Is there any performance difference between the two? I tried `df = df.astype(str)` on a frame of shape (50000, 23000) and it crashed (in interactive mode). Thank you – Long Aug 02 '19 at 02:39
  • Wondering why this doesn't work if the list of columns has a single element... – Gian Arauz Dec 14 '21 at 10:55
  • This method worked for me. However, I previously tried `df.iloc[:,9:].astype(str)` to select a slice range, but `df.info()` said they were still int64. In addition, if I assigned that call to my original df, e.g. `df = df.iloc[:,9:].astype(str)`, the new df would only have those 9 columns. Not sure why only this answer worked for me. – Edison Jun 30 '22 at 03:45
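
Regarding the `iloc` comment: `astype` returns a new object rather than changing `df` in place, and assigning the sliced result to `df` replaces the whole frame with only the sliced columns. One way around this, sketched below with a made-up 12-column frame, is to turn the positional slice into column labels and assign the converted columns back through `df[...]`.

```python
import pandas as pd

# Made-up frame with 12 integer columns named c0..c11
df = pd.DataFrame({f"c{i}": [i, i + 1] for i in range(12)})

# Translate the positional slice 9: into the matching column labels
cols = df.columns[9:]

# astype returns a new object, so write the converted columns back into df;
# all other columns are kept unchanged
df[cols] = df[cols].astype(str)

print(df.dtypes.tail(4))
```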

I know this is an old question, but I was looking for a way to turn all columns with an object dtype into strings as a workaround for a bug I discovered in rpy2. I'm working with large dataframes, so I didn't want to list each column explicitly. This seemed to work well for me, so I thought I'd share it in case it helps someone else.

stringcols = df.select_dtypes(include='object').columns
df[stringcols] = df[stringcols].fillna('').astype(str)

The fillna('') call replaces NaN entries with an empty string, so they are not converted to the string 'nan'.
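
A small made-up example of the same pattern follows. The `label` column is created with `dtype=object` explicitly (an assumption of this sketch) so that `select_dtypes(include='object')` picks it up regardless of pandas version; the numeric `count` column is not selected and keeps its dtype.

```python
import numpy as np
import pandas as pd

# Made-up data: one object column with a missing value, one integer column
df = pd.DataFrame({
    "label": pd.Series(["a", np.nan, "c"], dtype=object),
    "count": [1, 2, 3],
})

# Pick only the object-dtype columns
stringcols = df.select_dtypes(include="object").columns

# Fill missing values first so they become '' instead of text such as 'nan'
df[stringcols] = df[stringcols].fillna("").astype(str)

print(df["label"].tolist())
```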

Joe

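You can also use a list comprehension. On its own, a list comprehension returns a plain Python list of Series rather than a DataFrame, so wrap it in pd.concat(..., axis=1) to get a DataFrame back:

df = pd.concat([df[col_name].astype(str) for col_name in df.columns], axis=1)

You can also insert a condition to decide which columns should be converted, leaving the others unchanged. For example:

df = pd.concat([df[col_name].astype(str) if 'to_str' in col_name else df[col_name] for col_name in df.columns], axis=1)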
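
Another way to use a list comprehension, sketched here with made-up column names, is to let it select column names only and then convert them in a single `astype` call. This way `df` stays a DataFrame and the columns that do not match are left unchanged.

```python
import pandas as pd

# Made-up frame: only names containing 'to_str' should become strings
df = pd.DataFrame({
    "id_to_str": [1, 2],
    "amount": [9.5, 3.0],
    "code_to_str": [100, 200],
})

# The comprehension builds a list of column names, not a list of Series
cols = [col_name for col_name in df.columns if "to_str" in col_name]

# Convert the selected columns and write them back; 'amount' is untouched
df[cols] = df[cols].astype(str)

print(cols)  # ['id_to_str', 'code_to_str']
print(df.dtypes)
```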
mellifluous
Amir F