Python - Turn all items in a Dataframe to strings

Question

I followed the procedure from "In Python, how do I convert all of the items in a list to floats?", because each column of my Dataframe is a list, but instead of floats I chose to convert all the values to strings.

df = [str(i) for i in df]

But this failed.

It simply erased all the data except for the first row of column names.

Then, trying df = [str(i) for i in df.values] resulted in changing the entire Dataframe into one big list, but that messes up the data too much to be able to meet the goal of my script which is to export the Dataframe to my Oracle table.

Is there a way to convert all the items that are in my Dataframe that are NOT strings into strings?

score 98 · Answer 1 · edited Nov 19 '19 at 09:48

You can use this:

df = df.astype(str)

Out of curiosity, I decided to see whether there is any difference in efficiency between the accepted solution and mine.

The results are below:

example df:

df = pd.DataFrame([list(range(1000))], index=[0])

test df.astype:

%timeit df.astype(str) 
>> 100 loops, best of 3: 2.18 ms per loop

test df.applymap:

%timeit df.applymap(str)
1 loops, best of 3: 245 ms per loop

It seems df.astype is quite a lot faster :)
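
As a rough illustration of the difference in behaviour (a minimal sketch with a made-up two-column frame, not the asker's data): iterating over a DataFrame yields its column labels, which is why the list comprehension in the question kept only the column names, whereas astype(str) converts every cell and keeps the table shape.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# Iterating over a DataFrame yields its column labels, not its cells,
# so this comprehension throws the data away.
labels = [str(i) for i in df]
print(labels)  # ['a', 'b']

# astype(str) converts every cell and returns a new DataFrame
# with the same rows and columns.
as_text = df.astype(str)
print(as_text.shape)  # (2, 2)
print(type(as_text.loc[0, "a"]))  # <class 'str'>
```

Note that astype returns a new object, so the result has to be assigned back (df = df.astype(str)) for the change to stick.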

edited Nov 19 '19 at 09:48

Cleb

answered Mar 08 '17 at 16:45

PdevG

how would you do this if you wanted to convert a list of dataframes to string? – Joe Rivera Feb 25 '21 at 21:52
list_of_dfs = [df.astype(str) for df in list_of_dfs] – PdevG Feb 28 '21 at 17:13
this seems to put all the dataframes into a list of dataframes and although it does convert them to strings it doesn't actually convert the original dfs into strings. I'd have to unpack them and reassign them to their original df names. Is there an easy way to do this? – Joe Rivera Mar 01 '21 at 21:23
this works [df_a, df_b, df_c] = [df.astype(str) for df in [df_a, df_b, df_c]], but this doesn't. list_of_dfs = [df.astype(str) for df in list_of_dfs] – Joe Rivera Mar 01 '21 at 21:27
Ah, didn't really get what you meant. glad you solved it! – PdevG Mar 03 '21 at 08:08
[df.astype(str, copy=False) for df in list_of_dfs] will probably also work if I understand you correctly, but the documentation warns against using this. – PdevG Mar 03 '21 at 08:10
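
To illustrate the point raised in these comments: astype returns a new DataFrame, so rebuilding the list leaves the original variable names pointing at the unconverted frames. A minimal sketch, where df_a, df_b and the dict frames are hypothetical names; keeping the frames in a dict is just one possible way to avoid reassigning each name by hand.

```python
import pandas as pd

# Hypothetical frames, only for illustration.
df_a = pd.DataFrame({"n": [1, 2]})
df_b = pd.DataFrame({"n": [3.5, 4.5]})

# The list now holds converted copies, but the names df_a and df_b
# still refer to the original, unconverted frames.
list_of_dfs = [df_a, df_b]
list_of_dfs = [df.astype(str) for df in list_of_dfs]
print(df_a["n"].dtype)  # still an integer dtype

# Keeping the frames in a dict means there is only one name to rebind.
frames = {"a": df_a, "b": df_b}
frames = {name: df.astype(str) for name, df in frames.items()}
print(type(frames["a"].loc[0, "n"]))  # <class 'str'>
```

After this, frames["a"] and frames["b"] are the string versions, and no tuple unpacking back into separate variables is needed.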

score 66 · Accepted Answer · answered Mar 08 '17 at 16:44

You can use the applymap method:

df = df.applymap(str)

answered Mar 08 '17 at 16:44

Psidom

That worked absolutely perfect and fixed my entire code. Thanks a ton – theprowler Mar 08 '17 at 16:49
I don't know how big your dataframe is, but it seems astype is quite a lot faster. See my answer :). – PdevG Mar 09 '17 at 12:47
Be careful using this with nan values, it will turn them into 'nan' strings. – Pysnek313 Dec 09 '20 at 16:02
Both `astype(str)` and `applymap(str)` leave column types as "object" causing problems later. How to convert into string for good? – Jari Turkia Aug 31 '21 at 11:53
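
The NaN caveat mentioned above can be seen in a small sketch. The sample frame is made up, and restoring the missing values with where is only one possible workaround, not something from the answers. Since applymap was deprecated in pandas 2.1 in favour of DataFrame.map, the sketch uses whichever of the two exists.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"x": [1.5, np.nan]})

# Use DataFrame.map where available (pandas 2.1+), otherwise applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Calling str on every cell also stringifies the missing value.
as_text = elementwise(str)
print(as_text.loc[1, "x"])  # nan (a three-letter string, not a missing value)

# One possible workaround: keep the converted text only where the
# original cell was not missing; everything else becomes missing again.
restored = as_text.where(df.notna())
print(restored["x"].isna().tolist())  # [False, True]
```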

Sander van den Oord · Answer 3 · 2020-02-04T15:40:43.840

With pandas >= 1.0 there is now a dedicated string datatype.

You can convert your column to this pandas string datatype using .astype('string'):

df = df.astype('string')

This is different from using str which sets the pandas 'object' datatype:

df = df.astype(str)

You can see the difference in datatypes when you look at the info of the dataframe:

import pandas as pd

df = pd.DataFrame({
    'zipcode_str': [90210, 90211],
    'zipcode_string': [90210, 90211],
})

df['zipcode_str'] = df['zipcode_str'].astype(str)
df['zipcode_string'] = df['zipcode_str'].astype('string')

df.info()

# you can see that the first column has dtype object
# while the second column has the new dtype string
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   zipcode_str     2 non-null      object
 1   zipcode_string  2 non-null      string
dtypes: object(1), string(1)

From the docs:

The 'string' extension type solves several issues with object-dtype NumPy arrays:

1) You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

2) object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.

3) When reading code, the contents of an object dtype array is less clear than string.

Information about pandas 1.0 can be found here:
https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html

This also handles NaN or NA values properly, which is not the case for str. – Prolix Jul 06 '22 at 12:56
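
A minimal sketch of the missing-value behaviour described in the comment above, using a made-up frame. It assumes a reasonably recent pandas; converting a numeric column directly to 'string' may not work on pandas 1.0 itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric column with a gap, one text column.
df = pd.DataFrame({"x": [1.5, np.nan], "y": ["a", "b"]})

# Convert every column to the dedicated pandas string dtype.
out = df.astype("string")

print(out.dtypes)
# The missing value stays missing instead of turning into text.
print(out["x"].isna().tolist())  # [False, True]
print(out.loc[0, "x"])  # 1.5
```

Because the gap is still reported by isna, it can be filled or dropped later in the usual way.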

score 3 · Answer 4 · answered Dec 01 '18 at 16:18

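This worked for me when each cell held a list. It keeps the first element of every list and replaces any cell that is not a list with None: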

dt.applymap(lambda x: x[0] if type(x) is list else None)

answered Dec 01 '18 at 16:18

Sarbari Roy
