I am trying to export a Pandas dataframe to Excel where all columns are of text format. By default, the pandas.to_excel() function lets Excel decide the data type. Exporting a column with [1,2,'w'] results in the cells containing 1 and 2 to be numeric, and the cell containing 'w' to be text. I'd like all rows in the column to be text (i.e. ['1','2','w']).
I was able to solve the problem by assigning the column I need to be text using the .astype(str). However, if the data is large, I am concerned that I will run into performance issues. If I understand correctly, df[col] = df[col].astype(str) makes a copy of the data, which is not efficient.
import pandas as pd
df = pd.DataFrame({'a':[1,2,'w'], 'b':['x','y','z']})
df['a'] = df['a'].astype(str)
df.to_excel(r'c:\tmp\test.xlsx')
Is there a more efficient way to do this?
I searched SO several times and didn't see anything on this. Forgive me if this has been answered before. This is my first post, and I'm really happy to participate in this cool forum.
Edit: Thanks to the comments I've received, I see that Converting a series of ints to strings - Why is apply much faster than astype? gives me other options to astype(str). This is really useful. I also wanted to know if astype(str) was inefficient because it made a copy of the data, which I now see that it does not.