pandas to_csv converts str column to int(or float)

Question

As untitled, I noticed that pandas 'to_csv' transforms automatically columns where there are only alphanumerical strings to float . I am creating a dataframe in Jupyter notebook and creating a column ['A'] full of values '1'. Hence, I have a dataframe composed of a column of string '1'. When i convert my dataframe to csv file with 'to_csv'. the output csv file is a one column full of integers 1.
You may advise me to reconvert the column to string when reloaded in jupyter, However that's won't work because I don't know beforehand what columns may be penalized because of this behaviour. Is there a way to avoid this strange situation.

Can the DataFrame contain columns where you do want numerical values (mixed types in your csv output) or are you trying to have every column output as a string? — John, Jan 16 '19 at 12:39
@JoshFriedlander: before saving my dataframe to csv: i transformed all NANs to ''. However to_csv reintroduces NANs in the csv, but that's not an issue because when I read the file, i can simply do fillna('',inplace=True). — mouni93, Jan 16 '19 at 12:41
@John: I have many types columns: int and str. I noticed that some of my str columns are transformed to int ( in these columns there are only alphanumeraical str ) — mouni93, Jan 16 '19 at 12:44
You call it 'strange beharviour', but a csv file doesn't have a difference between strings, integers, booleans, etc. It's just a text file. Loading it is where the difference comes from. If you export a column with strings `'Text'`, it will also be stored as `text` instead of `'text'`, just as `'1'` is stored as `1` (just as the int 1)... — Niels Henkens, Jan 16 '19 at 12:51
I assume if you are using csv is because you are producing an Excel later on. If that is the case, I found a reference in [this post](https://stackoverflow.com/questions/165042/stop-excel-from-automatically-converting-certain-text-values-to-dates). You need your output to be of the form "=1" — Tarifazo, Jan 16 '19 at 12:58

farkas · Answer 1 · 2019-01-16T12:58:46.243

6

You can set the quoting parameter in to_csv, take a look at this example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)
df.to_csv('test.csv', sep='\t', quoting=csv.QUOTE_NONNUMERIC)

The created csv file is:

""  0   1   2
0   "a" "1.2"   "4.2"
1   "b" "70"    "0.03"
2   "x" "5" "0"

You can also set the quote character with quotechar parameter, e.g. quotechar="'" will produce this output:

''  0   1   2
0   'a' '1.2'   '4.2'
1   'b' '70'    '0.03'
2   'x' '5' '0'

edited Jan 16 '19 at 12:58

answered Jan 16 '19 at 12:55

farkas

307
2
8

1

Does this work when all the values in a column are `"1"`? It doesn't work for me.. – jpp Jan 16 '19 at 12:56
You may want "QUOTE_ALL" which is `1`. (See [the docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).) – Pro Q Apr 27 '22 at 19:08

score 4 · Accepted Answer · answered Jan 16 '19 at 12:52

4

One way is to store your types separately and load this with your data:

df = pd.DataFrame({0: ['1', '1', '1'],
                   1: [2, 3, 4]})

df.dtypes.to_frame('types').to_csv('types.csv')
df.to_csv('file.csv', index=False)

df_types = pd.read_csv('types.csv')['types']
df = pd.read_csv('file.csv', dtype=df_types.to_dict())

print(df.dtypes)
# 0    object
# 1     int64
# dtype: object

You may wish to consider Pickle to ensure your dataframe is guaranteed to be unchanged:

df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')

print(df.dtypes)
# 0    object
# 1     int64
# dtype: object

answered Jan 16 '19 at 12:52

jpp

159,742
34
281
339

The idea is good but with some adjustments. When I save the dtype of the columns, the objective column is saved as an 'object' type. So when i reload my dataframe with this dtype, unfortunuatly it doesn't solve my isssue. So I found a way to open the concenred columns as str instead of just an 'object' type – mouni93 Jan 16 '19 at 13:11
@mouni93, There is no such thing as `str` dtype in Pandas, strings are stored in `object` dtype columns. Check `type(df['0'].iat[0])` when you reload the dataframe to prove to yourself they are strings, even though `df['0']` dtype gives `object`. – jpp Jan 16 '19 at 13:12

pandas to_csv converts str column to int(or float)

2 Answers2

Linked