6

As untitled, I noticed that pandas 'to_csv' transforms automatically columns where there are only alphanumerical strings to float . I am creating a dataframe in Jupyter notebook and creating a column ['A'] full of values '1'. Hence, I have a dataframe composed of a column of string '1'. When i convert my dataframe to csv file with 'to_csv'. the output csv file is a one column full of integers 1.
You may advise me to reconvert the column to string when reloaded in jupyter, However that's won't work because I don't know beforehand what columns may be penalized because of this behaviour. Is there a way to avoid this strange situation.

jpp
  • 159,742
  • 34
  • 281
  • 339
mouni93
  • 159
  • 1
  • 2
  • 13
  • Do you have `NaN` values? – Josh Friedlander Jan 16 '19 at 12:36
  • Can the DataFrame contain columns where you do want numerical values (mixed types in your csv output) or are you trying to have every column output as a string? – John Jan 16 '19 at 12:39
  • @JoshFriedlander: before saving my dataframe to csv: i transformed all NANs to ''. However to_csv reintroduces NANs in the csv, but that's not an issue because when I read the file, i can simply do fillna('',inplace=True). – mouni93 Jan 16 '19 at 12:41
  • @John: I have many types columns: int and str. I noticed that some of my str columns are transformed to int ( in these columns there are only alphanumeraical str ) – mouni93 Jan 16 '19 at 12:44
  • 1
    You call it 'strange beharviour', but a csv file doesn't have a difference between strings, integers, booleans, etc. It's just a text file. Loading it is where the difference comes from. If you export a column with strings `'Text'`, it will also be stored as `text` instead of `'text'`, just as `'1'` is stored as `1` (just as the int 1)... – Niels Henkens Jan 16 '19 at 12:51
  • I assume if you are using csv is because you are producing an Excel later on. If that is the case, I found a reference in [this post](https://stackoverflow.com/questions/165042/stop-excel-from-automatically-converting-certain-text-values-to-dates). You need your output to be of the form "=1" – Tarifazo Jan 16 '19 at 12:58

2 Answers2

6

You can set the quoting parameter in to_csv, take a look at this example:

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)
df.to_csv('test.csv', sep='\t', quoting=csv.QUOTE_NONNUMERIC)

The created csv file is:

""  0   1   2
0   "a" "1.2"   "4.2"
1   "b" "70"    "0.03"
2   "x" "5" "0"

You can also set the quote character with quotechar parameter, e.g. quotechar="'" will produce this output:

''  0   1   2
0   'a' '1.2'   '4.2'
1   'b' '70'    '0.03'
2   'x' '5' '0'
farkas
  • 307
  • 2
  • 8
  • 1
    Does this work when all the values in a column are `"1"`? It doesn't work for me.. – jpp Jan 16 '19 at 12:56
  • You may want "QUOTE_ALL" which is `1`. (See [the docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).) – Pro Q Apr 27 '22 at 19:08
4

One way is to store your types separately and load this with your data:

df = pd.DataFrame({0: ['1', '1', '1'],
                   1: [2, 3, 4]})

df.dtypes.to_frame('types').to_csv('types.csv')
df.to_csv('file.csv', index=False)

df_types = pd.read_csv('types.csv')['types']
df = pd.read_csv('file.csv', dtype=df_types.to_dict())

print(df.dtypes)
# 0    object
# 1     int64
# dtype: object

You may wish to consider Pickle to ensure your dataframe is guaranteed to be unchanged:

df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')

print(df.dtypes)
# 0    object
# 1     int64
# dtype: object
jpp
  • 159,742
  • 34
  • 281
  • 339
  • The idea is good but with some adjustments. When I save the dtype of the columns, the objective column is saved as an 'object' type. So when i reload my dataframe with this dtype, unfortunuatly it doesn't solve my isssue. So I found a way to open the concenred columns as str instead of just an 'object' type – mouni93 Jan 16 '19 at 13:11
  • @mouni93, There is no such thing as `str` dtype in Pandas, strings are stored in `object` dtype columns. Check `type(df['0'].iat[0])` when you reload the dataframe to prove to yourself they are strings, even though `df['0']` dtype gives `object`. – jpp Jan 16 '19 at 13:12