
I have a column in a Pandas dataframe that contains dictionaries whose values are native Python types (strings, integers, floats, etc.).

After saving this dataframe to a CSV file and reloading it, the values come back as strings, and it seems impossible to cast them back to dictionaries. The Python string representation of a dictionary is not valid JSON (single quotes instead of double quotes, and booleans written as True/False instead of true/false), so json.loads() cannot parse it.

Consider this example:

import json
import pandas as pd

df = pd.DataFrame({'dict_column': [{'key': 'val'}]})
df.iloc[0, 0]['key']  # returns 'val'

df.to_csv('file.csv', index=False)

df_loaded = pd.read_csv('file.csv')
df_loaded.iloc[0, 0]['key']       # TypeError: the cell is now the string "{'key': 'val'}"
json.loads(df_loaded.iloc[0, 0])  # json.decoder.JSONDecodeError: single quotes are not valid JSON

Because of this, it seems important to apply json.dumps to a dict column before saving a dataframe to CSV. But what about existing CSVs? Is there any way to cast the string representation of a dict back to a dict? I have a huge dataframe saved this way and I don't want to lose the data in it.

I hope there is a better solution than replacing the single quotes with double quotes directly in the CSV file.
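For reference, the json.dumps round-trip mentioned above can be sketched like this (a minimal sketch; the column and file names are just for illustration):

```python
import json
import pandas as pd

# Serialize the dict column to JSON strings before writing to CSV.
df = pd.DataFrame({'dict_column': [{'key': 'val', 'flag': True}]})
df['dict_column'] = df['dict_column'].apply(json.dumps)
df.to_csv('file.csv', index=False)

# Parse the JSON strings back into dicts after reading.
df_loaded = pd.read_csv('file.csv')
df_loaded['dict_column'] = df_loaded['dict_column'].apply(json.loads)
print(df_loaded.iloc[0, 0]['key'])  # val
```

Note that json.dumps writes booleans as true/false, so the round-trip through json.loads preserves them correctly.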

Anatole

1 Answer


Try:

df_loaded["dict_column"] = df_loaded["dict_column"].apply(eval)
df_loaded.iloc[0, 0]['key'] # Now returns 'val'
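Since eval will execute arbitrary Python found in the CSV, a safer variant is ast.literal_eval from the standard library, which only parses Python literals (dicts, lists, strings, numbers, booleans). A sketch, using an inline dataframe in place of the loaded CSV:

```python
import ast
import pandas as pd

# Stand-in for a dataframe loaded from CSV: the dicts are plain strings.
df_loaded = pd.DataFrame({'dict_column': ["{'key': 'val', 'n': 1}"]})

# literal_eval parses the Python-repr strings back into dicts
# without executing arbitrary code, unlike eval.
df_loaded['dict_column'] = df_loaded['dict_column'].apply(ast.literal_eval)
print(df_loaded.iloc[0, 0]['key'])  # val
```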
butterflyknife