
I have a CSV file with many columns. One of the columns contains a mix of dict-like objects and plain strings.

For example, the column contains values like: {"a":5,"b":6,"c":8}, "usa", "india", {"a":9,"b":10,"c":11}

When I read this CSV into a DataFrame using:

df = pd.read_csv(path)

every element of this column is recognised as a string: checking the type of each element with df.applymap(type) shows str.

The data has no quotes around it, neither in the CSV nor in the DataFrame, yet the dict objects are still converted to strings when stored.

Checking the column's dtype shows object.

How can I read the CSV into a DataFrame so that, in this particular column, dict objects are recognised as dicts and strings as strings?

Nikita Gupta
    That's how pandas represents complex data types. – cs95 Oct 21 '17 at 00:14
  • @cᴏʟᴅsᴘᴇᴇᴅ This came from an earlier question, where the entries happened to be strings rather than dicts. I guess the subtle difference that could have highlighted that was "a" rather than 'a'! – Andy Hayden Oct 21 '17 at 00:20
    @AndyHayden Yes, I just saw that. Thank you for the enriching answers, I've learned a lot from them. – cs95 Oct 21 '17 at 00:22

1 Answer


You can convert the strings that should be dicts (or other types) using literal_eval:

from ast import literal_eval

def try_literal_eval(s):
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        # not a Python literal (e.g. a plain string like "usa"); keep it as-is
        return s

Now you can apply this to your DataFrame:

In [11]: df = pd.DataFrame({'A': ["hello","world",'{"a":5,"b":6,"c":8}',"usa","india",'{"d":9,"e":10,"f":11}']})

In [12]: df.loc[2, "A"]
Out[12]: '{"a":5,"b":6,"c":8}'

In [13]: df
Out[13]:
                       A
0                  hello
1                  world
2    {"a":5,"b":6,"c":8}
3                    usa
4                  india
5  {"d":9,"e":10,"f":11}


In [14]: df.applymap(try_literal_eval)
Out[14]:
                            A
0                       hello
1                       world
2    {'a': 5, 'b': 6, 'c': 8}
3                         usa
4                       india
5  {'d': 9, 'e': 10, 'f': 11}

In [15]: df.applymap(try_literal_eval).loc[2, "A"]
Out[15]: {'a': 5, 'b': 6, 'c': 8}

Note: This is pretty expensive (time-wise) compared with vectorized operations, but when you're storing dictionaries in DataFrames/Series you're necessarily falling back to Python objects, so things are going to be relatively slow anyway... It's probably a good idea to denormalize, i.e. expand the dicts back into columns, e.g. using json_normalize.
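As a sketch of that denormalization step (not part of the original answer; it assumes a reasonably recent pandas where `pd.json_normalize` is a top-level function), you can parse the column, keep the rows that became dicts, and expand them into columns:

```python
import pandas as pd
from ast import literal_eval

def try_literal_eval(s):
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        return s

df = pd.DataFrame({'A': ['hello', '{"a":5,"b":6,"c":8}', '{"a":9,"b":10,"c":11}']})
parsed = df['A'].apply(try_literal_eval)

# keep only the rows that parsed into dicts, then expand them into columns
dicts = parsed[parsed.apply(lambda x: isinstance(x, dict))]
expanded = pd.json_normalize(dicts.tolist())
print(expanded)
#    a   b   c
# 0  5   6   8
# 1  9  10  11
```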

Andy Hayden
  • I guess the json_normalize advice is somewhat unnecessary since you are coming from this question https://stackoverflow.com/a/46856679/1240268 – Andy Hayden Oct 21 '17 at 00:22
  • I need to remove all the string rows and convert the dict objects to columns, so for this conversion I need to use json_normalize. As said, I have many columns in the DataFrame but I want to do the applymap thing on one particular column. How do I restrict 'df.applymap(try_literal_eval)' to a specific column name? – Nikita Gupta Oct 21 '17 at 00:25
  • @NikitaGupta e.g. `df.A.apply(try_literal_eval)` – Andy Hayden Oct 21 '17 at 00:27
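A minimal sketch of the comment's suggestion: using `Series.apply` on a single column leaves the rest of the DataFrame untouched.

```python
import pandas as pd
from ast import literal_eval

def try_literal_eval(s):
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        return s

df = pd.DataFrame({'A': ['usa', '{"a":5,"b":6}'], 'B': [1, 2]})
df['A'] = df['A'].apply(try_literal_eval)  # only column A is converted

print(type(df.loc[1, 'A']))  # <class 'dict'>
print(df.loc[0, 'A'])        # usa
```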