
I have a DataFrame with free-form addresses, which I normalize with the Google Maps Geocoding API. The result is returned as JSON, written to a column, and saved to CSV.

However, when I read this CSV back, backslashes appear in the JSON column: the column was read from the CSV as a string, single quotes appeared at the beginning and the end, and single quotes inside the JSON were escaped with backslashes.

So I get:

```
json_address
'{\'status\': \'OK\'}'
```

While I need:

```
json_address
{'status': 'OK'}
```
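Here is a minimal sketch of how I believe the round trip produces this (the file name is just for illustration):

```python
import pandas as pd

# The geocoder result is a dict; to_csv writes it via its str() representation,
# which uses single quotes.
df = pd.DataFrame({"json_address": [{"status": "OK"}]})
df.to_csv("geocoded.csv", index=False)

# Reading back, the cell is now a plain string, not a dict.
df2 = pd.read_csv("geocoded.csv")
print(type(df2["json_address"][0]))  # <class 'str'>
print(df2["json_address"][0])        # {'status': 'OK'}
```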

The solution suggested here does not work for me, as my CSV is read with single quotes and I get the error `Expecting property name enclosed in double quotes`.

In addition, I do not want to normalize the JSON and split it into separate columns; I want the output as a dictionary, as specified above.

I tried to replace single quotes with double quotes:

```python
df['json_double'] = df['json_address'].apply(lambda x: x.replace("\'", "\""))
```

It worked, so I got:

```
''{\''status\'': \'OK\''}''
```

However, when I tried `df['json'] = df['json_double'].apply(json.loads)`, I got the error `Expecting ',' delimiter: line 1 column 609 (char 608)`.
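I suspect the blanket replacement corrupts values that themselves contain an apostrophe or a double quote (like `oblast'` in the full string below); a minimal sketch of the failure:

```python
import json

s = "{'long_name': \"Nizhegorodskaya oblast'\"}"
swapped = s.replace("'", '"')
print(swapped)  # {"long_name": "Nizhegorodskaya oblast""}  -- quotes are now unbalanced

try:
    json.loads(swapped)
except json.JSONDecodeError as exc:
    print(exc)  # Expecting ',' delimiter: ...
```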

So my questions are:

  1. How can I handle JSON properly in the future to avoid such problems?
  2. How can I solve my current problem? I am interested both in a solution at the reading stage and at the stage of transforming the string column to JSON.

P.S. An example of the whole string I want to have as JSON is:

```
'{\'results\': [{\'address_components\': [{\'long_name\': \'4\', \'short_name\': \'4\', \'types\': [\'street_number\']}, {\'long_name\': \'Ulitsa Marshala Rokossovskogo\', \'short_name\': \'Ulitsa Marshala Rokossovskogo\', \'types\': [\'route\']}, {\'long_name\': \'Nizhnij Novgorod\', \'short_name\': \'Nizhnij Novgorod\', \'types\': [\'locality\', \'political\']}, {\'long_name\': \'Sovetskiy\', \'short_name\': \'Sovetskiy\', \'types\': [\'administrative_area_level_3\', \'political\']}, {\'long_name\': \'Gorod Nizhniy Novgorod\', \'short_name\': \'Gorod Nizhniy Novgorod\', \'types\': [\'administrative_area_level_2\', \'political\']}, {\'long_name\': "Nizhegorodskaya oblast\'", \'short_name\': "Nizhegorodskaya oblast\'", \'types\': [\'administrative_area_level_1\', \'political\']}, {\'long_name\': \'Russia\', \'short_name\': \'RU\', \'types\': [\'country\', \'political\']}, {\'long_name\': \'603162\', \'short_name\': \'603162\', \'types\': [\'postal_code\']}], \'formatted_address\': "Ulitsa Marshala Rokossovskogo, 4, Nizhnij Novgorod, Nizhegorodskaya oblast\', Russia, 603162", \'geometry\': {\'location\': {\'lat\': 56.28278, \'lng\': 44.0456111}, \'location_type\': \'ROOFTOP\', \'viewport\': {\'northeast\': {\'lat\': 56.2841289802915, \'lng\': 44.0469600802915}, \'southwest\': {\'lat\': 56.2814310197085, \'lng\': 44.0442621197085}}}, \'place_id\': \'ChIJMeknJyXVUUERI9A8HCXBznI\', \'plus_code\': {\'compound_code\': \'72MW+46 Nizhny Novgorod, Nizhny Novgorod Oblast, Russia\', \'global_code\': \'9H8672MW+46\'}, \'types\': [\'street_address\']}], \'status\': \'OK\'}'
```

Applying `json.loads` to it results in `Expecting ',' delimiter: line 1 column 609 (char 608)`.

I got an `invalid syntax` error trying the suggested solution (screenshot omitted).

kndl

1 Answer


You can use the `literal_eval` function from the `ast` module, e.g.:

```python
import ast
mystr = '{\'results\': [{\'address_components\': [{\'long_name\': \'4\', \'short_name\': \'4\', \'types\': [\'street_number\']}, {\'long_name\': \'Ulitsa Marshala Rokossovskogo\', \'short_name\': \'Ulitsa Marshala Rokossovskogo\', \'types\': [\'route\']}, {\'long_name\': \'Nizhnij Novgorod\', \'short_name\': \'Nizhnij Novgorod\', \'types\': [\'locality\', \'political\']}, {\'long_name\': \'Sovetskiy\', \'short_name\': \'Sovetskiy\', \'types\': [\'administrative_area_level_3\', \'political\']}, {\'long_name\': \'Gorod Nizhniy Novgorod\', \'short_name\': \'Gorod Nizhniy Novgorod\', \'types\': [\'administrative_area_level_2\', \'political\']}, {\'long_name\': "Nizhegorodskaya oblast\'", \'short_name\': "Nizhegorodskaya oblast\'", \'types\': [\'administrative_area_level_1\', \'political\']}, {\'long_name\': \'Russia\', \'short_name\': \'RU\', \'types\': [\'country\', \'political\']}, {\'long_name\': \'603162\', \'short_name\': \'603162\', \'types\': [\'postal_code\']}], \'formatted_address\': "Ulitsa Marshala Rokossovskogo, 4, Nizhnij Novgorod, Nizhegorodskaya oblast\', Russia, 603162", \'geometry\': {\'location\': {\'lat\': 56.28278, \'lng\': 44.0456111}, \'location_type\': \'ROOFTOP\', \'viewport\': {\'northeast\': {\'lat\': 56.2841289802915, \'lng\': 44.0469600802915}, \'southwest\': {\'lat\': 56.2814310197085, \'lng\': 44.0442621197085}}}, \'place_id\': \'ChIJMeknJyXVUUERI9A8HCXBznI\', \'plus_code\': {\'compound_code\': \'72MW+46 Nizhny Novgorod, Nizhny Novgorod Oblast, Russia\', \'global_code\': \'9H8672MW+46\'}, \'types\': [\'street_address\']}], \'status\': \'OK\'}'
mydic = ast.literal_eval(mystr)
```
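Across the whole column that might look like the following sketch (`df` and the column names are taken from the question; the file name is illustrative):

```python
import ast
import json
import pandas as pd

df = pd.read_csv("geocoded.csv")  # illustrative file name

# Parse the repr-style strings back into dicts.
df["json"] = df["json_address"].apply(ast.literal_eval)

# To avoid the problem in the future (question 1), serialize with json.dumps
# before saving; json.loads then round-trips cleanly on re-read.
df["json_address"] = df["json"].apply(json.dumps)
df.to_csv("geocoded.csv", index=False)

df = pd.read_csv("geocoded.csv")
df["json"] = df["json_address"].apply(json.loads)
```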
Matt Pitkin
  • I have tried `df['json'] = df['json_address'].apply(lambda x: ast.literal_eval(x))` and got an `invalid syntax` error; screenshot in the question. – kndl Oct 02 '19 at 10:14
  • I don't think you want to use the `apply` method. Just do, e.g., `adv3['json'] = ast.literal_eval(adv3['json_address'])`. – Matt Pitkin Oct 02 '19 at 10:26
  • Actually, I just noticed that your data is stored in a pandas DataFrame, so maybe you do want to use `apply`. Maybe it's having problems with the Cyrillic letters? – Matt Pitkin Oct 02 '19 at 10:30
  • What happens if you just parse the string contained in a single entry of the DataFrame directly with `ast.literal_eval`, e.g., `ast.literal_eval(adv3['json_address'][0])`? Also, does the example in my answer actually work for you? – Matt Pitkin Oct 02 '19 at 10:34
  • I found out that your solution works for the majority of rows and the problem arises only in certain ones. For example, it does not work for the string `'{\'formatted_address\': \'ТЦ "Солнечный", Kostromskaya oblast\', Russia, 156016\'}'`. To make it work I need to delete the `\'` in `Kostromskaya oblast\'`, but I do not understand why that happens. – kndl Oct 02 '19 at 11:03
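A plausible explanation for that last failure (an assumption, not confirmed in the thread): the `\'` shown in the DataFrame display is repr escaping, so the actual cell text contains a bare apostrophe that closes the single-quoted literal early and leaves the remainder syntactically invalid. A minimal sketch:

```python
import ast

# Actual cell text: the apostrophe after "oblast" is unescaped, so it
# terminates the string literal early and the rest is invalid syntax.
bad = "{'formatted_address': 'ТЦ \"Солнечный\", Kostromskaya oblast', Russia, 156016'}"

try:
    ast.literal_eval(bad)
except (SyntaxError, ValueError) as exc:
    print(type(exc).__name__, exc)  # SyntaxError: invalid syntax (or similar)
```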