
I have a DataFrame with free-form addresses, which I normalize with the Google Maps Geocoding API. The result is returned as JSON, written to a column, and saved to CSV.

However, when I read this CSV back, backslashes appear in the JSON column: the column was read from the CSV as a string, single quotes appeared at the beginning and the end, and single quotes inside the JSON were escaped with backslashes.

So I get:

```
json_address
'{\'status\': \'OK\'}'
```

While I need:

```
json_address
{'status': 'OK'}
```
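Here is a minimal sketch of how I believe the round trip produces this (the file name is just for illustration):

```python
import pandas as pd

# The geocoder result is a dict; to_csv writes it via its str() representation,
# which uses single quotes.
df = pd.DataFrame({"json_address": [{"status": "OK"}]})
df.to_csv("geocoded.csv", index=False)

# Reading back, the cell is now a plain string, not a dict.
df2 = pd.read_csv("geocoded.csv")
print(type(df2["json_address"][0]))  # <class 'str'>
print(df2["json_address"][0])        # {'status': 'OK'}
```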

The solution suggested here does not work for me, as my CSV is read with single quotes and I get the error `Expecting property name enclosed in double quotes`.

In addition, I do not want to normalize the JSON and split it into separate columns; I want the output as a dictionary, as specified above.

I tried to replace single quotes with double quotes:

```python
df['json_double'] = df['json_address'].apply(lambda x: x.replace("\'", "\""))
```

It worked, so I got:

```
''{\''status\'': \'OK\''}''
```

However, when I tried `df['json'] = df['json_double'].apply(json.loads)`, I got the error `Expecting ',' delimiter: line 1 column 609 (char 608)`.
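I suspect the blanket replacement corrupts values that themselves contain an apostrophe or a double quote (like `oblast'` in the full string below); a minimal sketch of the failure:

```python
import json

s = "{'long_name': \"Nizhegorodskaya oblast'\"}"
swapped = s.replace("'", '"')
print(swapped)  # {"long_name": "Nizhegorodskaya oblast""}  -- quotes are now unbalanced

try:
    json.loads(swapped)
except json.JSONDecodeError as exc:
    print(exc)  # Expecting ',' delimiter: ...
```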

So my questions are:

  1. How can I handle JSON properly in the future to avoid such problems?
  2. How can I solve my current problem? I am interested both in a solution at the reading stage and at the stage of transforming the string column to JSON.

P.S. An example of the whole string I want to have as JSON is:

```
'{\'results\': [{\'address_components\': [{\'long_name\': \'4\', \'short_name\': \'4\', \'types\': [\'street_number\']}, {\'long_name\': \'Ulitsa Marshala Rokossovskogo\', \'short_name\': \'Ulitsa Marshala Rokossovskogo\', \'types\': [\'route\']}, {\'long_name\': \'Nizhnij Novgorod\', \'short_name\': \'Nizhnij Novgorod\', \'types\': [\'locality\', \'political\']}, {\'long_name\': \'Sovetskiy\', \'short_name\': \'Sovetskiy\', \'types\': [\'administrative_area_level_3\', \'political\']}, {\'long_name\': \'Gorod Nizhniy Novgorod\', \'short_name\': \'Gorod Nizhniy Novgorod\', \'types\': [\'administrative_area_level_2\', \'political\']}, {\'long_name\': "Nizhegorodskaya oblast\'", \'short_name\': "Nizhegorodskaya oblast\'", \'types\': [\'administrative_area_level_1\', \'political\']}, {\'long_name\': \'Russia\', \'short_name\': \'RU\', \'types\': [\'country\', \'political\']}, {\'long_name\': \'603162\', \'short_name\': \'603162\', \'types\': [\'postal_code\']}], \'formatted_address\': "Ulitsa Marshala Rokossovskogo, 4, Nizhnij Novgorod, Nizhegorodskaya oblast\', Russia, 603162", \'geometry\': {\'location\': {\'lat\': 56.28278, \'lng\': 44.0456111}, \'location_type\': \'ROOFTOP\', \'viewport\': {\'northeast\': {\'lat\': 56.2841289802915, \'lng\': 44.0469600802915}, \'southwest\': {\'lat\': 56.2814310197085, \'lng\': 44.0442621197085}}}, \'place_id\': \'ChIJMeknJyXVUUERI9A8HCXBznI\', \'plus_code\': {\'compound_code\': \'72MW+46 Nizhny Novgorod, Nizhny Novgorod Oblast, Russia\', \'global_code\': \'9H8672MW+46\'}, \'types\': [\'street_address\']}], \'status\': \'OK\'}'
```

Applying `json.loads` to it results in `Expecting ',' delimiter: line 1 column 609 (char 608)`.

I got an `invalid syntax` error trying the suggested solution (screenshot omitted).

kndl

1 Answer


You can use the `literal_eval` function from the `ast` module, e.g.:

```python
import ast
mystr = '{\'results\': [{\'address_components\': [{\'long_name\': \'4\', \'short_name\': \'4\', \'types\': [\'street_number\']}, {\'long_name\': \'Ulitsa Marshala Rokossovskogo\', \'short_name\': \'Ulitsa Marshala Rokossovskogo\', \'types\': [\'route\']}, {\'long_name\': \'Nizhnij Novgorod\', \'short_name\': \'Nizhnij Novgorod\', \'types\': [\'locality\', \'political\']}, {\'long_name\': \'Sovetskiy\', \'short_name\': \'Sovetskiy\', \'types\': [\'administrative_area_level_3\', \'political\']}, {\'long_name\': \'Gorod Nizhniy Novgorod\', \'short_name\': \'Gorod Nizhniy Novgorod\', \'types\': [\'administrative_area_level_2\', \'political\']}, {\'long_name\': "Nizhegorodskaya oblast\'", \'short_name\': "Nizhegorodskaya oblast\'", \'types\': [\'administrative_area_level_1\', \'political\']}, {\'long_name\': \'Russia\', \'short_name\': \'RU\', \'types\': [\'country\', \'political\']}, {\'long_name\': \'603162\', \'short_name\': \'603162\', \'types\': [\'postal_code\']}], \'formatted_address\': "Ulitsa Marshala Rokossovskogo, 4, Nizhnij Novgorod, Nizhegorodskaya oblast\', Russia, 603162", \'geometry\': {\'location\': {\'lat\': 56.28278, \'lng\': 44.0456111}, \'location_type\': \'ROOFTOP\', \'viewport\': {\'northeast\': {\'lat\': 56.2841289802915, \'lng\': 44.0469600802915}, \'southwest\': {\'lat\': 56.2814310197085, \'lng\': 44.0442621197085}}}, \'place_id\': \'ChIJMeknJyXVUUERI9A8HCXBznI\', \'plus_code\': {\'compound_code\': \'72MW+46 Nizhny Novgorod, Nizhny Novgorod Oblast, Russia\', \'global_code\': \'9H8672MW+46\'}, \'types\': [\'street_address\']}], \'status\': \'OK\'}'
mydic = ast.literal_eval(mystr)
```
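Across the whole column that might look like the following sketch (`df` and the column names are taken from the question; the file name is illustrative):

```python
import ast
import json
import pandas as pd

df = pd.read_csv("geocoded.csv")  # illustrative file name

# Parse the repr-style strings back into dicts.
df["json"] = df["json_address"].apply(ast.literal_eval)

# To avoid the problem in the future (question 1), serialize with json.dumps
# before saving; json.loads then round-trips cleanly on re-read.
df["json_address"] = df["json"].apply(json.dumps)
df.to_csv("geocoded.csv", index=False)

df = pd.read_csv("geocoded.csv")
df["json"] = df["json_address"].apply(json.loads)
```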
Matt Pitkin
  • I have tried `df['json'] = df['json_address'].apply(lambda x: ast.literal_eval(x))` and got an `invalid syntax` error; screenshot in the question. – kndl Oct 02 '19 at 10:14
  • I don't think you want to use the `apply` method. Just do, e.g., `adv3['json'] = ast.literal_eval(adv3['json_address'])`. – Matt Pitkin Oct 02 '19 at 10:26
  • Actually, I just noticed that your data is stored in a pandas DataFrame, so maybe you do want to use `apply`. Maybe it's having problems with the Cyrillic letters? – Matt Pitkin Oct 02 '19 at 10:30
  • What happens if you just parse the string contained in a single entry of the DataFrame directly with `ast.literal_eval`, e.g., `ast.literal_eval(adv3['json_address'][0])`? Also, does the example in my answer actually work for you? – Matt Pitkin Oct 02 '19 at 10:34
  • I found out that your solution works for the majority of rows and the problem arises only in certain ones. For example, it does not work for the string `'{\'formatted_address\': \'ТЦ "Солнечный", Kostromskaya oblast\', Russia, 156016\'}'`. To make it work I need to delete the `\'` in `Kostromskaya oblast\'`, but I do not understand why that happens. – kndl Oct 02 '19 at 11:03
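A plausible explanation for that last failure (an assumption, not confirmed in the thread): the `\'` shown in the DataFrame display is repr escaping, so the actual cell text contains a bare apostrophe that closes the single-quoted literal early and leaves the remainder syntactically invalid. A minimal sketch:

```python
import ast

# Actual cell text: the apostrophe after "oblast" is unescaped, so it
# terminates the string literal early and the rest is invalid syntax.
bad = "{'formatted_address': 'ТЦ \"Солнечный\", Kostromskaya oblast', Russia, 156016'}"

try:
    ast.literal_eval(bad)
except (SyntaxError, ValueError) as exc:
    print(type(exc).__name__, exc)  # SyntaxError: invalid syntax (or similar)
```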