2

I am trying to parse JSON data from an URL. I have fetched the data and parsed it into a dataframe. From the looks of it, I am missing a step.

Data Returns in JSON format in excel but my data frame returns two columns: entry number and JSON Text

import urllib.request
import json
import pandas
with urllib.request.urlopen("https://raw.githubusercontent.com/gavinr/usa-
mcdonalds-locations/master/mcdonalds.geojson") as url:
data = json.loads(url.read().decode())
print(data)
json_parsed = json.dumps(data)
print(json_parsed)

df=pandas.read_json(json_parsed)
writer = pandas.ExcelWriter('Mcdonaldsstorelist.xlsx')
df.to_excel(writer,'Sheet1')
writer.save()
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Anthony Richard
  • 127
  • 2
  • 12
  • My data frame returns two columns: entry number and Json Text -> See below {'geometry': {'type': 'Point', 'coordinates': [-80.140924, 25.789141]}, 'properties': {'storeNumber': '14372', 'playplace': 'N', 'storeUrl': 'http://www.mcflorida.com/14372', 'address': '1601 ALTON RD', 'driveThru': 'Y', 'phone': '(305)672-7055', 'storeType': 'FREESTANDING', 'freeWifi': 'Y', 'city': 'MIAMI BEACH', 'archCard': 'Y', 'state': 'FL', 'zip': '33139-2420'}, 'type': 'Feature'} I am looking to parse out each field into separate columns – Anthony Richard Sep 13 '17 at 01:03
  • Then parsing from the URL is not the problem. Pandas is – OneCricketeer Sep 13 '17 at 01:04
  • Have you tried printing the Dataframe itself? You can debug your code without Excel – OneCricketeer Sep 13 '17 at 01:06
  • yeah, printing the DF has the same results. Gives me the JSON string in the second column with no separation – Anthony Richard Sep 13 '17 at 01:07
  • Also, `json_parsed` is really not necessary. `json.loads` is what parses the JSON string. Putting it back into a string using `dumps` shouldn't be required – OneCricketeer Sep 13 '17 at 01:09
  • See if you missed something from here https://stackoverflow.com/questions/21104592/json-to-pandas-dataframe – OneCricketeer Sep 13 '17 at 01:10
  • 1
    Looks like I have to flatten arrays inside the string, I have been googling some techniques. thanks for the help – Anthony Richard Sep 13 '17 at 01:41

1 Answers1

2

I believe you can use json_normalize:

df = pd.io.json.json_normalize(data['features'])

df.head()

      geometry.coordinates geometry.type    properties.address  \
0  [-80.140924, 25.789141]         Point         1601 ALTON RD   
1  [-80.218683, 25.765501]         Point        1400 SW 8TH ST   
2  [-80.185108, 25.849872]         Point    8116 BISCAYNE BLVD   
3   [-80.37197, 25.550894]         Point    23351 SW 112TH AVE   
4   [-80.36734, 25.579132]         Point  10855 CARIBBEAN BLVD   

  properties.archCard properties.city properties.driveThru  \
0                   Y     MIAMI BEACH                    Y   
1                   Y           MIAMI                    Y   
2                   Y           MIAMI                    Y   
3                   N       HOMESTEAD                    Y   
4                   Y           MIAMI                    Y   

  properties.freeWifi properties.phone properties.playplace properties.state  \
0                   Y    (305)672-7055                    N               FL   
1                   Y    (305)285-0974                    Y               FL   
2                   Y    (305)756-0400                    N               FL   
3                   Y    (305)258-7837                    N               FL   
4                   Y    (305)254-3487                    Y               FL   

  properties.storeNumber properties.storeType             properties.storeUrl  \
0                  14372         FREESTANDING  http://www.mcflorida.com/14372   
1                   7408         FREESTANDING   http://www.mcflorida.com/7408   
2                  11511         FREESTANDING  http://www.mcflorida.com/11511   
3                  34014         FREESTANDING                             NaN   
4                  12215         FREESTANDING  http://www.mcflorida.com/12215   

  properties.zip     type  
0     33139-2420  Feature  
1          33135  Feature  
2          33138  Feature  
3          33032  Feature  
4          33157  Feature  

df.columns

Index(['geometry.coordinates', 'geometry.type', 'properties.address',
       'properties.archCard', 'properties.city', 'properties.driveThru',
       'properties.freeWifi', 'properties.phone', 'properties.playplace',
       'properties.state', 'properties.storeNumber', 'properties.storeType',
       'properties.storeUrl', 'properties.zip', 'type'],
      dtype='object')
cs95
  • 379,657
  • 97
  • 704
  • 746