I'm using a Foursquare API call to find venues associated with particular ZIP codes in the US.
I am able to generate the JSON with information, but am having trouble looping and parsing to construct a pandas dataframe.
So far:
# scraping the foursquare website for the information we want and obtaining the json file as results
for i, series in df_income_zip_good.iterrows():
    lat = series['lat']
    lng = series['lng']
    town = series['place']
    LIMIT = 100
    radius = 1000
    url4Sqr = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius,
        LIMIT)
    venues = requests.get(url4Sqr).json()
    # print results from call
    print(venues)
# https://stackoverflow.com/questions/6386308/http-requests-and-json-parsing-in-python
This works fine and produces the JSON. I've linked the output to a JSON file on GitHub: (https://github.com/adhorvitz/coursera_ibm_capstone/blob/524c6609ea8872e0c188cd373a4778caaadb1cf6/venuedatasample.json)
I am not sure how to best flatten the JSON, then loop to extract the pieces of information I want to load into a dataframe. I've tried to mess around with the following with no success.
def flatten_json(nested_json, exclude=['']):
    """Flatten json object with nested keys into a single level.
    Args:
        nested_json: A nested json object.
        exclude: Keys to exclude from output.
    Returns:
        The flattened json object if successful, None otherwise.
    The code recursively extracts values out of the object into a flattened dictionary. json_normalize can be applied to the output of flatten_object to produce a python dataframe:
    """
    out = {}

    def flatten(x, name='venues', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
# https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
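To illustrate what this kind of recursive flattening produces, here is a minimal sketch on a toy input. The `flatten_demo` helper and the sample dict are made up for illustration and are not the function above. Every leaf value ends up as its own key, so one nested object becomes one wide, single-level dict.

```python
def flatten_demo(nested, prefix=""):
    """Hypothetical helper: recursively flatten dicts and lists into one dict."""
    out = {}
    if isinstance(nested, dict):
        for key, value in nested.items():
            out.update(flatten_demo(value, f"{prefix}{key}_"))
    elif isinstance(nested, list):
        for index, value in enumerate(nested):
            out.update(flatten_demo(value, f"{prefix}{index}_"))
    else:
        # Leaf value: drop the trailing underscore from the accumulated key.
        out[prefix[:-1]] = nested
    return out


# Made-up sample, much smaller than a real API response.
toy = {"venue": {"name": "Cafe A", "categories": [{"name": "Coffee Shop"}]}}
flat = flatten_demo(toy)
print(flat)
```

Applied to a whole API response, this approach would give a single record with one column per leaf, not one row per venue.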
I then run:
for i in venues():
    json_flat_venues = flatten_json(venues)
json_flat_venues
An error is produced stating that the 'dict' object is not callable.
I've also tried:
for i in venues():
    df_venues_good = pd.json_normalize(venues)
df_venues_good
The same error is produced.
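For reference, a minimal sketch that reproduces the message with a stand-in dict. The `venues_demo` name and its contents are made up for illustration. Putting parentheses after a dict tries to call it, which raises the `TypeError`, whereas iterating over the dict directly walks its top-level keys.

```python
# Made-up stand-in for a parsed JSON response (a plain dict).
venues_demo = {"meta": {"code": 200}, "response": {"groups": []}}

try:
    for key in venues_demo():  # parentheses attempt to *call* the dict
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Iterating without parentheses yields the top-level keys instead.
top_level_keys = [key for key in venues_demo]

print(error_message)
print(top_level_keys)
```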
I'm a bit lost on where to go, and how to best convert the JSON into a workable DF.
Thanks in advance.
-------update-----------
So I’ve tried a few things.
After I referenced the page left in the comments: https://www.geeksforgeeks.org/flattening-json-objects-in-python/, I installed json_flatten (using pip), but had issues importing flatten.
As an attempted workaround, I tried to re-create the code from the website, adapted to my project. I think I made more of a mess than I cleared up.
I re-ran the original "flatten_json" def (see above). I then assigned df_venues_good without the for loop statement (also above).
With the for loop removed, it looks like it starts to pull the first record from the JSON. However, it looks like metadata (or at least data that I'm not trying to extract).
I also noticed an issue when reviewing the JSON. In my output cell (I'm using a Jupyter notebook) it looks like all of the records are retrieved (there are about 95 in all).
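One possible reading, assuming the response has a top-level `meta` block next to `response`, with the venues nested under `response`, then `groups`, then `items` (an assumption about the layout, not something verified here): normalizing the whole dict gives a single row that starts with the metadata. A minimal sketch of pointing `pd.json_normalize` at the nested list instead, using a made-up and heavily trimmed stand-in for one response:

```python
import pandas as pd

# Hypothetical, trimmed stand-in for one response; real responses carry many more fields.
sample = {
    "meta": {"code": 200},
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Cafe A",
                               "location": {"lat": 40.1, "lng": -75.2}}},
                    {"venue": {"name": "Park B",
                               "location": {"lat": 40.2, "lng": -75.3}}},
                ]
            }
        ]
    },
}

# Drill down to the list of items; each item becomes one row.
items = sample["response"]["groups"][0]["items"]
df_sample = pd.json_normalize(items)

# Nested dict keys are joined with "." by default; keep and rename a few columns.
df_sample = df_sample[["venue.name", "venue.location.lat", "venue.location.lng"]].rename(
    columns={
        "venue.name": "name",
        "venue.location.lat": "lat",
        "venue.location.lng": "lng",
    }
)
print(df_sample)
```

The column names chosen here are arbitrary. The same pattern would apply to whichever venue fields are of interest.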
I then ran this to just dump the file to inspect:
JsonString = json.dumps(venues)
JsonFile = open("venuedata.json", "w")
JsonFile.write(JsonString)
JsonFile.close()
When I open the dump file (which I linked above), it doesn't look complete.
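One possible explanation, offered as an assumption and not a confirmed diagnosis: `venues` is reassigned on every pass through the request loop, so a dump made after the loop would hold only the last response. A minimal sketch of collecting every response in a list and writing them out together follows. The town names, the fake response dicts, and the `venuedata_demo.json` file name are placeholders.

```python
import json
import os
import tempfile

all_responses = []
for town in ["Town A", "Town B"]:  # stand-in for the iterrows() loop
    # Stand-in for requests.get(url).json(); no network call is made here.
    fake_response = {"town": town, "response": {"groups": []}}
    all_responses.append(fake_response)

# Write everything in one go; the with-block closes the file automatically.
dump_path = os.path.join(tempfile.gettempdir(), "venuedata_demo.json")
with open(dump_path, "w") as json_file:
    json.dump(all_responses, json_file, indent=2)

# Read the file back to confirm that both responses were saved.
with open(dump_path) as json_file:
    reloaded = json.load(json_file)
print(len(reloaded))
```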
Any direction would be appreciated.