I'm banging my head against the wall. I'm working with the GeoDeepDive API, and trying to "tidy" the data. My code:
import requests
import pandas as pd
response = requests.get("https://geodeepdive.org/api/articles?pubname_like=Geochronology")
data = response.json()
df = pd.json_normalize(data)
df = df['success.data']
dic = df[0]
df1 = pd.DataFrame.from_dict(dic)
df1.to_csv("output_file.csv")
The result is almost perfect, but nested json's are causing an issue where I get data like this:
CSV output with top two rows edited manually
CSV top 2 rows are the output I want to have, and the bottom two are what I get. I need to somehow "dig" into the dictionaries.
I'm trying to iterate through those specific dataframe columns and either pick just the url, or turn the names, into a list of names. I keep getting errors with whatever I try though, and I'm thinking there may even be a better way to do this. Any thoughts?