0

I'm banging my head against the wall. I'm working with the GeoDeepDive API, and trying to "tidy" the data. My code:

import requests
import pandas as pd

response = requests.get("https://geodeepdive.org/api/articles?pubname_like=Geochronology")

data = response.json()
df = pd.json_normalize(data)
df = df['success.data']
dic = df[0]

df1 = pd.DataFrame.from_dict(dic)
df1.to_csv("output_file.csv")

The result is almost perfect, but nested json's are causing an issue where I get data like this:

CSV output with top two rows edited manually

CSV top 2 rows are the output I want to have, and the bottom two are what I get. I need to somehow "dig" into the dictionaries.

I'm trying to iterate through those specific dataframe columns and either pick just the url, or turn the names, into a list of names. I keep getting errors with whatever I try though, and I'm thinking there may even be a better way to do this. Any thoughts?

  • Could you please provide an MRE ([How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), [minimal reproducible-(pandas) examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)) and be a bit more specific regarding your expected output. – Timus Dec 14 '20 at 09:37
  • 1
    Absolutely! I'll edit the question now. – Drew Heasman Dec 14 '20 at 19:06

1 Answers1

0

Could you try if this

...
df1 = pd.DataFrame.from_dict(dic)
df1.link = df1.link.apply(lambda l: l[0]['url'])
df1.author = df1.author.apply(lambda l: ';'.join(d['name'] for d in l))
df1.to_csv("output_file.csv")

fits your needs?

Timus
  • 10,974
  • 5
  • 14
  • 28