
I am using a dictionary API (an actual English dictionary, not a Python dict) to go through a list of words and build a dataframe of their definitions and word origins. My ultimate goal is to see how many words in the list are derived from Latin, Greek, Old English, etc.

The problem is that some words have multiple definitions, and those definitions are nested within the data. For example, 'tart' has three definitions, so if you query the API for 'tart' you get a list of three dictionaries, with the definitions nested inside them. The way my code is written right now, each dictionary overwrites the previous one, so only the values from the last dictionary in the list are kept.

I want the dataframe to have either: a) one row for each word, then columns for "definition 1", "definition 2", "word origin 1", "word origin 2", etc., or b) one row for each definition, so "tart" would be three rows
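To make the two layouts concrete, here is a minimal sketch of how layout (b) can be turned into layout (a) with pandas. The data is a hypothetical placeholder (the definition strings and the helper column `n` are made up for illustration, not taken from the API):

```python
import pandas as pd

# Hypothetical long-format data: layout (b), one row per definition.
long_df = pd.DataFrame(
    {
        "word": ["tart", "tart", "tart", "tray"],
        "definition": ["def A", "def B", "def C", "def D"],
    }
)

# Number the definitions within each word: 1, 2, 3, ...
long_df["n"] = long_df.groupby("word").cumcount() + 1

# Layout (a): one row per word, one column per numbered definition.
wide_df = long_df.pivot(index="word", columns="n", values="definition")
wide_df.columns = [f"definition {n}" for n in wide_df.columns]
wide_df = wide_df.reset_index()
```

Words with fewer definitions than the maximum get NaN in the unused columns, so layout (b) is usually the easier one to build first.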

But I don't see how to do that without writing confusing, complicated code with for loops nested inside for loops.

Here is my code:

import requests
import pandas as pd

wordlist = ['illicitly', 'tray', 'tali', 'tart', 'itty']

rows = []

for word in wordlist:
    row = {}
    row['word'] = word
    print(word)
    url = "https://api.dictionaryapi.dev/api/v2/entries/en/"+word
#    time.sleep(5)
    response = requests.get(url)
    if response.status_code != 200: 
        row['response'] = "Not found"
        rows.append(row)
        continue
    else:
        row['response'] = "Found"
    data = response.json()
    row['number_of_definitions'] = len(data)
    for d in data:
        if 'origin' in d.keys():
            row['origin'] = d['origin']
        else:
            row['origin'] = "No origin found"
    rows.append(row)
    
df = pd.DataFrame(rows)

UPDATED TO ADD: Although there are other Stack Overflow questions about nested JSON data, this situation is different because not every dictionary has the same keys. For example, some entries include a word origin, and others do not.

user3710004
  • Does this answer your question? [Construct pandas DataFrame from items in nested dictionary](https://stackoverflow.com/questions/13575090/construct-pandas-dataframe-from-items-in-nested-dictionary) – Karl Oct 15 '21 at 17:03

1 Answer


Try using json_normalize from pandas:

import requests
import pandas as pd

wordlist = ['illicitly', 'tray', 'tali', 'tart', 'itty']

rows = []
list_data = []
for word in wordlist:
    row = {}
    row['word'] = word
    print(word)
    url = "https://api.dictionaryapi.dev/api/v2/entries/en/"+word
#    time.sleep(5)
    response = requests.get(url)
    if response.status_code != 200: 
        row['response'] = "Not found"
        rows.append(row)
        continue
    else:
        row['response'] = "Found"
    data = response.json()
    list_data.extend(data)
    
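# json_normalize can raise a KeyError when a meta key is missing from an
# entry, so give every entry an 'origin' value before normalizing.
for d in list_data:
    d.setdefault('origin', 'NA')

# One row per definition; word, phonetic, origin and part of speech are
# repeated on each row as metadata.
df = pd.json_normalize(list_data, record_path=['meanings', 'definitions'],
                       meta=['word', 'phonetic', 'origin',
                             ['meanings', 'partOfSpeech']],
                       errors="ignore")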
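
To try the same approach without calling the API, here is a small offline sketch. The `sample` list is a hypothetical stand-in shaped like the response described in the question (the definitions and origin text are illustrative, not real API output), and `df_sample` is just a name chosen for this example:

```python
import pandas as pd

# Hypothetical sample shaped like the API response (not real API output).
sample = [
    {
        "word": "tart",
        "origin": "Old English teart",
        "meanings": [
            {
                "partOfSpeech": "adjective",
                "definitions": [{"definition": "sharp or acid in taste"}],
            }
        ],
    },
    {
        "word": "tart",
        # This entry deliberately has no "origin" key.
        "meanings": [
            {
                "partOfSpeech": "noun",
                "definitions": [
                    {"definition": "an open pastry case with a filling"},
                    {"definition": "a small pie"},
                ],
            }
        ],
    },
]

# Fill in the missing key so every entry has an "origin" value.
for entry in sample:
    entry.setdefault("origin", "NA")

# One row per definition, with word, origin and part of speech as metadata.
df_sample = pd.json_normalize(
    sample,
    record_path=["meanings", "definitions"],
    meta=["word", "origin", ["meanings", "partOfSpeech"]],
)
```

The nested meta path shows up as a single column named `meanings.partOfSpeech`, and the entry without an origin gets "NA" on each of its rows.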
Python on Toast
  • This is very helpful. The only issue is, it still does not get me the word origins. Not every entry has a word origin, and if I add 'origin' to the meta, it gives me an error. I'm fine with it just saying "NA" if there is no origin. – user3710004 Oct 24 '21 at 18:26
  • A KeyError is raised when a key does not exist in the target structure. To fix this, make sure the key exists in every dict in the list. I have updated the code above accordingly. – Python on Toast Oct 25 '21 at 19:46
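
As a rough illustration of the KeyError discussed in the comments, here is a sketch using plain dicts. The `entries` list is hypothetical, and `dict.get` is shown as one alternative to adding the key to each dict:

```python
# Hypothetical entries: the second one has no "origin" key.
entries = [{"word": "tart", "origin": "Old English"}, {"word": "itty"}]

# Direct indexing raises KeyError for the entry that lacks "origin".
missing_key = None
try:
    origins = [e["origin"] for e in entries]
except KeyError as exc:
    missing_key = exc.args[0]

# dict.get returns a default instead of raising, without modifying the dicts.
origins = [e.get("origin", "NA") for e in entries]
```

Whether to fill the key in place or read it with a default is a design choice; filling it in place is what lets `json_normalize` treat 'origin' as a meta column for every entry.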