I am using a dictionary API (for an actual English dictionary, not a Python dict) to go through a list of words and build a dataframe of their definitions and word origins. My ultimate goal is to see how many words in the list are derived from Latin, Greek, Old English, etc.
The problem is that some words have multiple definitions, and those definitions are nested within the data. For example, 'tart' has three definitions, so if you query the API for 'tart' you get a list of three dictionaries, with each definition nested inside one of them. The way my code is written right now, it only keeps the last definition in the list.
I want the dataframe to have either: a) one row for each word, with columns for "definition 1", "definition 2", "word origin 1", "word origin 2", etc., or b) one row for each definition, so "tart" would take up three rows.
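To make the two layouts concrete, here is a minimal sketch of what I have in mind. It does not call the API: `sample` is a hand-written, hypothetical stand-in for the response, and the key names `meanings`, `definitions`, `definition` and `partOfSpeech` are my assumption about how the definitions are nested. The helper names `definition_rows` and `wide_row` are made up for illustration.

```python
# Hypothetical stand-in for the parsed JSON of one word: a list of entry
# dicts. 'origin' is optional, and the nesting under 'meanings' is assumed.
sample = [
    {
        "word": "tart",
        "origin": "placeholder origin text",
        "meanings": [
            {"partOfSpeech": "noun",
             "definitions": [{"definition": "placeholder definition A"}]},
            {"partOfSpeech": "adjective",
             "definitions": [{"definition": "placeholder definition B"}]},
        ],
    },
    {
        "word": "tart",  # this entry has no 'origin' key
        "meanings": [
            {"partOfSpeech": "verb",
             "definitions": [{"definition": "placeholder definition C"}]},
        ],
    },
]


def definition_rows(word, entries):
    """Layout b): yield one flat dict per definition."""
    for entry in entries:
        origin = entry.get("origin", "No origin found")
        for meaning in entry.get("meanings", []):
            for item in meaning.get("definitions", []):
                yield {
                    "word": word,
                    "part_of_speech": meaning.get("partOfSpeech"),
                    "definition": item.get("definition"),
                    "origin": origin,
                }


def wide_row(word, entries):
    """Layout a): one dict per word with numbered columns."""
    row = {"word": word}
    for i, r in enumerate(definition_rows(word, entries), start=1):
        row[f"definition {i}"] = r["definition"]
        row[f"word origin {i}"] = r["origin"]
    return row


long_rows = list(definition_rows("tart", sample))  # three dicts for 'tart'
wide = wide_row("tart", sample)                    # one dict for 'tart'
```

Either result could then go through `pd.DataFrame(long_rows)` or `pd.DataFrame([wide])`. The nested loops are still there, but they are confined to one small generator rather than spread through the request loop.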
But I don't see how to do that without writing very confusing, complicated code with for loops nested several levels deep.
Here is my code:
import pandas as pd
import requests

wordlist = ['illicitly', 'tray', 'tali', 'tart', 'itty']
rows = []
for word in wordlist:
    row = {}
    row['word'] = word
    print(word)
    url = "https://api.dictionaryapi.dev/api/v2/entries/en/" + word
    # time.sleep(5)  # would also need `import time`
    response = requests.get(url)
    if response.status_code != 200:
        row['response'] = "Not found"
        rows.append(row)
        continue
    else:
        row['response'] = "Found"
        data = response.json()
        row['number_of_definitions'] = len(data)
        # Each pass overwrites row['origin'], so only the last entry survives
        for d in data:
            if 'origin' in d:
                row['origin'] = d['origin']
            else:
                row['origin'] = "No origin found"
    rows.append(row)
df = pd.DataFrame(rows)
UPDATED TO ADD: Although there are other Stack Overflow questions about nested JSON data, this situation is different because not every dictionary has the same keys. For example, some entries include a word origin, and others do not.
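To illustrate the inconsistent keys, here is a small sketch with made-up entries. The helper `origin_or_default` is hypothetical, and the third case (key present but empty) is an assumption on my part, not something I have confirmed the API returns.

```python
# Made-up entries showing the ways 'origin' can be absent or unusable.
entries = [
    {"word": "tart", "origin": "placeholder origin text"},
    {"word": "tart"},                # no 'origin' key at all
    {"word": "tart", "origin": ""},  # assumed case: key present but empty
]


def origin_or_default(entry, default="No origin found"):
    # dict.get returns None for a missing key instead of raising KeyError;
    # `or default` also replaces an empty string with the default.
    return entry.get("origin") or default


origins = [origin_or_default(e) for e in entries]
```

This replaces the if/else on `'origin' in d` with a single expression, so any approach to the nesting has to tolerate entries like the second one.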