I have nested lists and dicts returned from an API. The last three keys in each dict of a "lexicalEntries" list (for example ["lexicalEntries"][0]) are a legend for all the definitions in the associated "entries".

dd = {
    "metadata": {...},
    "results": [
        {
            "lexicalEntries": [
                {
                    "entries": [
                        ...definitions
                    ],
                    "language": "en-us",
                    "lexicalCategory": {"id": "noun", "text": "Noun"},
                    "text": "school",
                },
                {
                    "entries": [
                        ...more definitions
                    ],
                    "language": "en-us",
                    "lexicalCategory": {"id": "verb", "text": "Verb"},
                    "text": "school",
                },
            ],
        },
        {
            "lexicalEntries": [
                {
                    "entries": [
                        ...more definitions
                    ],
                    "language": "en-us",
                    "lexicalCategory": {"id": "noun", "text": "Noun"},
                    "text": "school",
                },
                {
                    "entries": [
                        ...more definitions
                    ],
                    "language": "en-us",
                    "lexicalCategory": {"id": "verb", "text": "Verb"},
                    "text": "school",
                },
            ],
        },
    ],
}

The code that extracts the definitions:

def gen_dict_extract(key, var):
    if hasattr(var, "items"):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result


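# Run the generator once and reuse the list for both the count and the loop.
gendList = list(gen_dict_extract("definitions", dd))
count = len(gendList)

print(f"\nResults: {count}\n")

# enumerate() supplies the running number, starting at 1.
for x, i in enumerate(gendList, start=1):
    print(f"{x}. {i[0].capitalize()}.\n")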

It prints this:

Results: 13

1. An institution for educating children.

2. The buildings used by an institution for educating children.

3 - 11

12. A large group of fish or sea mammals.

13. (of fish or sea mammals) form a large group.

I want it to print this example output:

Results: 13

Noun

1. An institution for educating children.

2 - 9

Verb

10. Send to school; educate.

11 - 13

Here's the full API result:

{'id': 'school', 'metadata': {'operation': 'retrieve', 'provider': 'Oxford University Press', 'schema': 'RetrieveEntry'}, 'results': [{'id': 'school', 'language': 'en-us', 'lexicalEntries': [{'entries': [{'homographNumber': '100', 'senses': [{'definitions': ['an institution for educating children'], 'id': 'm_en_gbus0907270.006', 'subsenses': [{'definitions': ['the buildings used by an institution for educating children'], 'id': 'm_en_gbus0907270.009'}, {'definitions': ['the students and staff of a school'], 'id': 'm_en_gbus0907270.010'}, {'definitions': 
["a day's work at school"], 'id': 'm_en_gbus0907270.012'}]}, {'definitions': ['any institution at which instruction is given in a particular discipline'], 'id': 'm_en_gbus0907270.016', 'subsenses': [{'definitions': ['a university'], 'id': 'm_en_gbus0907270.017'}, {'definitions': ['a department or faculty of a college concerned with a particular subject of study'], 'id': 'm_en_gbus0907270.018'}]}, 
{'definitions': ['a group of people, particularly writers, artists, or philosophers, sharing the same or similar ideas, methods, or style'], 'id': 'm_en_gbus0907270.020', 'subsenses': [{'definitions': ['a style, approach, or method of a specified character'], 'id': 'm_en_gbus0907270.021'}]}]}], 'language': 'en-us', 'lexicalCategory': {'id': 'noun', 'text': 'Noun'}, 'text': 'school'}, {'entries': [{'homographNumber': '101', 'senses': [{'definitions': ['send to school; educate'], 'id': 'm_en_gbus0907270.041', 'subsenses': [{'definitions': ['train or discipline (someone) in a particular skill or activity'], 'id': 'm_en_gbus0907270.047'}]}]}], 'language': 'en-us', 'lexicalCategory': {'id': 'verb', 'text': 'Verb'}, 
'text': 'school'}], 'type': 'headword', 'word': 'school'}, {'id': 'school', 'language': 'en-us', 'lexicalEntries': [{'entries': [{'homographNumber': '200', 'senses': [{'definitions': ['a large group of fish or sea mammals'], 'id': 'm_en_gbus0907280.005'}]}], 'language': 'en-us', 'lexicalCategory': {'id': 'noun', 'text': 'Noun'}, 'text': 'school'}, {'entries': [{'homographNumber': '201', 'senses': 
[{'definitions': ['(of fish or sea mammals) form a large group'], 'id': 'm_en_gbus0907280.009'}]}], 'language': 'en-us', 'lexicalCategory': {'id': 'verb', 'text': 'Verb'}, 'text': 'school'}], 'type': 'headword', 'word': 'school'}], 'word': 
'school'}

The association that I want to maintain is the part of speech (e.g. noun or verb). And I want to print it as shown above in the example output.

I don't know the best approach to this problem. Please help a newbie, even if you can only point me toward how to go about solving it.

Ref: I used the generator from this post: Find all occurrences of a key in nested dictionaries and lists

  • You have given the complete result from the API, which is great, but it's very hard to identify what you are trying to do from just a verbal description. Can you give examples in each section? Where is the list of dicts that you refer to in the API output? How is the API output, which is a lot of nested dicts and lists, structured? What output data structure are you planning that keeps the 'associations'? Maybe give an expected output example? Check how to create a minimal reproducible example so that others on SO can help you. – Akshay Sehgal Aug 15 '20 at 17:21
  • Thank you for taking a look. I can try and be more clear. Give me a moment, please. – formerlyanakin Aug 15 '20 at 20:47
  • I updated, hope it's more clear. – formerlyanakin Aug 15 '20 at 22:55

1 Answer

This code should get the main definitions grouped by category (noun/verb). I skipped the subsense definitions.

dd = {
    'id': 'school',
    'metadata': {'operation': 'retrieve', 'provider': 'Oxford University Press', 'schema': 'RetrieveEntry'},
    'results': [
        {'id': 'school', 'language': 'en-us', 'lexicalEntries': [

      ...........

    ],
    'word': 'school'
}


dd2 = {}

dd2[dd['word']] = []
for d in dd['results']:  # assume all ids are the same word
    s = {}
    for le in d['lexicalEntries']:  # one lexical entry per category (noun/verb)
        key = le['lexicalCategory']['id']  # 'noun' or 'verb'
        df = []
        for ss in le['entries'][0]['senses']:  # each top-level sense
            df.append(ss['definitions'][0])  # main definition only; subsenses are skipped
        s[key] = df
    dd2[dd['word']].append(s)

print(dd2)

Output:

{'school': [
        {
         'noun': ['an institution for educating children', 
                  'any institution at which instruction is given in a particular discipline', 
                  'a group of people, particularly writers, artists, or philosophers, sharing the same or similar ideas, methods, or style'], 
         'verb': ['send to school; educate']
        },
        {
         'noun': ['a large group of fish or sea mammals'], 
         'verb': ['(of fish or sea mammals) form a large group']
        }
    ]
}
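
If you also need the subsenses and the numbered output from the question, one possible approach is sketched below. It assumes every sense may carry an optional 'subsenses' list shaped like the one in the API result, and it prints one heading per lexical entry in document order. The helper names (iter_definitions, group_by_category, format_groups) and the trimmed-down sample dict are made up for illustration; they are not part of the API.

```python
def iter_definitions(sense):
    """Yield a sense's definitions, then those of its subsenses, in order."""
    yield from sense.get("definitions", [])
    for sub in sense.get("subsenses", []):  # assumed optional, as in the API result
        yield from iter_definitions(sub)


def group_by_category(data):
    """Return one (category_text, [definitions]) pair per lexical entry."""
    groups = []
    for result in data.get("results", []):
        for lexical_entry in result.get("lexicalEntries", []):
            category = lexical_entry["lexicalCategory"]["text"]
            definitions = [
                definition
                for entry in lexical_entry.get("entries", [])
                for sense in entry.get("senses", [])
                for definition in iter_definitions(sense)
            ]
            groups.append((category, definitions))
    return groups


def format_groups(groups):
    """Build the numbered text, with a heading before each group."""
    total = sum(len(definitions) for _, definitions in groups)
    lines = [f"Results: {total}", ""]
    number = 1  # numbering runs on across groups
    for category, definitions in groups:
        lines += [category, ""]
        for definition in definitions:
            lines += [f"{number}. {definition.capitalize()}.", ""]
            number += 1
    return "\n".join(lines)


# Hypothetical, trimmed-down sample with the same shape as the API result.
sample = {
    "results": [
        {
            "lexicalEntries": [
                {
                    "entries": [
                        {
                            "senses": [
                                {
                                    "definitions": ["an institution for educating children"],
                                    "subsenses": [
                                        {"definitions": ["the students and staff of a school"]},
                                    ],
                                },
                            ]
                        }
                    ],
                    "lexicalCategory": {"id": "noun", "text": "Noun"},
                },
                {
                    "entries": [{"senses": [{"definitions": ["send to school; educate"]}]}],
                    "lexicalCategory": {"id": "verb", "text": "Verb"},
                },
            ]
        }
    ]
}

print(format_groups(group_by_category(sample)))
```

Calling format_groups(group_by_category(dd)) on the full result should then number the definitions continuously, with a Noun or Verb heading before each lexical entry's block. If you would rather merge all nouns and all verbs across results, you could collect into a dict keyed by category instead of a list of pairs.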
Mike67
  • Thank you so much for your time. I am working with your code right now to get to the subsenses. I will let you know what I changed to also get the subsenses. – formerlyanakin Aug 15 '20 at 20:46