
I'm scraping a website, which returns a dictionary:

person = {'name0':{'first0': 'John', 'last0':'Smith'},
          'age0':'10',
          'location0':{'city0':'Dublin'}
         }

I'm trying to write a function that will return a dictionary {'name':'John', 'age':'10'} when passed the above dictionary.

Ideally, I want to wrap each lookup in try: ... except KeyError, since some keys will occasionally be missing.

def func(person):
    filters = [('age', 'age0'), ('name', ['name0', 'first0'])]
    result = {'name': None, 'age': None}
    for i in filters:
        try:
            result[i[0]] = person[i[1]]
        except KeyError:
            pass
    return result

The problem is that result[i[0]] = person[i[1]] doesn't work for 'name', since there are two keys that need to be followed sequentially and I don't know how to do that.

I want some way of telling it (in the loop) to go to person['name0']['first0'] (and so on to whatever depth the thing I want is).

I have lots of things to extract, so I'd rather do it in a loop instead of a try..except statement for each variable individually.

  • Does [Flatten an irregular (arbitrarily nested) list of lists](https://stackoverflow.com/questions/2158395/flatten-an-irregular-arbitrarily-nested-list-of-lists) answer your question? – wwii Jan 22 '23 at 04:14
  • You can use [dict.get()](https://docs.python.org/3/library/stdtypes.html#dict.get) to handle missing keys instead of using try/except. – wwii Jan 22 '23 at 04:15
  • Some other ideas: [Nested dictionary value from key path](https://stackoverflow.com/questions/31033549/nested-dictionary-value-from-key-path). [Python: Easily access deeply nested dict (get and set)](https://stackoverflow.com/questions/3797957/python-easily-access-deeply-nested-dict-get-and-set). – wwii Jan 22 '23 at 04:28
  • Can you supply the website you are trying to scrape? – Lidor Eliyahu Shelef Jan 22 '23 at 07:57

3 Answers


To follow several keys sequentially, you can chain get calls, setting the default value to {} (an empty dictionary) for the upper levels. Set the default value to None (or whatever suits you) for the last level:

def func(person):
    return {'name': person.get('name0', {}).get('first0', None),
            'age': person.get('age0', None)}
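The chaining above can be generalized to a key path of any length. A minimal sketch, assuming a hypothetical helper named get_path (it is not part of the standard library or of the original answer); it uses functools.reduce and stops descending as soon as an intermediate value is not a dictionary:

```python
from functools import reduce

_MISSING = object()  # sentinel that distinguishes "absent" from a stored None


def get_path(data, path, default=None):
    """Follow the keys in `path` through nested dicts (hypothetical helper)."""
    def step(current, key):
        # Only descend while we still hold a dictionary; otherwise give up.
        if isinstance(current, dict):
            return current.get(key, _MISSING)
        return _MISSING

    found = reduce(step, path, data)
    return default if found is _MISSING else found


person = {'name0': {'first0': 'John', 'last0': 'Smith'},
          'age0': '10',
          'location0': {'city0': 'Dublin'}}

print(get_path(person, ['name0', 'first0']))    # John
print(get_path(person, ['gender0'], 'N/A'))     # N/A
```

Unlike a plain chain of get calls, this sketch also returns the default when an upper-level key exists but holds something other than a dictionary (for example None).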
Tranbi

The best I could manage was to use a for loop to iterate through the keys:

person = {'name0':{'first0': 'John', 'last0':'Smith'},
          'age0':'10',
          'location0':{'city0':'Dublin'}
         }

Additionally, I used .get(key) rather than try..except, as suggested by @wwii.


def func(person):
    filters = [('age', ['age0']), ('name', ['name0', 'first0'])]
    result = {'name': None, 'age': None}
    for out_key, path in filters:
        temp = person
        for key in path:
            if not isinstance(temp, dict):  # can't descend into a non-dict value
                temp = None
                break
            temp = temp.get(key)  # None if the key is missing
        result[out_key] = temp
    return result

func(person) then returns {'name': 'John', 'age': '10'}.

It handles missing input too:

person2 = {'age0':'10',
          'location0':{'city0':'Dublin'}}

func(person2) returns {'name': None, 'age': '10'}


You can put the try...except in another loop, if there's a list of keys instead of a single key:

def getNestedVal(obj, kPath:list, defaultVal=None):
    if isinstance(kPath, str) or not hasattr(kPath, '__iter__'): 
        kPath = [kPath] ## if not iterable, wrap as list
    for k in kPath: 
        try:
            obj = obj[k]
        except (KeyError, IndexError, TypeError):  # missing key/index, or obj can't be indexed
            return defaultVal
    return obj 

def func(person):
    filters = [('age', 'age0'), ('name', ['name0', 'first0']),
               ('gender', ['gender0'], 'N/A')] # includes default value
    return {k[0]: getNestedVal(person, *k[1:3]) for k in filters}

[I added gender just to demonstrate how defaults can also be specified for missing values.]

With this, func(person) should return

{'age': '10', 'name': 'John', 'gender': 'N/A'}
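Because this approach indexes with obj[k] rather than calling .get, it should also work when the path passes through lists. A self-contained sketch, using a hypothetical helper named nested_val and made-up sample data (the 'phones0' key is an assumption, not something from the question):

```python
def nested_val(obj, path, default=None):
    """Index into nested dicts/lists along `path` (hypothetical helper)."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # Missing dict key, list index out of range, or a value
            # that cannot be indexed with this key.
            return default
    return obj


# Made-up sample data: 'phones0' and 'number0' are assumed keys.
record = {'name0': {'first0': 'John'},
          'phones0': [{'number0': '555-0100'}]}

print(nested_val(record, ['phones0', 0, 'number0']))          # 555-0100
print(nested_val(record, ['phones0', 1, 'number0'], 'N/A'))   # N/A
```

Integer path elements select list items, while string elements select dictionary keys, so one path can mix both.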

I also have a flattenObj function, a version of which is defined below:

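def flattenDict(orig:dict, kList=None, kSep='_', stripNum=True):
    kList = kList or []  # avoid a mutable default argument
    if not isinstance(orig, dict): return [(kList, orig)]

    tList = []
    for k, v in orig.items():
        if isinstance(k, str) and stripNum: k = k.strip('0123456789')
        # kSep=None makes the recursive call return (key path, value) pairs;
        # stripNum is passed along so nested keys follow the same setting
        tList += flattenDict(v, kList+[str(k)], None, stripNum)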
        
    if not isinstance(kSep, str): return tList
    return {kSep.join(kl): v for kl,v in tList}

[I added stripNum just to get rid of the 0s in your keys...]

flattenDict(person) should return

{'name_first': 'John', 'name_last': 'Smith', 'age': '10', 'location_city': 'Dublin'}
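Once the dictionary is flat, picking out fields reduces to plain .get lookups on the joined keys. A rough sketch, assuming a simplified generator-based flattener (a hypothetical flatten helper that does not strip digits) and an illustrative 'gender' field that is absent from the data:

```python
def flatten(d, parent=()):
    """Yield (key_path_tuple, value) pairs for a nested dict (hypothetical helper)."""
    for key, value in d.items():
        path = parent + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)  # descend into nested dicts
        else:
            yield path, value


person = {'name0': {'first0': 'John', 'last0': 'Smith'},
          'age0': '10',
          'location0': {'city0': 'Dublin'}}

# Join each key path with '_' to build a flat lookup table.
flat = {'_'.join(path): value for path, value in flatten(person)}

# Map output names to flattened keys; 'gender0' is deliberately missing.
wanted = {'name': 'name0_first0', 'age': 'age0', 'gender': 'gender0'}
result = {out: flat.get(src) for out, src in wanted.items()}

print(result)  # {'name': 'John', 'age': '10', 'gender': None}
```

The trade-off is that the whole structure is flattened up front, which is convenient when many fields are extracted but wasteful if only one or two are needed.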
Driftr95