3

I am a bit confused about how to extract information from a nested list of ordered dictionaries in Python. For example:

list_of_interest = [OrderedDict([('name', 'Viscozyme'), ('company', 'Roche (Chile)')]),
 [OrderedDict([('name', 'Davictrel'), ('company', None)]),
  OrderedDict([('name', 'Enbrel Sureclick'), ('company', None)]),
  OrderedDict([('name', 'Tunex'), ('company', None)])],
 OrderedDict([('name', 'Angiox'), ('company', None)]),
 [OrderedDict([('name', 'Enantone'), ('company', None)]),
  OrderedDict([('name', 'Leuplin'), ('company', 'Takeda')]),
  OrderedDict([('name', 'LeuProMaxx'), ('company', 'Baxter/Teva')]),
  OrderedDict([('name', 'Leupromer'), ('company', None)]),
  OrderedDict([('name', 'Lutrate'), ('company', None)]),
  OrderedDict([('name', 'Memryte'), ('company', 'Curaxis')]),
  OrderedDict([('name', 'Prostap 3'), ('company', 'Takeda UK')]),
  OrderedDict([('name', 'Prostap SR'), ('company', 'Takeda UK')]),
  OrderedDict([('name', 'Viadur'), ('company', 'Bayer AG')])],
 OrderedDict([('name', 'Geref'), ('company', 'Serono Pharma')])]

I need to extract all items under 'name'.

So I need a function:

get_names(list_of_interest) --> ['Viscozyme', 'Davictrel', 'Enbrel Sureclick', 'Tunex', 'Angiox', 'Enantone', ..., 'Geref']

I honestly tried nested list comprehensions, generator expressions and even a pandas data frame, but they all fail, because some items are single dictionaries rather than sublists.

Arnold Klein

6 Answers

7
from collections import OrderedDict

list_of_interest =\
    [OrderedDict([('name', 'Viscozyme'), ('company', 'Roche (Chile)')]),
    [OrderedDict([('name', 'Davictrel'), ('company', None)]),
     OrderedDict([('name', 'Enbrel Sureclick'), ('company', None)]),
     OrderedDict([('name', 'Tunex'), ('company', None)])],
     OrderedDict([('name', 'Angiox'), ('company', None)]),
    [OrderedDict([('name', 'Enantone'), ('company', None)]),
     OrderedDict([('name', 'Leuplin'), ('company', 'Takeda')]),
     OrderedDict([('name', 'LeuProMaxx'), ('company', 'Baxter/Teva')]),
     OrderedDict([('name', 'Leupromer'), ('company', None)]),
     OrderedDict([('name', 'Lutrate'), ('company', None)]),
     OrderedDict([('name', 'Memryte'), ('company', 'Curaxis')]),
     OrderedDict([('name', 'Prostap 3'), ('company', 'Takeda UK')]),
     OrderedDict([('name', 'Prostap SR'), ('company', 'Takeda UK')]),
     OrderedDict([('name', 'Viadur'), ('company', 'Bayer AG')])],
     OrderedDict([('name', 'Geref'), ('company', 'Serono Pharma')])]

names = []
for item in list_of_interest:
    if isinstance(item, OrderedDict):
        names.append(item['name'])
    else:
        for list_ord_dict in item:
            names.append(list_ord_dict['name'])

print(names)
#['Viscozyme', 'Davictrel', 'Enbrel Sureclick', 'Tunex', 'Angiox', 'Enantone', 'Leuplin', 'LeuProMaxx', 'Leupromer', 'Lutrate', 'Memryte', 'Prostap 3', 'Prostap SR', 'Viadur', 'Geref']

Your list contains two types of item: single OrderedDict objects and lists of them. You can see this by iterating over the main list and printing the type of each item. If the nesting goes deeper, you can use a recursive function that calls itself whenever it encounters a list. For the dataset you provided, the code above works fine.
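
One way to carry out the type check described above is to count the type names in the outer list. This is only a minimal sketch: the shortened `sample` list and the helper name `count_item_types` are assumptions made for illustration, not part of the original data or answer.

```python
from collections import Counter, OrderedDict

# Shortened stand-in for list_of_interest (illustrative data only).
sample = [
    OrderedDict([('name', 'Viscozyme'), ('company', 'Roche (Chile)')]),
    [OrderedDict([('name', 'Davictrel'), ('company', None)]),
     OrderedDict([('name', 'Tunex'), ('company', None)])],
    OrderedDict([('name', 'Angiox'), ('company', None)]),
]


def count_item_types(items):
    """Count how often each type occurs among the top-level items."""
    return Counter(type(item).__name__ for item in items)


type_counts = count_item_types(sample)
print(type_counts)
```

If the counts show anything other than OrderedDict and list, the simple two-branch loop would need another case.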

IMCoins
3

Try this one:

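def flat(l):
    # Recurse into nested lists only; recursing into an OrderedDict
    # would iterate over its keys instead of keeping the dictionary.
    ret = list()
    for ll in l:
        if isinstance(ll, list):
            ret.extend(flat(ll))
        else:
            ret.append(ll)
    return ret

names = [d['name'] for d in flat(list_of_interest)]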

It should work with lists nested to any depth.
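
To check the "any depth" claim, here is a small sketch with dictionaries nested several levels deep. It assumes that only lists are recursed into, since recursing into a dictionary would iterate over its keys rather than keep the dictionary. The names `flatten_lists` and `deep` are hypothetical and the data is made up.

```python
def flatten_lists(items):
    """Return a flat list of all non-list elements, recursing into nested lists."""
    flat_items = []
    for item in items:
        if isinstance(item, list):
            flat_items.extend(flatten_lists(item))
        else:
            flat_items.append(item)
    return flat_items


# Made-up data with lists nested inside lists.
deep = [{'name': 'A'}, [{'name': 'B'}, [{'name': 'C'}, [{'name': 'D'}]]]]
deep_names = [d['name'] for d in flatten_lists(deep)]
print(deep_names)  # ['A', 'B', 'C', 'D']
```

In practice "any depth" is bounded by Python's recursion limit (1000 frames by default), which real data of this kind is unlikely to reach.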

MoaMoaK
3

You can flatten your nested lists with a custom recursive function:

def flatten(l):
    for el in l:
        if isinstance(el, list):
            yield from flatten(el)
        else:
            yield el

Then simply use a list comprehension to collect the name from each OrderedDict:

print([d["name"] for d in flatten(list_of_interest)])
# ['Viscozyme', 'Davictrel', 'Enbrel Sureclick', 'Tunex', 'Angiox', 'Enantone', 'Leuplin', 'LeuProMaxx', 'Leupromer', 'Lutrate', 'Memryte', 'Prostap 3', 'Prostap SR', 'Viadur', 'Geref']

Note: For plain iteration like this, yield from flatten(el) is equivalent to for x in flatten(el): yield x. It is a terser syntax available since Python 3.3.
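
The equivalence mentioned in the note can be checked side by side. This is a minimal sketch; `flatten_explicit` is a hypothetical name for the version with the inner loop written out, and `nested` is made-up test data.

```python
def flatten(l):
    """Flatten nested lists using `yield from`."""
    for el in l:
        if isinstance(el, list):
            yield from flatten(el)
        else:
            yield el


def flatten_explicit(l):
    """Same logic, with the delegation written as an explicit loop."""
    for el in l:
        if isinstance(el, list):
            for x in flatten_explicit(el):
                yield x
        else:
            yield el


nested = [1, [2, [3, 4]], 5]
result_yield_from = list(flatten(nested))
result_explicit = list(flatten_explicit(nested))
print(result_yield_from == result_explicit)  # True
```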

RoadRunner
2

You'll have to loop over the list, then recurse into each nested list:

def get_names(list_of_interest):
    names = []
    for d in list_of_interest:
        if isinstance(d, list):
            names.extend(get_names(d))
        else:
            names.append(d['name'])
    return names 
OneCricketeer
2

Adapting my answer from stackoverflow.com/a/9808122/1281485 to the slightly different task here:

def find(key, value):
  if isinstance(value, dict):
    for k, v in value.items():  # .items() works in Python 2 and 3; .iteritems() is Python 2 only
      if k == key:
        yield v
      else:
        for result in find(key, v):
          yield result
  elif isinstance(value, list):
    for element in value:
      for result in find(key, element):
        yield result

And then:

print(list(find('name', list_of_interest)))
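
Because `find` also descends into dictionary values, it picks up names stored inside nested dictionaries as well as nested lists. The sketch below assumes Python 3 (hence `.items()` and `yield from`), and the `nested` data is invented for illustration.

```python
def find(key, value):
    """Yield every value stored under `key`, searching nested dicts and lists."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                yield v
            else:
                yield from find(key, v)
    elif isinstance(value, list):
        for element in value:
            yield from find(key, element)


# Made-up data: a plain dict, a dict inside a list, and a dict inside a dict.
nested = [
    {'name': 'Leuplin', 'company': 'Takeda'},
    [{'name': 'Viadur', 'company': None}],
    {'details': {'name': 'Geref'}},
]
found = list(find('name', nested))
print(found)  # ['Leuplin', 'Viadur', 'Geref']
```

Values that are neither dictionaries nor lists (strings, None) match neither branch, so the generator simply yields nothing for them.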
Alfe
2

Another recursive option:

def flat(lst, res=None):
  # Create a fresh accumulator on the first (outermost) call.
  if res is None: res = []
  for item in lst:
    if not isinstance(item, list): res.append(item['name'])
    else: flat(item, res)
  return res

print(flat(list_of_interest))
#=> ['Viscozyme', 'Davictrel', 'Enbrel Sureclick', 'Tunex', 'Angiox', 'Enantone', 'Leuplin', 'LeuProMaxx', 'Leupromer', 'Lutrate', 'Memryte', 'Prostap 3', 'Prostap SR', 'Viadur', 'Geref']
iGian