0

Please I need help with a problem that needs to be solved without pandas or numpy. I have two list of dictionaries i.e list1 and list2. I need to sort list2 by "post_code", and group th e sorted list2 by "code" before joining list1 and list2 by two different keys that have the same values. In list1 the key "practice" is equivalent to the key "code" in sorted list2. I need to join list1 and list2 by the equivalent keys of "practice" and "code".

list1=
[{'bnf_code': '0101010G0AAABAB',
  'items': 2,
  'practice': 'N81013',
  'bnf_name': 'Co-Magaldrox_Susp 195mg/220mg/5ml S/F',
  'nic': 5.98,
  'act_cost': 5.56,
  'quantity': 1000},
 {'bnf_code': '0101021B0AAAHAH',
  'items': 1,
  'practice': 'A81001',
  'bnf_name': 'Alginate_Raft-Forming Oral Susp S/F',
  'nic': 1.95,
  'act_cost': 1.82,
  'quantity': 500},
 {'bnf_code': '0101021B0AAALAL',
  'items': 12,
  'practice': 'A81002',
  'bnf_name': 'Sod Algin/Pot Bicarb_Susp S/F',
  'nic': 64.51,
  'act_cost': 59.95,
  'quantity': 6300},
 {'bnf_code': '0101021B0AAAPAP',
  'items': 3,
  'practice': 'A81004',
  'bnf_name': 'Sod Alginate/Pot Bicarb_Tab Chble 500mg',
  'nic': 9.21,
  'act_cost': 8.55,
  'quantity': 180},
 {'bnf_code': '0101021B0BEADAJ',
  'items': 6,
  'practice': 'A81003',
  'bnf_name': 'Gaviscon Infant_Sach 2g (Dual Pack) S/F',
  'nic': 28.92,
  'act_cost': 26.84,
  'quantity': 90}]

list2=
[{'code': 'A81001',
  'name': 'THE DENSHAM SURGERY',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'LAWSON STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 1HU'},
 {'code': 'A81002',
  'name': 'QUEENS PARK MEDICAL CENTRE',
  'addr_1': 'QUEENS PARK MEDICAL CTR',
  'addr_2': 'FARRER STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 2AW'},
 {'code': 'A81003',
  'name': 'VICTORIA MEDICAL PRACTICE',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'VICTORIA ROAD',
  'borough': 'HARTLEPOOL',
  'village': 'CLEVELAND',
  'post_code': 'TS26 8DB'},
 {'code': 'A81004',
  'name': 'WOODLANDS ROAD SURGERY',
  'addr_1': '6 WOODLANDS ROAD',
  'addr_2': None,
  'borough': 'MIDDLESBROUGH',
  'village': 'CLEVELAND',
  'post_code': 'TS1 3BE'},
 {'code': 'N81013',
  'name': 'SPRINGWOOD SURGERY',
  'addr_1': 'SPRINGWOOD SURGERY',
  'addr_2': 'RECTORY LANE',
  'borough': 'GUISBOROUGH',
  'village': None,
  'post_code': 'TS14 7DJ'}]

I have been able to sort list2 by post_code and group by code but I am lost with regards to how to join list1 and list2. Here is the code I used so far for the sorting and grouping.

import itertools
from operator import itemgetter
sorted_post_code = sorted(list2, key=itemgetter('post_code'))
for key, group in itertools.groupby(sorted_post_code, key=lambda x:x['code']):
    #print (key),
    print (list(group))

The expected out put is

joined_list=
list1=
[{'bnf_code': '0101010G0AAABAB',
  'items': 2,
  'practice': 'N81013',
  'bnf_name': 'Co-Magaldrox_Susp 195mg/220mg/5ml S/F',
  'nic': 5.98,
  'act_cost': 5.56,
  'quantity': 1000,
  'code': 'N81013',
  'name': 'SPRINGWOOD SURGERY',
  'addr_1': 'SPRINGWOOD SURGERY',
  'addr_2': 'RECTORY LANE',
  'borough': 'GUISBOROUGH',
  'village': None,
  'post_code': 'TS14 7DJ'},
 {'bnf_code': '0101021B0AAAHAH',
  'items': 1,
  'practice': 'A81001',
  'bnf_name': 'Alginate_Raft-Forming Oral Susp S/F',
  'nic': 1.95,
  'act_cost': 1.82,
  'quantity': 500,
  'code': 'A81001',
  'name': 'THE DENSHAM SURGERY',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'LAWSON STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 1HU'},
 {'bnf_code': '0101021B0AAALAL',
  'items': 12,
  'practice': 'A81002',
  'bnf_name': 'Sod Algin/Pot Bicarb_Susp S/F',
  'nic': 64.51,
  'act_cost': 59.95,
  'quantity': 6300,
  'code': 'A81002',
  'name': 'QUEENS PARK MEDICAL CENTRE',
  'addr_1': 'QUEENS PARK MEDICAL CTR',
  'addr_2': 'FARRER STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 2AW'},
 {'bnf_code': '0101021B0AAAPAP',
  'items': 3,
  'practice': 'A81004',
  'bnf_name': 'Sod Alginate/Pot Bicarb_Tab Chble 500mg',
  'nic': 9.21,
  'act_cost': 8.55,
  'quantity': 180,
  'code': 'A81004',
  'name': 'WOODLANDS ROAD SURGERY',
  'addr_1': '6 WOODLANDS ROAD',
  'addr_2': None,
  'borough': 'MIDDLESBROUGH',
  'village': 'CLEVELAND',
  'post_code': 'TS1 3BE'},
 {'bnf_code': '0101021B0BEADAJ',
  'items': 6,
  'practice': 'A81003',
  'bnf_name': 'Gaviscon Infant_Sach 2g (Dual Pack) S/F',
  'nic': 28.92,
  'act_cost': 26.84,
  'quantity': 90,
  'code': 'A81003',
  'name': 'VICTORIA MEDICAL PRACTICE',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'VICTORIA ROAD',
  'borough': 'HARTLEPOOL',
  'village': 'CLEVELAND',
  'post_code': 'TS26 8DB'}]
IOI
  • 59
  • 5

2 Answers2

1

I understood that you wanted each dictionary in list1 to contain all entries of a dictionary in list2, if the values for keys 'code' and 'practice' of the dictionaries match.

If so, you can easily update all entries of a dictionary with entries from another dictionary. Missing key:value pairs will be added and existing keys will get their value updated.

So i ended up with a double for loop, which i did before any sorting. You may want to adjust that to your needs.

for entry2 in list2:
    for entry1 in list1:
        if entry2['code'] == entry1['practice']:
            entry1.update(entry2)

A very long explanation on different ways to join dictionaries can be found here: https://stackoverflow.com/a/26853961/6218902

rhall
  • 75
  • 6
1

A defaultdict might do reasonably well for the grouping operation. You can use a dict to update your grouped elements:

from collections import defaultdict

groups = defaultdict(dict)

# to show this explicitly you can start with two loops
# not the most efficient, but it shows the process
for item in list1:
    k = item['practice']
    groups[k].update(item)

for item in list2:
    k = item['code']
    groups[k].update(item)

# where groups.values() will have your "joined" 
# dictionaries
groups
{
  "N81013": {
    "bnf_code": "0101010G0AAABAB",
    "items": 2,
    "practice": "N81013",
    "bnf_name": "Co-Magaldrox_Susp 195mg/220mg/5ml S/F",
    "nic": 5.98,
    "act_cost": 5.56,
    "quantity": 1000,
    "code": "N81013",
    "name": "SPRINGWOOD SURGERY",
    "addr_1": "SPRINGWOOD SURGERY",
    "addr_2": "RECTORY LANE",
    "borough": "GUISBOROUGH",
    "village": null,
    "post_code": "TS14 7DJ"
  },
  "A81001": {
    "bnf_code": "0101021B0AAAHAH",
    "items": 1,
    "practice": "A81001",
    "bnf_name": "Alginate_Raft-Forming Oral Susp S/F",
    "nic": 1.95,
    "act_cost": 1.82,
    "quantity": 500,
    "code": "A81001",
    "name": "THE DENSHAM SURGERY",
    "addr_1": "THE HEALTH CENTRE",
    "addr_2": "LAWSON STREET",
    "borough": "STOCKTON ON TEES",
    "village": "CLEVELAND",
    "post_code": "TS18 1HU"
  },
  "A81002": {
    "bnf_code": "0101021B0AAALAL",
    "items": 12,
    "practice": "A81002",
    "bnf_name": "Sod Algin/Pot Bicarb_Susp S/F",
    "nic": 64.51,
    "act_cost": 59.95,
    "quantity": 6300,
    "code": "A81002",
    "name": "QUEENS PARK MEDICAL CENTRE",
    "addr_1": "QUEENS PARK MEDICAL CTR",
    "addr_2": "FARRER STREET",
    "borough": "STOCKTON ON TEES",
    "village": "CLEVELAND",
    "post_code": "TS18 2AW"
  },
  "A81004": {
    "bnf_code": "0101021B0AAAPAP",
    "items": 3,
    "practice": "A81004",
    "bnf_name": "Sod Alginate/Pot Bicarb_Tab Chble 500mg",
    "nic": 9.21,
    "act_cost": 8.55,
    "quantity": 180,
    "code": "A81004",
    "name": "WOODLANDS ROAD SURGERY",
    "addr_1": "6 WOODLANDS ROAD",
    "addr_2": null,
    "borough": "MIDDLESBROUGH",
    "village": "CLEVELAND",
    "post_code": "TS1 3BE"
  },
  "A81003": {
    "bnf_code": "0101021B0BEADAJ",
    "items": 6,
    "practice": "A81003",
    "bnf_name": "Gaviscon Infant_Sach 2g (Dual Pack) S/F",
    "nic": 28.92,
    "act_cost": 26.84,
    "quantity": 90,
    "code": "A81003",
    "name": "VICTORIA MEDICAL PRACTICE",
    "addr_1": "THE HEALTH CENTRE",
    "addr_2": "VICTORIA ROAD",
    "borough": "HARTLEPOOL",
    "village": "CLEVELAND",
    "post_code": "TS26 8DB"
  }
}

In general, dictionaries are well-suited for grouping operations, since keys are unique. A more optimized operation might be to zip the two lists together, since you'll be doing an update:

from itertools import zip_longest
from collections import defaultdict

groups = defaultdict(dict)


def group_item(a, b):
    a_key, b_key = a['practice'] if a else None, b['code'] if b else None
    return a_key, b_key

for a, b in zip_longest(list1, list2):
    ak, bk = group_item(a, b)
    if ak:
        groups[ak].update(a)
    if bk:
        groups[bk].update(b)

# sort list of groups.values() now
list(groups.values())
[{'bnf_code': '0101010G0AAABAB', 'items': 2, 'practice': 'N81013', 'bnf_name': 'Co-Magaldrox_Susp 195mg/220mg/5ml S/F', 'nic': 5.98, 'act_cost': 5.56, 'quantity': 1000, 'code': 'N81013', 'name': 'SPRINGWOOD SURGERY', 'addr_1': 'SPRINGWOOD SURGERY', 'addr_2': 'RECTORY LANE', 'borough': 'GUISBOROUGH', 'village': None, 'post_code': 'TS14 7DJ'}, {'code': 'A81001', 'name': 'THE DENSHAM SURGERY', 'addr_1': 'THE HEALTH CENTRE', 'addr_2': 'LAWSON STREET', 'borough': 'STOCKTON ON TEES', 'village': 'CLEVELAND', 'post_code': 'TS18 1HU', 'bnf_code': '0101021B0AAAHAH', 'items': 1, 'practice': 'A81001', 'bnf_name': 'Alginate_Raft-Forming Oral Susp S/F', 'nic': 1.95, 'act_cost': 1.82, 'quantity': 500}, {'code': 'A81002', 'name': 'QUEENS PARK MEDICAL CENTRE', 'addr_1': 'QUEENS PARK MEDICAL CTR', 'addr_2': 'FARRER STREET', 'borough': 'STOCKTON ON TEES', 'village': 'CLEVELAND', 'post_code': 'TS18 2AW', 'bnf_code': '0101021B0AAALAL', 'items': 12, 'practice': 'A81002', 'bnf_name': 'Sod Algin/Pot Bicarb_Susp S/F', 'nic': 64.51, 'act_cost': 59.95, 'quantity': 6300}, {'code': 'A81003', 'name': 'VICTORIA MEDICAL PRACTICE', 'addr_1': 'THE HEALTH CENTRE', 'addr_2': 'VICTORIA ROAD', 'borough': 'HARTLEPOOL', 'village': 'CLEVELAND', 'post_code': 'TS26 8DB', 'bnf_code': '0101021B0BEADAJ', 'items': 6, 'practice': 'A81003', 'bnf_name': 'Gaviscon Infant_Sach 2g (Dual Pack) S/F', 'nic': 28.92, 'act_cost': 26.84, 'quantity': 90}, {'bnf_code': '0101021B0AAAPAP', 'items': 3, 'practice': 'A81004', 'bnf_name': 'Sod Alginate/Pot Bicarb_Tab Chble 500mg', 'nic': 9.21, 'act_cost': 8.55, 'quantity': 180, 'code': 'A81004', 'name': 'WOODLANDS ROAD SURGERY', 'addr_1': '6 WOODLANDS ROAD', 'addr_2': None, 'borough': 'MIDDLESBROUGH', 'village': 'CLEVELAND', 'post_code': 'TS1 3BE'}]

I'm using zip_longest here, in case your list1 and list2 are of unequal lengths, then the loop doesn't get truncated early due to the size discrepancy. To sort by post_code, do the same as you've done before:

x = sorted(groups.values(), key=operator.itemgetter('post_code'))

Though, this implies the existence of the key. For a more generic approach, a lambda might be better and using get with a default return:

x = sorted(groups.values(), key=lambda x: x.get('post_code', ' '))


C.Nivs
  • 12,353
  • 2
  • 19
  • 44
  • Thank you very much. How do I remove the "dict_values" in the output. I am getting errors when I try to manipulate the output – IOI Aug 21 '19 at 12:29
  • TypeError: 'dict_values' object is not callable – IOI Aug 21 '19 at 15:45
  • Are you trying to call that anywhere? You should be able to use `sorted(groups.values(), key=)`. What are you doing on the line that throws that error? – C.Nivs Aug 21 '19 at 16:03