
Thank you to all who help here.

I have a list of lists. Those lists contain dictionaries like so:

combined_lists = [
        [
            {'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['1', '2', '3', '4']},
            {'COMPANY': 'company2', 'NUMBER': '222', 'SHIPMENTS': ['1']},
            {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['1', '4']},
            {'COMPANY': 'company4', 'NUMBER': '444', 'SHIPMENTS': ['2', '5']},
            {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['1', '3', '5', '9']}
        ], 
        [
            {'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['5', '6', '7', '8']},
            {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['3', '5']},
            {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['3', '5', '7']},
            {'COMPANY': 'company7', 'NUMBER': '777', 'SHIPMENTS': ['2', '4']},
            {'COMPANY': 'company9', 'NUMBER': '999', 'SHIPMENTS': ['1', '2', '5', '6', '7']}
        ], 
    ]

I want to combine these lists based on COMPANY and SHIPMENTS, without duplicate SHIPMENTS values. The NUMBER key/value is irrelevant.

The final output would ideally be a list of dictionaries that looks something like this, where all the shipments are combined per company:

final_list = [
        {'COMPANY': 'company1', 'SHIPMENTS': ['1', '2', '3', '4', '5', '6', '7', '8']},
        {'COMPANY': 'company2', 'SHIPMENTS': ['1']},
        {'COMPANY': 'company3', 'SHIPMENTS': ['1', '4', '3', '5']},
        {'COMPANY': 'company4', 'SHIPMENTS': ['2', '5']},
        {'COMPANY': 'company5', 'SHIPMENTS': ['1', '3', '5', '7', '9']},
        {'COMPANY': 'company7', 'SHIPMENTS': ['2', '4']},
        {'COMPANY': 'company9', 'SHIPMENTS': ['1', '2', '5', '6', '7']}
    ]

I know I haven't offered anything I've tried, but I'm mainly looking for how to approach getting to the final output. I'm using Python 3.6, if that matters.

Grismar
dkeeper09
  • Does this answer your question? [How do I merge two dictionaries in a single expression in Python?](https://stackoverflow.com/questions/38987/how-do-i-merge-two-dictionaries-in-a-single-expression-in-python) – Grismar May 13 '20 at 01:14
  • Initially voted to close, but I'm assuming you're really asking: how do I merge the lists of matching records when merging the two dictionaries? Also, is the data structure something you get from somewhere else, or was it your choice to create a list of dictionaries - instead of dictionary of dictionaries with the company name as the key? – Grismar May 13 '20 at 01:15
  • Yea that's probably a better way to ask it. Is it as simple as merging two dictionaries? – dkeeper09 May 13 '20 at 01:18
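Grismar's alternative, a dictionary keyed by company name instead of a list of dictionaries, might look like the minimal sketch below. The name `by_company` and the small made-up sample (same shape as the question's data) are illustrative assumptions; each value is an inner dictionary holding a set of shipments, so duplicates disappear automatically and NUMBER is simply dropped.

```python
from itertools import chain

# Small made-up sample in the same shape as the question's data
combined_lists = [
    [
        {'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['1', '2', '3', '4']},
        {'COMPANY': 'company2', 'NUMBER': '222', 'SHIPMENTS': ['1']},
    ],
    [
        {'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['3', '4', '5']},
    ],
]

# Hypothetical layout: company name -> {'SHIPMENTS': set of shipment IDs}
by_company = {}
for record in chain.from_iterable(combined_lists):
    # setdefault creates the inner dict the first time a company appears
    entry = by_company.setdefault(record['COMPANY'], {'SHIPMENTS': set()})
    # update() adds each shipment ID; repeats are ignored by the set
    entry['SHIPMENTS'].update(record['SHIPMENTS'])

print(by_company['company1']['SHIPMENTS'] == {'1', '2', '3', '4', '5'})  # True
```

Looking up one company is then a single `by_company['company1']` access rather than a scan of the list.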

3 Answers


Here's a solution. It uses sets to ensure there are no duplicates, but it will lose the order of the shipments.

from itertools import chain

combined_lists = [
    [
        {'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['1', '2', '3', '4']},
        {'COMPANY': 'company2', 'NUMBER': '222', 'SHIPMENTS': ['1']},
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['1', '4']},
        {'COMPANY': 'company4', 'NUMBER': '444', 'SHIPMENTS': ['2', '5']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['1', '3', '5', '9']}
    ],
    [
        {'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['5', '6', '7', '8']},
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['3', '5']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['3', '5', '7']},
        {'COMPANY': 'company7', 'NUMBER': '777', 'SHIPMENTS': ['2', '4']},
        {'COMPANY': 'company9', 'NUMBER': '999', 'SHIPMENTS': ['1', '2', '5', '6', '7']}
    ]
]

COMPANY_KEY = 'COMPANY'
SHIPMENTS_KEY = 'SHIPMENTS'

# you're looking to:
# - combine the lists
# - drop the number
# - combine the shipments, removing duplicates
final_dict = {}
for d in chain.from_iterable(combined_lists):
    key = d[COMPANY_KEY]
    if key in final_dict:
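        # pass the list itself rather than unpacking it; unpacked, a multi-digit
        # ID such as '10' would be iterated and added as '1' and '0'
        final_dict[key][SHIPMENTS_KEY].update(d[SHIPMENTS_KEY])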
    else:
        final_dict[key] = {SHIPMENTS_KEY: set(d[SHIPMENTS_KEY])}
print(final_dict)

# if you need a list, not a dict (each value is {SHIPMENTS_KEY: set}, so unwrap it)
final_list = [{COMPANY_KEY: key, SHIPMENTS_KEY: list(value[SHIPMENTS_KEY])} for key, value in final_dict.items()]
print(final_list)

Note that if all you need is the shipments per company, and that's really the only thing in your dictionaries, an even simpler solution is this:

from collections import defaultdict

better_dict = defaultdict(set)
for d in chain.from_iterable(combined_lists):
    # pass the list itself so each shipment ID is added whole
    better_dict[d[COMPANY_KEY]].update(d[SHIPMENTS_KEY])
print(better_dict)
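To turn a `defaultdict(set)` like `better_dict` back into the list-of-dictionaries shape from the question, something like the sketch below could work. It assumes every shipment ID is a numeric string, so `sorted(..., key=int)` gives a predictable order; the original order cannot be recovered from a set. The sample data is a subset of the question's.

```python
from collections import defaultdict
from itertools import chain

combined_lists = [
    [
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['1', '4']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['1', '3', '5', '9']},
    ],
    [
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['3', '5']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['3', '5', '7']},
    ],
]

better_dict = defaultdict(set)
for d in chain.from_iterable(combined_lists):
    better_dict[d['COMPANY']].update(d['SHIPMENTS'])

# Sets are unordered, so sort numerically (assumes numeric-string IDs)
final_list = [
    {'COMPANY': company, 'SHIPMENTS': sorted(shipments, key=int)}
    for company, shipments in better_dict.items()
]
print(final_list)
# [{'COMPANY': 'company3', 'SHIPMENTS': ['1', '3', '4', '5']}, {'COMPANY': 'company5', 'SHIPMENTS': ['1', '3', '5', '7', '9']}]
```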
Grismar
  • @r.ook - I know, please read the entire solution; I think using a dict as an intermediate still is a clean solution even if the end goal is a list. – Grismar May 13 '20 at 01:37
  • For an ordered set, this is an OK recipe https://code.activestate.com/recipes/528878/ or you can just construct a list, using a set to keep track. – Grismar May 13 '20 at 01:43
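The "construct a list, using a set to keep track" idea from the comment above might be sketched as follows. `merge_shipments` is a hypothetical helper name and the sample data is a subset of the question's. Shipments keep their first-seen order; the order of companies relies on dict insertion order, which is guaranteed from Python 3.7 and only a CPython implementation detail in 3.6 (`collections.OrderedDict` avoids relying on it).

```python
from itertools import chain


def merge_shipments(combined_lists):
    """Merge records by COMPANY, keeping shipments in first-seen order."""
    merged = {}  # company -> list of shipments, in first-seen order
    seen = {}    # company -> set of shipments already added (fast lookups)
    for record in chain.from_iterable(combined_lists):
        company = record['COMPANY']
        shipments = merged.setdefault(company, [])
        already = seen.setdefault(company, set())
        for shipment in record['SHIPMENTS']:
            if shipment not in already:
                already.add(shipment)
                shipments.append(shipment)
    return [{'COMPANY': c, 'SHIPMENTS': s} for c, s in merged.items()]


combined_lists = [
    [
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['1', '4']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['1', '3', '5', '9']},
    ],
    [
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['3', '5']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['3', '5', '7']},
        {'COMPANY': 'company7', 'NUMBER': '777', 'SHIPMENTS': ['2', '4']},
    ],
]

print(merge_shipments(combined_lists))
```

With this data, company3 comes out as ['1', '4', '3', '5'], matching the question's example, while company5 becomes ['1', '3', '5', '9', '7'] because '9' is seen before '7'.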

I think this should solve your issue:

import collections

merged = collections.defaultdict(list)

for x in combined_lists:
    for y in x:
        merged[y["COMPANY"]] += y["SHIPMENTS"]
final_list = []
for x in merged:
    final_list.append({"COMPANY": x, "SHIPMENTS": merged[x]})

Dhruv Agarwal
  • Your code doesn't actually work, with the `combined_lists` as provided by OP. It also disregards the need for no duplicates in shipments? – Grismar May 13 '20 at 01:38
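One way to address the duplicates point while keeping this answer's `defaultdict(list)` approach is to deduplicate each merged list with `dict.fromkeys`, which keeps the first occurrence of each shipment in order. This is a sketch on a subset of the question's data, and it relies on dicts preserving insertion order (guaranteed from Python 3.7; on 3.6 `collections.OrderedDict.fromkeys` is the safe equivalent).

```python
import collections

combined_lists = [
    [
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['1', '4']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['1', '3', '5', '9']},
    ],
    [
        {'COMPANY': 'company3', 'NUMBER': '333', 'SHIPMENTS': ['3', '5']},
        {'COMPANY': 'company5', 'NUMBER': '555', 'SHIPMENTS': ['3', '5', '7']},
    ],
]

merged = collections.defaultdict(list)
for sublist in combined_lists:
    for record in sublist:
        # the key is "SHIPMENTS" (plural), as in the question's data
        merged[record["COMPANY"]] += record["SHIPMENTS"]

# dict.fromkeys drops repeats but keeps first-occurrence order
final_list = [
    {"COMPANY": company, "SHIPMENTS": list(dict.fromkeys(shipments))}
    for company, shipments in merged.items()
]
print(final_list)
```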

This solves the problem. Try it and play around with it; you can optimize the code for better performance.

def company_exists(company, resulting_list):
    for i,dict_ in enumerate(resulting_list):
        if company == dict_['COMPANY']:
            return i, True
    return None, False


def merge_lists(combined_lists):
    res = []

    for list_ in combined_lists:
        for dict_ in list_:
            idx, check = company_exists(dict_['COMPANY'], res)
            if not check:
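                # copy the record so extending SHIPMENTS later does not mutate the input
                res.append({'COMPANY': dict_['COMPANY'], 'SHIPMENTS': list(dict_['SHIPMENTS'])})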
            else:
                res[idx]['SHIPMENTS'].extend(dict_['SHIPMENTS'])
                res[idx]['SHIPMENTS'] = list(set(res[idx]['SHIPMENTS']))

    return res

Hope it helps.
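On the performance point: `company_exists` scans `res` for every record, so the work grows with the number of records times the number of companies. A possible variant, sketched below under the hypothetical name `merge_lists_indexed`, keeps a dictionary from company name to its position in `res`, starts each company with a fresh list so the caller's dictionaries are never modified, and preserves first-seen shipment order instead of going through `set`. The sample data is made up in the question's shape.

```python
def merge_lists_indexed(combined_lists):
    res = []
    index = {}  # company name -> position of its entry in res
    for list_ in combined_lists:
        for dict_ in list_:
            company = dict_['COMPANY']
            if company not in index:
                index[company] = len(res)
                # start with a fresh list so the caller's data is never mutated
                res.append({'COMPANY': company, 'SHIPMENTS': []})
            shipments = res[index[company]]['SHIPMENTS']
            for shipment in dict_['SHIPMENTS']:
                if shipment not in shipments:  # linear scan; fine for short lists
                    shipments.append(shipment)
    return res


combined_lists = [
    [{'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['1', '2', '3', '4']}],
    [{'COMPANY': 'company1', 'NUMBER': '111', 'SHIPMENTS': ['3', '5']},
     {'COMPANY': 'company9', 'NUMBER': '999', 'SHIPMENTS': ['1', '2']}],
]
print(merge_lists_indexed(combined_lists))
```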

null