The problem is made more difficult because you don't know the key values (unique ids) of the dictionaries in reports. Since each one consists of only one item, you can use next(iter(dict.values())) with Python 3 to get the single nested dictionary associated with it, which I named checkout in the code below.
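As a small illustration of that trick, here is a minimal sketch using one hypothetical entry shaped like those in reports. The report_id name is only there for the example; it does not appear in the code further down.

```python
# One-item dictionary shaped like an entry in reports (sample data).
report = {'00T2A00003mDvq9': {'subject': 'dupe1', 'due_date': '4/5/2017'}}

# The key is not known ahead of time, so take the first (and only) value.
checkout = next(iter(report.values()))
print(checkout['subject'])  # dupe1

# If the unique id is needed as well, the same idea works on items().
report_id, checkout = next(iter(report.items()))
print(report_id)  # 00T2A00003mDvq9
```

Note that this relies on each dictionary holding exactly one item; with more than one, it would silently return only the first.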
Given that, the approach I would use is to first create a dictionary that groups the elements in reports by subject, which gives you something like this to work with (note: I changed the sample reports data so the first 'subject' has more than one duplicate):
{
    'dupe1': [
        {'00T2A00003mDvq9': {'due_date': '4/5/2017', 'subject': 'dupe1'}},
        {'00T2A00003mDvq7': {'due_date': '4/3/2017', 'subject': 'dupe1'}},
        {'00T2A00003mDvq6': {'due_date': '4/6/2017', 'subject': 'dupe1'}}
    ],
    'dupe2': [
        {'00T2A00003mDvq8': {'due_date': '4/7/2017', 'subject': 'dupe2'}}
    ]
}
The lists of reports associated with each subject can then be sorted by date (using a lambda based on the same next(iter(dict.values())) trick). Once each list is in date order, it's easy to update the reports list and remove whichever duplicates you want.
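A side note on why the sort key parses the dates rather than comparing the raw strings: the sketch below uses a few made-up date strings (not taken from the reports data) to show the difference.

```python
from time import strptime

DATE_FMT = '%m/%d/%Y'
dates = ['4/10/2017', '4/5/2017', '12/1/2016']  # made-up sample values

# Plain string comparison goes character by character, not chronologically.
as_text = sorted(dates)

# Parsing first yields struct_time values, which compare field by field
# starting with the year, so the order is chronological.
as_dates = sorted(dates, key=lambda d: strptime(d, DATE_FMT))

print(as_text)   # ['12/1/2016', '4/10/2017', '4/5/2017']
print(as_dates)  # ['12/1/2016', '4/5/2017', '4/10/2017']
```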
from time import strptime
from pprint import pprint

DATE_FMT = '%m/%d/%Y'

reports = [
    {'00T2A00003mDvq9': {'subject': 'dupe1', 'due_date': '4/5/2017'}},
    {'00T2A00003mDvq8': {'subject': 'dupe2', 'due_date': '4/7/2017'}},
    {'00T2A00003mDvq7': {'subject': 'dupe1', 'due_date': '4/3/2017'}},
    {'00T2A00003mDvq6': {'subject': 'dupe1', 'due_date': '4/6/2017'}},  # + a third duplicate
]

by_subject = {}
for report in reports:
    checkout = next(iter(report.values()))  # get single subdictionary in each dictionary
    by_subject.setdefault(checkout['subject'], []).append(report)

for records in by_subject.values():
    records.sort(key=lambda rpt: strptime(next(iter(rpt.values()))['due_date'], DATE_FMT))

# Update reports list in-place.
del reports[:]
for subject, records in by_subject.items():
    reports.append(records[0])  # only keep oldest (deletes all newer than first)

print('Deduped reports:')
pprint(reports)
Output:
Deduped reports:
[{'00T2A00003mDvq7': {'due_date': '4/3/2017', 'subject': 'dupe1'}},
 {'00T2A00003mDvq8': {'due_date': '4/7/2017', 'subject': 'dupe2'}}]
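If only the oldest report per subject is ever needed, the grouping and sorting can be collapsed into a single pass. The following is a rough sketch of that variation, not a replacement for the approach above; the helper names due_date and keep_oldest are made up for this example, and it assumes Python 3.7+ so that dictionaries preserve insertion order.

```python
from time import strptime

DATE_FMT = '%m/%d/%Y'

def due_date(report):
    """Return the parsed due date of the single nested dictionary in report."""
    return strptime(next(iter(report.values()))['due_date'], DATE_FMT)

def keep_oldest(reports):
    """Return a new list holding the earliest-due report for each subject."""
    oldest = {}  # subject -> earliest report seen so far
    for report in reports:
        subject = next(iter(report.values()))['subject']
        if subject not in oldest or due_date(report) < due_date(oldest[subject]):
            oldest[subject] = report
    return list(oldest.values())

reports = [
    {'00T2A00003mDvq9': {'subject': 'dupe1', 'due_date': '4/5/2017'}},
    {'00T2A00003mDvq8': {'subject': 'dupe2', 'due_date': '4/7/2017'}},
    {'00T2A00003mDvq7': {'subject': 'dupe1', 'due_date': '4/3/2017'}},
    {'00T2A00003mDvq6': {'subject': 'dupe1', 'due_date': '4/6/2017'}},
]

deduped = keep_oldest(reports)
print(deduped)
```

Unlike the in-place version, this returns a new list and leaves reports untouched; assign the result back with reports[:] = deduped if the original list object has to be updated.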