I have a list of dictionaries, each describing a file (file format, filename, filesize, ... and the full path to the file, which is always unique). The goal is to exclude all but one of the dictionaries that describe copies of the same file: I want a single dict (entry) per file, no matter how many copies there are.

In other words: if two or more dicts differ only in a single key (i.e. path), keep only one of them.

For example, here is the source list:

src_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

The result should look like this:

dst_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]
Vasily
  • Dupe of the marked question. See the accepted answer and use `x['key1'] in seen or seen_add(x['key1'])` to solve your issue – Bhargav Rao Jun 08 '16 at 14:56
  • Why is this discarded? `{'key1': 'non_unique_value2', 'key2': 'unique_value3'}` – Reblochon Masque Jun 08 '16 at 14:58
  • The tuple of key and value should be added to `seen`, not the value alone. – Suzana Jun 08 '16 at 15:03
  • @Suzana_K, why do you think so? – AndreyT Jun 08 '16 at 15:09
  • Because that makes the solution more generic. I assume `key1` is only an example, there could be multiple keys where duplicates should be filtered and more than one could have a specific value. – Suzana Jun 08 '16 at 15:14
  • @BhargavRao, thanks, but in my case this won't help, because I'm dealing with a large set of key:value pairs, and the main goal is: if 2 (or more) dicts differ ONLY in unique key - leave only one of them. I've rewritten the question and added additional explanation. This is clearly not a dupe! – Vasily Jun 09 '16 at 12:07
  • @Vasily, The question has started to become a bit unclear. The dupe still holds for longer dictionaries. Can you provide a [MCVE]? – Bhargav Rao Jun 09 '16 at 12:16
  • @BhargavRao Yep, sure! Done – Vasily Jun 09 '16 at 13:53
  • Thanks, and sorry. Reopened. It's always better to have a complete example from the start. – Bhargav Rao Jun 09 '16 at 14:25
  • I think you will find the answer to your question [here](http://stackoverflow.com/questions/7090758/python-remove-duplicate-dictionaries-from-a-list) – Omar Hafez Jun 09 '16 at 15:12

2 Answers

Use another dictionary that maps each dictionary from the list, minus its "ignored" keys, to the actual dictionary. This way, only one dictionary of each kind is retained (the last one encountered). Since dicts are not hashable, the keys of this mapping have to be tuples of (key, value) pairs, sorted by key so that insertion order does not matter.

src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
ignored_keys = ["path"]
filtered = {tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys): d for d in src_list}
dst_list = list(filtered.values())

Result is:

[{'path': 'C:/mydir', 'filetype': '.txt', 'filename': 'abc'}, 
 {'path': 'C:/mydir2', 'filetype': '.zip', 'filename': 'def'}]
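
Note that the dict comprehension keeps the last dictionary seen for each group, which is why the paths in the result are 'C:/mydir' and 'C:/mydir2'. If the first occurrence should win instead, a minimal sketch could use `dict.setdefault`. The helper name `dedupe_keep_first` is hypothetical (it is not part of the code above), and the sketch assumes all values in the dictionaries are hashable and that Python 3.7+ is used, so that dicts preserve insertion order:

```python
def dedupe_keep_first(dicts, ignored_keys=("path",)):
    """Return one dict per group of dicts that are equal apart from ignored_keys."""
    ignored = set(ignored_keys)
    firsts = {}
    for d in dicts:
        # Hashable signature of everything except the ignored keys,
        # sorted by key so that insertion order does not matter
        signature = tuple((k, d[k]) for k in sorted(d) if k not in ignored)
        # setdefault stores d only if this signature has not been seen yet
        firsts.setdefault(signature, d)
    return list(firsts.values())


src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
dst_list = dedupe_keep_first(src_list)
print(dst_list)
```

With this variant, the entry kept for each file is the one that appears first in `src_list`, and the source dictionaries are not modified.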
tobias_k

My own solution (maybe not the best, but it worked):

    dst_list = []
    seen_items = set()
    for dictionary in src_list:
        # temporarily cut the unique key (path) out for the duplicate check
        path = dictionary.pop('path', None)
        # sort the items so the check does not depend on key insertion order
        t = tuple(sorted(dictionary.items()))
        # put the unique key back right away, so duplicates left in src_list keep their path
        dictionary['path'] = path
        if t not in seen_items:
            # duplicate check passed: remember the item and keep the dictionary
            seen_items.add(t)
            dst_list.append(dictionary)

    print(dst_list)

Where

src_list is the original list with possible duplicates,

dst_list is the final duplicate-free list,

path is the unique key

Vasily