Python: How to remove all duplicate items from a list

Hey guys

I have a list of (file, inode, image, hash) tuples. I need to delete BOTH items if they have the same hash. I don't have much programming experience, so even a hint about what to look for would be helpful. I've already searched the Internet, but the only thing I found was this. So far I've come up with this (extremely awkward) solution:

        hashlist = {}
        files_tobe_removed = []
        for (file, inode, image, hash) in self.files_for_json:
            hashlist[hash] = 0
        for (file, inode, image, hash) in self.files_for_json:
            hashlist[hash] +=1
        for (k,v) in hashlist.iteritems():
            if v == 2:
                files_tobe_removed.append(k)
        for (file,inode,image,hash) in self.files_for_json:
            if hash in files_tobe_removed:
                path = self.outDir + file
                os.remove(path)
                self.files_for_json.remove((file,inode,image,hash))

Any help will be appreciated. Thanks in advance

stebu92

1 Answer

>>> from collections import Counter
>>> L=[1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,1,2,3]
>>> [k for k,v in Counter(L).items() if v==1]
[7, 8, 9]

Applied to your list of tuples, count the hashes first and then check each entry:

hash_counter = Counter(x[3] for x in self.files_for_json)
for (file,inode,image,hash) in self.files_for_json:
    if hash_counter[hash]>1:
        # duplicated hash
        ...
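
For completeness, here is a minimal sketch of how the two passes might be put together, assuming the goal is to delete every file whose hash occurs more than once and to keep only the remaining entries. The function name `drop_duplicated_hashes` and the `remove` parameter are hypothetical and not part of the original code; `out_dir` stands in for `self.outDir`. It builds a new list rather than calling `list.remove()` inside the loop, because removing items from a list while iterating over it causes elements to be skipped.

```python
import os
from collections import Counter


def drop_duplicated_hashes(entries, out_dir, remove=os.remove):
    """Delete files whose hash is not unique; return the entries that remain.

    entries: list of (file, inode, image, hash) tuples, as in the question.
    out_dir: path prefix, standing in for self.outDir (assumption).
    remove:  callable that deletes a path (hypothetical hook, handy for dry runs).
    """
    # First pass: count how often each hash occurs.
    hash_counter = Counter(entry[3] for entry in entries)

    kept = []
    # Second pass: delete files with a repeated hash, keep all others.
    for entry in entries:
        file_name, _inode, _image, file_hash = entry
        if hash_counter[file_hash] > 1:
            # Same string concatenation as in the question's code.
            remove(out_dir + file_name)
        else:
            kept.append(entry)
    return kept
```

Inside the class this could be called as `self.files_for_json = drop_duplicated_hashes(self.files_for_json, self.outDir)`. Passing something like `remove=print` instead of the default shows which paths would be deleted without touching the disk.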
John La Rooy
  • For a list of tuples it won't look so elegant. :-) – DrTyrsa Nov 07 '11 at 09:27
  • @DrTyrsa, why would you use a list of tuples? Surely you'd just use a list of the hashes. – John La Rooy Nov 07 '11 at 09:29
  • So you need one iteration to construct a list of tuples. Second iteration to count them with Counter. And third iteration to remove the elements. While it can be solved in two iterations. – DrTyrsa Nov 07 '11 at 09:35
  • @DrTyrsa, I first used 3 iterations to make it fit the OP code better. Now I changed it to use just 2 iterations – John La Rooy Nov 07 '11 at 09:38