How to find the duplicates in a list without creating any other list?
Example
A = [1,2,1,3,4,5,4]
At the end
A = [1,4]
So you want a function which takes a list, A, and mutates that list to contain only those elements which were originally duplicated? I assume the restriction on creating new lists applies to any new collection. It is best to be as clear as possible about the requirements when asking a question about algorithms.
It seems an odd requirement that no other collections be made in this algorithm, but it is possible. A simple but inefficient solution would be to approach it like this:
For each element x in the list:
    set a flag hasDuplicates to false
    for each later element y:
        if y is a duplicate of x, remove it and set hasDuplicates to true
    if hasDuplicates is false, remove x
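Sketched concretely in Python (a minimal rendering of the pseudocode above; the function name is my own, not from the question):

def keep_duplicates_in_place(a):
    # Mutate a so that only originally-duplicated values remain, one copy each.
    i = 0
    while i < len(a):
        has_duplicates = False
        j = i + 1
        while j < len(a):
            if a[j] == a[i]:
                del a[j]              # remove the later duplicate
                has_duplicates = True
            else:
                j += 1
        if has_duplicates:
            i += 1                    # keep this element; it was duplicated
        else:
            del a[i]                  # unique element: remove it, don't advance

For A = [1,2,1,3,4,5,4] this leaves A as [1, 4].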
If the restriction of not creating another collection can be relaxed, or if the result of the algorithm can be a new list rather than the old one modified, you will find much more (time) efficient ways of doing this.
I would go with checking, for each element, whether it appears before it but not after. If it fails that test, then either it is not a duplicate or it is another occurrence of the duplicate that you don't want to keep. In either case we don't keep it.
def simplify(a_list):
    # Walk backwards so deleting by index doesn't shift unvisited elements.
    for i in range(len(a_list) - 1, -1, -1):
        value = a_list[i]
        # Keep only the last occurrence of each duplicated value:
        # delete if there is no earlier occurrence, or if there is a later one.
        if value not in a_list[:i] or value in a_list[i+1:]:
            del a_list[i]
Not sure if using slices fits your requirements though, since each slice builds a temporary list.
Usage:
>>> A = [1,2,1,3,4,5,4]
>>> simplify(A)
>>> A
[1, 4]
>>> A = [1,1,1,1,1,2,2,2,2]
>>> simplify(A)
>>> A
[1, 2]
>>> A = [1,1,1,1,1]
>>> simplify(A)
>>> A
[1]
>>> A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> simplify(A)
>>> A
[]
You can use set to get only the unique values and then remove them, one by one, from the original list, so that only the duplicates remain. Note that a value occurring n times is left n - 1 times, so this matches the expected output only when no value appears more than twice:
a = [1,2,1,3,4,5,4]
s = list(set(a))    # the unique values (note: this does build another list)
for x in s:
    a.remove(x)     # removes the first occurrence of x
print(a)            # [1, 4]
Another elegant option, which I 'stole' from Ritesh Kumar, is to collect only the items that appear more than once, use set to remove the dups, and wrap it with list to return a list as a result:
a = [1,2,1,3,4,5,4]
print(list(set(x for x in a if a.count(x) > 1)))  # [1, 4]
This should do what you need, barring clarification:
def find_duplicated_items(data):
    seen = set()        # every value met so far
    duplicated = set()  # values met at least twice
    for x in data:
        if x in seen:
            duplicated.add(x)
        else:
            seen.add(x)
    return duplicated
It takes an iterable and returns a set; you can turn it into a list with list(results).
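A quick check against the question's example (I use sorted for display, since sets have no guaranteed order):

>>> results = find_duplicated_items([1, 2, 1, 3, 4, 5, 4])
>>> sorted(results)
[1, 4]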
UPDATE:
Here's another way of doing it, as a generator. Just because :).
from collections import Counter

def find_duplicated(iterable, atleast=2):
    counter = Counter()   # occurrences seen so far
    yielded = set()       # values already emitted
    for item in iterable:
        counter[item] += 1
        if (counter[item] >= atleast) and (item not in yielded):
            yield item
            yielded.add(item)
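Consumed with list it reproduces the example, and atleast lets you raise the threshold (the second call is my own illustration):

>>> list(find_duplicated([1, 2, 1, 3, 4, 5, 4]))
[1, 4]
>>> list(find_duplicated([1, 1, 1, 2, 2], atleast=3))
[1]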
This code deletes later duplicates and non-duplicates in place, leaving the old list containing only the first occurrence of each duplicated value. I haven't thoroughly tested it. Note that the time required will scale as O(N**2), where N is the length of the input list.
Unlike the other solutions, no new lists are constructed here, not even a list for a for loop or list comprehension.
File: "dup.py"
def dups(mylist):
    idx = 0
    while idx < len(mylist):
        delidx = idx + 1
        ndeleted = 0
        # Delete every later occurrence of mylist[idx].
        while delidx < len(mylist):
            if mylist[delidx] == mylist[idx]:
                del mylist[delidx]
                ndeleted += 1
            else:
                delidx += 1
        if ndeleted == 0:
            del mylist[idx]   # unique value: drop it, don't advance
        else:
            idx += 1          # duplicated value: keep this one copy
    return mylist
Usage (IPython):
In [1]: from dup import dups
In [2]: dups([1,1,1,1,1])
Out[2]: [1]
In [3]: dups([1,1,2,1,1])
Out[3]: [1]
In [4]: dups([1,1,2,2,1])
Out[4]: [1, 2]
In [5]: dups([1,1,2,1,2])
Out[5]: [1, 2]
In [6]: dups([1,2,3,1,2])
Out[6]: [1, 2]
In [7]: dups([1,2,1,3,4,5,4])
Out[7]: [1, 4]