How to find duplicates in a list without creating a separate list?

Question

How to find the duplicates in a list without creating any other list?

Example

A = [1,2,1,3,4,5,4]

At the end

A = [1,4]

If there are three `4`'s in the original list, do you want the result to have two `4`'s or one? — Cyphase, Aug 08 '15 at 21:48
Duplicate: http://stackoverflow.com/questions/9835762/find-and-list-duplicates-in-python-list — Alfie, Aug 08 '15 at 21:51
1)I dont care about order. 2) Yes am taking about removing duplicates.3) If there are three 4's in the original list, i want result to have one 4 — nname, Aug 08 '15 at 21:52
Why don't you want to create another list? What about a set or other container? — Cyphase, Aug 08 '15 at 21:57
@Nidhi, did one of the answers solve your problem? If so, please accept it :). — Cyphase, Aug 10 '15 at 06:51

score 2 · Answer 1 · answered Aug 08 '15 at 21:53

So you want a function which takes a list, A, and mutates that list to contain only those elements which were originally duplicated? I assume the restriction on creating new lists applies to any new collection. It is best to be as clear as possible of the requirements when asking a question about algorithms.

It seems an odd requirement that no other collections be made in this algorithm, but it is possible. A simple but inefficient solution would be to approach it like this:

for each element, x
- set a boolean flag value, for example, hasDuplicates to false
- for each element right of x, y
  - if y is a duplicate of x, remove it and set hasDuplicates to true
- if hasDuplicates is false, remove x

If the restriction of not creating another collection can be relaxed, or if the result of the algorithm can be a new list rather than the old one modified, you will find much more (time) efficient ways of doing this.

The psuedo algorithm I gave has quadratic runtime (`O(n^2)`). If elements are comparable and more mutation is permissible, you could sort in place (`O(n log n)`) and then scan to find and keep only the originally-duplicated elements. If the list is link-based, the scan is linear (`O(n)`). If it's array-based, you'd need to maintain a 'result tail' index and copy the selected 'to keep' elements to this tail position during the scan, before truncating, for `O(n)` (overall `O(n log n)`). — Oly, Aug 16 '22 at 10:21

score 1 · Answer 2 · answered Aug 08 '15 at 23:15

1

I would go with checking, for each element, if it appears before it but not after. If it doesn't fit, then either it is not a duplicate or it is an other occurence of the duplicate that you don't want to keep. Either cases we don't keep it.

def simplify(a_list):
    for i in range(len(a_list) - 1, -1, -1):
        value = a_list[i]
        if not value in a_list[:i] or value in a_list[i+1:]:
            del a_list[i]

Not sure if using slices fit your requirements though.

Usage:

>>> A = [1,2,1,3,4,5,4]
>>> simplify(A)
>>> A
[1, 4]
>>> A = [1,1,1,1,1,2,2,2,2]
>>> simplify(A)
>>> A
[1, 2]
>>> A = [1,1,1,1,1]
>>> simplify(A)
>>> A
[1]
>>> A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> simplify(A)
>>> A
[]

answered Aug 08 '15 at 23:15

301_Moved_Permanently

4,007
14
28

In python 2 I believe `range(...)` creates a list. – Paul Aug 10 '15 at 04:41
But `xrange` doesn't. – 301_Moved_Permanently Aug 10 '15 at 06:29
But slicing does. – trincot Aug 16 '22 at 09:48

score 0 · Answer 3 · edited May 23 '17 at 12:06

You can use set to get only the unique values and then remove them, one-by-one, from the original list - so that only the duplicates will remain:

a = [1,2,1,3,4,5,4]
s = list(set(a))
for x in s:
    a.remove(x)
print a # [1, 4]

Another elegant option which I 'stole' from Ritesh Kumar is:
collect only the items that appear more than once, use set to remove the dups, and wrap it with list to return a list as a result:

a = [1,2,1,3,4,5,4]
print list(set([x for x in a if a.count(x) > 1])) # [1, 4]

Cyphase · Answer 4 · 2015-08-08T22:14:07.253

0

This should do what you need, barring clarification:

def find_duplicated_items(data):
    seen = set()
    duplicated = set()

    for x in data:
        if x in seen:
            duplicated.add(x)
        else:
            seen.add(x)

    return duplicated

It takes an iterable and returns a set; you can turn it into a list with list(results).

UPDATE:

Here's another way of doing it, as a generator. Just because :).

from collections import Counter

def find_duplicated(iterable, atleast=2):
    counter = Counter()
    yielded = set()

    for item in iterable:
        counter[item] += 1
        if (counter[item] >= atleast) and (item not in yielded):
            yield item
            yielded.add(item)

edited Aug 08 '15 at 22:14

answered Aug 08 '15 at 21:54

Cyphase

11,502
2
31
32

Though the question only specifies no new 'list' be created, I would assume the requirement is that no new collection of any kind be made. Also it appears the requirement is for a method which mutates an existing list rather than creating a new one. This is exactly the right solution otherwise I think! – Oly Aug 08 '15 at 21:56
@ Oly 'Oil' Sourbut - Can you please explain the difference b/w creating a collection and creating a different list? – nname Aug 08 '15 at 22:08
@Nidhi A ['collection'](http://en.wikipedia.org/wiki/Collection_(abstract_data_type)) in computer science refers to a number of related data structures. A [list](http://en.wikipedia.org/wiki/List_(abstract_data_type)) is a kind of collection, representing a some items which have an order, and can usually be accessed by index (e.g. `a` is a list, I want `a[4]`, the 5th - counting from 0 - item in `a`). In my experience, a [set](http://en.wikipedia.org/wiki/Set_(abstract_data_type)) is the next most common kind of collection - it stores data with no particular order, and no repeats. – Oly Aug 09 '15 at 11:26
@Nidhi in many programming languages, the type system works like this: a 'collection' is a type which has methods to check if an item is contained in it, and can be iterated over. A 'list' is a collection, and also has methods to access items by index, and (if mutable) insert or remove items at a given index. A 'set' is a collection, and also has the [invariant](https://en.wikipedia.org/wiki/Invariant_(computer_science)) property that it contains no repeats. Often these types are specialised further - for example 'arraylist' or 'linkedlist' types might specialise the 'list' type. – Oly Aug 09 '15 at 11:31
1

@Nidhi, lists are good if you need order; sets are useful for making sure you only store one of each thing, or for membership testing; you can check if something is in a set much more efficiently than if something is in a list. There are also multisets, also called bags, of which `Counter` is an implementation; it can store a count for each item in a set. – Cyphase Aug 09 '15 at 11:35

score 0 · Answer 5 · answered Aug 10 '15 at 04:42

This code appears to delete 2nd duplicates and non duplicates in place, yielding the old list containing only unique duplicates. I haven't thoroughly tested it. Note that time required will scale as O(N**2) where N is the length of the input list.

Unlike other solutions, there are no new lists constructed here, not even a list for a for loop or list comprehension.

File: "dup.py"

def dups(mylist):
    idx = 0 
    while(idx<len(mylist)):
        delidx = idx+1
        ndeleted = 0
        while delidx < len(mylist):
            if mylist[delidx] == mylist[idx]:
                del mylist[delidx]
                ndeleted += 1
            else:
                delidx += 1
        if ndeleted==0:
            del mylist[idx]
        else:
            idx += 1
    return mylist

Usage (iPython)

In [1]: from dup import dups

In [2]: dups([1,1,1,1,1])
Out[2]: [1]

In [3]: dups([1,1,2,1,1])
Out[3]: [1]

In [4]: dups([1,1,2,2,1])
Out[4]: [1, 2]

In [5]: dups([1,1,2,1,2])
Out[5]: [1, 2]

In [6]: dups([1,2,3,1,2])
Out[6]: [1, 2]

In [7]: dups([1,2,1,3,4,5,4])
Out[7]: [1, 4]

How to find duplicates in a list without creating a separate list?

5 Answers5