-3

How to find the duplicates in a list without creating any other list?

Example

A = [1,2,1,3,4,5,4]

At the end

A = [1,4]
Paul
  • 26,170
  • 12
  • 85
  • 119
nname
  • 59
  • 1
  • 9

5 Answers5

2

So you want a function which takes a list, A, and mutates that list to contain only those elements which were originally duplicated? I assume the restriction on creating new lists applies to any new collection. It is best to be as clear as possible of the requirements when asking a question about algorithms.

It seems an odd requirement that no other collections be made in this algorithm, but it is possible. A simple but inefficient solution would be to approach it like this:

  • for each element, x
    • set a boolean flag value, for example, hasDuplicates to false
    • for each element right of x, y
      • if y is a duplicate of x, remove it and set hasDuplicates to true
    • if hasDuplicates is false, remove x

If the restriction of not creating another collection can be relaxed, or if the result of the algorithm can be a new list rather than the old one modified, you will find much more (time) efficient ways of doing this.

Oly
  • 2,329
  • 1
  • 18
  • 23
  • The psuedo algorithm I gave has quadratic runtime (`O(n^2)`). If elements are comparable and more mutation is permissible, you could sort in place (`O(n log n)`) and then scan to find and keep only the originally-duplicated elements. If the list is link-based, the scan is linear (`O(n)`). If it's array-based, you'd need to maintain a 'result tail' index and copy the selected 'to keep' elements to this tail position during the scan, before truncating, for `O(n)` (overall `O(n log n)`). – Oly Aug 16 '22 at 10:21
1

I would go with checking, for each element, if it appears before it but not after. If it doesn't fit, then either it is not a duplicate or it is an other occurence of the duplicate that you don't want to keep. Either cases we don't keep it.

def simplify(a_list):
    for i in range(len(a_list) - 1, -1, -1):
        value = a_list[i]
        if not value in a_list[:i] or value in a_list[i+1:]:
            del a_list[i]

Not sure if using slices fit your requirements though.


Usage:

>>> A = [1,2,1,3,4,5,4]
>>> simplify(A)
>>> A
[1, 4]
>>> A = [1,1,1,1,1,2,2,2,2]
>>> simplify(A)
>>> A
[1, 2]
>>> A = [1,1,1,1,1]
>>> simplify(A)
>>> A
[1]
>>> A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> simplify(A)
>>> A
[]
301_Moved_Permanently
  • 4,007
  • 14
  • 28
0

You can use set to get only the unique values and then remove them, one-by-one, from the original list - so that only the duplicates will remain:

a = [1,2,1,3,4,5,4]
s = list(set(a))
for x in s:
    a.remove(x)
print a # [1, 4]

Another elegant option which I 'stole' from Ritesh Kumar is:
collect only the items that appear more than once, use set to remove the dups, and wrap it with list to return a list as a result:

a = [1,2,1,3,4,5,4]
print list(set([x for x in a if a.count(x) > 1])) # [1, 4]
Community
  • 1
  • 1
Nir Alfasi
  • 53,191
  • 11
  • 86
  • 129
0

This should do what you need, barring clarification:

def find_duplicated_items(data):
    seen = set()
    duplicated = set()

    for x in data:
        if x in seen:
            duplicated.add(x)
        else:
            seen.add(x)

    return duplicated

It takes an iterable and returns a set; you can turn it into a list with list(results).

UPDATE:

Here's another way of doing it, as a generator. Just because :).

from collections import Counter

def find_duplicated(iterable, atleast=2):
    counter = Counter()
    yielded = set()

    for item in iterable:
        counter[item] += 1
        if (counter[item] >= atleast) and (item not in yielded):
            yield item
            yielded.add(item)
Cyphase
  • 11,502
  • 2
  • 31
  • 32
  • Though the question only specifies no new 'list' be created, I would assume the requirement is that no new collection of any kind be made. Also it appears the requirement is for a method which mutates an existing list rather than creating a new one. This is exactly the right solution otherwise I think! – Oly Aug 08 '15 at 21:56
  • @ Oly 'Oil' Sourbut - Can you please explain the difference b/w creating a collection and creating a different list? – nname Aug 08 '15 at 22:08
  • @Nidhi A ['collection'](http://en.wikipedia.org/wiki/Collection_(abstract_data_type)) in computer science refers to a number of related data structures. A [list](http://en.wikipedia.org/wiki/List_(abstract_data_type)) is a kind of collection, representing a some items which have an order, and can usually be accessed by index (e.g. `a` is a list, I want `a[4]`, the 5th - counting from 0 - item in `a`). In my experience, a [set](http://en.wikipedia.org/wiki/Set_(abstract_data_type)) is the next most common kind of collection - it stores data with no particular order, and no repeats. – Oly Aug 09 '15 at 11:26
  • @Nidhi in many programming languages, the type system works like this: a 'collection' is a type which has methods to check if an item is contained in it, and can be iterated over. A 'list' is a collection, and also has methods to access items by index, and (if mutable) insert or remove items at a given index. A 'set' is a collection, and also has the [invariant](https://en.wikipedia.org/wiki/Invariant_(computer_science)) property that it contains no repeats. Often these types are specialised further - for example 'arraylist' or 'linkedlist' types might specialise the 'list' type. – Oly Aug 09 '15 at 11:31
  • 1
    @Nidhi, lists are good if you need order; sets are useful for making sure you only store one of each thing, or for membership testing; you can check if something is in a set much more efficiently than if something is in a list. There are also multisets, also called bags, of which `Counter` is an implementation; it can store a count for each item in a set. – Cyphase Aug 09 '15 at 11:35
0

This code appears to delete 2nd duplicates and non duplicates in place, yielding the old list containing only unique duplicates. I haven't thoroughly tested it. Note that time required will scale as O(N**2) where N is the length of the input list.

Unlike other solutions, there are no new lists constructed here, not even a list for a for loop or list comprehension.

File: "dup.py"

def dups(mylist):
    idx = 0 
    while(idx<len(mylist)):
        delidx = idx+1
        ndeleted = 0
        while delidx < len(mylist):
            if mylist[delidx] == mylist[idx]:
                del mylist[delidx]
                ndeleted += 1
            else:
                delidx += 1
        if ndeleted==0:
            del mylist[idx]
        else:
            idx += 1
    return mylist

Usage (iPython)

In [1]: from dup import dups

In [2]: dups([1,1,1,1,1])
Out[2]: [1]

In [3]: dups([1,1,2,1,1])
Out[3]: [1]

In [4]: dups([1,1,2,2,1])
Out[4]: [1, 2]

In [5]: dups([1,1,2,1,2])
Out[5]: [1, 2]

In [6]: dups([1,2,3,1,2])
Out[6]: [1, 2]

In [7]: dups([1,2,1,3,4,5,4])
Out[7]: [1, 4]
Paul
  • 26,170
  • 12
  • 85
  • 119