Remove duplicates from array and maintain order - O(n) time and return same reference

Question

['a', 'c','a','b','b'] -> ['a', 'c', 'b']

Searching online didn't help. I'm not sure whether this is even possible.

Preferably write solution in Python, but anything is fine.

=====

Edit: I originally said O(1) space, but I don't think it's possible.

Changing the requirements:

No space requirements, but you must modify the input in-place and return it.

You're going to have to show us what you've tried. Stackoverflow isn't just a function that you put your problem into and it spits out an answer. — lintmouse, Oct 08 '15 at 23:52
If the array/data is sorted I think it can be done. (Like the reduce step in a map reduce word count http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/) — viper, Oct 09 '15 at 00:02
@viper yeah a reduce or accumulate call would work with sorted. — Aidan Gomez, Oct 09 '15 at 00:03
Possible duplicate of [How do you remove duplicates from a list in Python whilst preserving order?](http://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order) — Aidan Gomez, Oct 09 '15 at 00:03
@AidanGomez: Links to related material are welcome, but this is clearly not a dup. — Karoly Horvath, Oct 09 '15 at 00:04
what if you newly allocate it and then overwrite the memory of the original array(ala `a[:]=new_list`? or do you mean accomplish it __only__ by removal of elements from the array (ie del/pop) — Joran Beasley, Oct 09 '15 at 00:09
@JoranBeasley. Regarding your two questions: 1) I wasn't clear enough, I mean you have to modify the input in-place and return it. Updated the question. 2) `del` from a list is an O(n) operation. I'm guessing you want to call it while linearly traversing the input, making it an O(n^2) algorithm. — onepiece, Oct 09 '15 at 00:17

score 3 · Accepted Answer · answered Oct 09 '15 at 00:25

3

How about this? I think the time complexity and the space complexity are both O(n).

def unique(arr):
    index = 0
    s = set()
    for i in range(len(arr)):
        if arr[i] not in s:
            arr[index] = arr[i]
            index += 1
            s.add(arr[i])
    arr[index:] = []

answered Oct 09 '15 at 00:25

Paul Boddington

37,127
10
65
116

Looks like this satisfies both the I/O and requirements. Pretty simple as well. – onepiece Oct 09 '15 at 00:37

zwol · Answer 2 · 2015-10-09T00:42:50.903

I don't think this is possible in O(n) time and O(1) space. To get O(n) time you have to have a scratchpad (with O(1) insert and lookup, e.g. a set) recording all the elements you've already seen; that requires O(m) space (where m is the number of unique elements). Conversely, to get O(1) space you have to rescan earlier elements in the array for each element, which is O(n²) time.

If you'll accept an O(m) scratchpad, this fulfills the "modify the array in place" requirement and runs in O(n) time.

def remove_duplicates(arr):
    """Remove all duplicated elements from ARR in O(m) space and O(n) time.
       ARR is modified in place and is not returned (like `sort`)."""
    n = len(arr)
    if n <= 1: return
    seen = set((arr[0],))
    i = 1
    j = 1
    while i < n:
        if arr[i] not in seen:
           seen.add(arr[i])
           arr[j] = arr[i]
           j += 1
        i += 1
    del arr[j:i]

This would still be O(n²) time if I deleted elements from the array as I iterated over it, because del arr[i] is itself an O(n) operation: it has to slide everything after i down one. del arr[j:i] where i is past the end of the array should be O(1), but if it really matters, you'd have to check the implementation.

And if you must have O(1) space, here's the O(n²) time algorithm:

def remove_duplicates_quadratic(arr):
    """Remove all duplicated elements from ARR in O(1) space and O(n²) time.
       ARR is modified in place and is not returned (like `sort`)."""
    n = len(arr)
    i = 0
    while i < n:
       for j in range(i):
           if arr[j] == arr[i]:
               del arr[i]
               n -= 1
               break
       else:
           i += 1

I'm doing manual array indexing in these because deleting elements from an array you're iterating over using any of the usual for..in constructs is unsafe. You could use a for..in loop in the first one, but you'd lose parallel structure so it would be less readable.

In case you haven't seen it before, an else clause on a loop is executed only if the loop is not exited via break.

I had to look at this for quite a while to convince my self it works as intended .... that said +1 since it meets the O(1) space requirement ... — Joran Beasley, Oct 09 '15 at 00:04
I think so far this is actually the only solution that meets OP's requirements .. — Joran Beasley, Oct 09 '15 at 00:07
Looks good for O(1) space. Sorry that I initially had allegedly impossible requirements of O(1) space and O(n) time. I chose [Paul's answer](http://stackoverflow.com/a/33028026/1661745) because his last line `arr[index:] = []` looks more succint and I think it's O(1) time, versus O(i - j) for your `del arr[j:i]`. Anyways +1. — onepiece, Oct 09 '15 at 00:50
`del arr[j:i]` *ought* to be O(1) in this case, but I haven't actually read through the code I linked to, to find out whether that's true. — zwol, Oct 09 '15 at 00:53

Joran Beasley · Answer 3 · 2015-10-08T23:59:47.287

0

from collections import OrderedDict

OrderedDict.fromkeys(my_array).keys()

edited Oct 08 '15 at 23:59

answered Oct 08 '15 at 23:54

Joran Beasley

110,522
12
160
179

2

I think it does maintain order but O(n) space. – onepiece Oct 08 '15 at 23:56
2

@JoranBeasley: O(1) space is doable with an in-place algorithm. Kills the time requirement though... – Karoly Horvath Oct 08 '15 at 23:58

LetzerWille · Answer 4 · 2015-10-09T00:02:51.937

0

pip install oset

import oset

ll = oset.oset(['a', 'c','a','b','b'])

print(ll)

OrderedSet(['a', 'c', 'b'])`

edited Oct 09 '15 at 00:02

answered Oct 08 '15 at 23:57

LetzerWille

5,355
4
23
26

`oset` is not in the standard library, please give a link to PyPI or wherever. – zwol Oct 09 '15 at 00:02

Darcinon · Answer 5 · 2015-10-09T00:30:48.277

0

To keep the speed efficient, you need to use O(n) space to record records that already exist. If there are a large amount of duplicates it won't take much space. However, the del operation on a standard array is O(n) because it has to shift all the following elements.

def uniquify(data):
    used = set()
    i = 0
    while i < len(data):
        if data[i] in used:
            del data[i] # or data.pop(i)
        else:
            used.add(data[i])
            i += 1
    return data # not strictly necessary because array is modified in-place

edited Oct 09 '15 at 00:30

answered Oct 09 '15 at 00:13

Darcinon

159
6

1

`del` is O(n) operation, so your algorithm is O(n^2) total. – onepiece Oct 09 '15 at 00:14
Too bad Python doesn't have a built in linked list (that I can find), but the function relies on the input anyway so that is already outside of its control. – Darcinon Oct 09 '15 at 00:26
Never mind, a linked list would mess up the indexing speed. – Darcinon Oct 09 '15 at 00:29

score 0 · Answer 6 · answered Oct 09 '15 at 01:37

O(n) time, O(n) space

uniqueList = []
index = 0
while index < len(myList):
    item = myList[index]
    if item not in uniqueList: uniqueList.append(item)
    else: myList.pop(index)
    index += 1

Note: Do not modify a list you are iterating through like below:

for index, item in enumerate(myListCopy):
    if item not in uniqueList: uniqueList.append(item)
    else: myList.pop(index)

O(n^2) time, O(1) space It certainly is bounded by n^2, but in reality it's less.

index = 0
while index < len(myList):
    subIndex = index + 1
    while subIndex < len(myList):
        if myList[subIndex] == myList[index]: myList.pop(subIndex)
        subIndex += 1
    index += 1

O(n) time, O(1) space If your list is sorted and you don't mind removing the first duplicates.

for item in myList:
    if myList.count(item) > 1: myList.remove(item)

score 0 · Answer 7 · answered Oct 09 '15 at 08:08

0

lst1= ['a', 'c','a','b','b'] #-> ['a', 'c', 'b']
lst2 = []

for l in lst1:
   if l not in lst2:
       lst2.append(l)
 print lst2

answered Oct 09 '15 at 08:08

SuperNova

25,512
7
93
64

Remove duplicates from array and maintain order - O(n) time and return same reference

7 Answers7