Deduplication algorithm logic

Question

Could you please help me find the error in my logic? The function doesn't work correctly and I can't figure out why. The instructions are: write a function remove_duplicates that takes in a list and removes elements of the list that are the same. Do not modify the list you take as input! Instead, return a new list. For example, remove_duplicates([1,1,2,2]) should return [1,2].

For [1, 2, 2, 5, 5, 5, 7, 7] I get [1, 2, 5, 7], which is correct. However, for [4, 9, 9, 4] the output is [9, 9, 4], which is wrong, and I can't find the problem. I started learning programming a few weeks ago, so I'm a novice. Thanks!

My code:

def remove_duplicates(l):
    nl = list(l)
    i = 0    
    while i <= len(nl)-2:
        j = i + 1
        while j <= len(nl)-1:
            if nl[i] == nl[j]:
                nl.remove(nl[j])
            else:
                j += 1
        i += 1
    return nl

score 2 · Answer 1 · answered Dec 14 '14 at 14:59

You want to use a set. A set cannot contain duplicate elements, so converting the list to a set removes them.

def remove_duplicates(l):
    return list(set(l))


l_1 = [1,1,2,2]
l_2 = remove_duplicates(l_1)

print(l_1)
print(l_2)

Outputs:

[1, 1, 2, 2]
[1, 2]

Alternatively, with your other list:

[4, 9, 9, 4]
[9, 4]

Note that the function wraps the set in list(); otherwise you would get a set back instead of a new list.
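Two properties of this approach are worth knowing, sketched below: the order of the returned list is not guaranteed (a set has no defined order), and set() only accepts hashable elements, so a list of lists raises TypeError.

```python
def remove_duplicates(l):
    # set() drops duplicates; list() turns the result back into a new list.
    return list(set(l))

# Element order in a set is arbitrary, so sort before comparing or printing.
print(sorted(remove_duplicates([4, 9, 9, 4])))  # [4, 9]

# Lists are unhashable, so they cannot be stored in a set.
try:
    remove_duplicates([[1], [1]])
except TypeError as exc:
    print("TypeError:", exc)
```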

Hackaholic · Answer 2 · 2014-12-14T15:17:11.840


In Python, you can use a set to remove duplicates:

>>> a = [1, 2, 2, 5, 5, 5, 7, 7]
>>> set(a)
set([1, 2, 5, 7])

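If you trace through your code with [4, 9, 9, 4]:

i = 0            # nl[0] is 4
j = 1, 2, 3      # the duplicate 4 is found at the last position, j = 3 ...
remove(4)        # ... but remove() deletes the first 4, so nl is now [9, 9, 4]
i = 1            # i moves on; the 9 now at nl[0] is never compared again
j = 2            # nl[1] = 9 vs nl[2] = 4: no match
[9, 9, 4]        # final result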

So the two 9s are never compared with each other, and the duplicate 9 is not removed.

If you use del nl[j] instead of nl.remove(nl[j]), your code works.
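To reproduce a trace like this yourself, the loop can be instrumented. The helper below, trace_remove_duplicates, is hypothetical: it runs the question's loop unchanged (including the buggy nl.remove call) and records each comparison and removal in a list instead of printing it.

```python
def trace_remove_duplicates(l):
    """Hypothetical helper: the question's loop, unchanged, with each step logged."""
    nl = list(l)
    steps = []
    i = 0
    while i <= len(nl) - 2:
        j = i + 1
        while j <= len(nl) - 1:
            if nl[i] == nl[j]:
                value = nl[j]
                nl.remove(value)  # removes the FIRST occurrence of value, not index j
                steps.append("i=%d j=%d: match, remove(%r) -> %r" % (i, j, value, nl))
            else:
                steps.append("i=%d j=%d: %r != %r" % (i, j, nl[i], nl[j]))
                j += 1
        i += 1
    return nl, steps

result, steps = trace_remove_duplicates([4, 9, 9, 4])
for step in steps:
    print(step)
print(result)
```

The log shows the match at j = 3 removing the 4 at index 0, after which the outer loop continues from i = 1 and the two 9s are never compared.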

edited Dec 14 '14 at 15:17

answered Dec 14 '14 at 15:01

This is not incorrect, but note that it returns a `set`, not the `list` the questioner asked for. – Andy Dec 14 '14 at 15:12

score 0 · Answer 3 · answered Dec 14 '14 at 14:58


You might look into deduplicating by converting the list to a set.

my_list = [4, 9, 9, 4]
deduped = list(set(my_list))
print(deduped)  # prints [9, 4]

answered Dec 14 '14 at 14:58

rchang

Hugh Bothwell · Accepted Answer · 2014-12-14T15:42:05.470

Your main problem is the line

nl.remove(nl[j])

You want to remove the item at index j. What this line actually does is remove the first occurrence of the value contained at index j.

Instead, try

del nl[j]

Edit:

Let's trace your example, remove_duplicates([4,9,9,4]):

nl = [4, 9, 9, 4]
i = 0

j = 1
nl[0] != nl[1]

j = 2
nl[0] != nl[2]

j = 3
nl[0] == nl[3]

At this point, you want to get rid of nl[3] by calling

nl.remove(nl[3])

but see what happens:

>>> nl = [4, 9, 9, 4]
>>> nl.remove(4)    # you expect [4, 9, 9, {deleted}]
>>> nl
[9, 9, 4]           # but get    [{deleted}, 9, 9, 4]

which causes further problems: removing the first element shifts all remaining elements one position to the left, so i and j no longer point at the items you expect.

The cause is simple:

>>> help(list.remove)
L.remove(value) -> None -- remove first occurrence of value.
                                    ^^                 ^^

You are telling it to remove a value, not a location.

If instead you do

nl = [4, 9, 9, 4]
del nl[3]          # delete a *location*, not a *value*     
                   # gives [4, 9, 9, {deleted}]

you get the result you were expecting.
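Putting it together, here is a sketch of the question's own function with only the remove line changed to del (the loop structure is the asker's). Because del removes by position, the elements after j shift left by one, so j already points at the next element and is not incremented.

```python
def remove_duplicates(l):
    nl = list(l)  # copy, so the caller's list is not modified
    i = 0
    while i <= len(nl) - 2:
        j = i + 1
        while j <= len(nl) - 1:
            if nl[i] == nl[j]:
                del nl[j]  # delete by position; j now points at the next element
            else:
                j += 1
        i += 1
    return nl

original = [4, 9, 9, 4]
print(remove_duplicates(original))                  # [4, 9]
print(original)                                     # [4, 9, 9, 4] (unchanged)
print(remove_duplicates([1, 2, 2, 5, 5, 5, 7, 7]))  # [1, 2, 5, 7]
```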

This helped, but I don't understand why. Could you elaborate a bit more on this? – Peter Dec 14 '14 at 15:18

score 0 · Answer 5 · answered Dec 14 '14 at 15:06

If you want [4, 9] as the output for [4, 9, 9, 4], note that the other answers, which simply use list(set(a)), do not preserve the original order.

To preserve the order use:

def remove_duplicates(l):
    s = set()
    o = []
    for i in l:
        if i not in s:
            s.add(i)
            o.append(i)
    return o

See that:

>>> remove_duplicates([4, 9, 9, 4])
[4, 9]
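A shorter alternative, assuming Python 3.7 or later (where dicts are guaranteed to keep insertion order), is to use the elements as dictionary keys. This is a different technique from the loop above; the function name remove_duplicates_ordered is hypothetical, and like the set-based versions it requires hashable elements.

```python
def remove_duplicates_ordered(l):
    # dict keys are unique and (Python 3.7+) keep insertion order,
    # so the first occurrence of each element is kept, in original order.
    return list(dict.fromkeys(l))

print(remove_duplicates_ordered([4, 9, 9, 4]))              # [4, 9]
print(remove_duplicates_ordered([1, 2, 2, 5, 5, 5, 7, 7]))  # [1, 2, 5, 7]
```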
