Finding duplicate numbers in a list and making them unique

Question

I'm trying to implement a Genetic Algorithm and need this for my crossover function. What I need to figure out is the following. If I have the list

[0, 1, 6, 7, 5, 4, 3, 2, 1]

it has duplicate elements. I want to end up with the list

[0, 8, 6, 7, 5, 4, 3, 2, 1]

The reason I end up with this list is that I look at each element from left to right. And I see at index 1, there's the number '1' which also exists in the list. Therefore I change it to an element not in the list. The smallest element in the list is 0 and largest element is len(list) -1.

In this case the order of the list is important. Therefore, I do not think converting it to a set would be appropriate. I do not want to lose the order of the elements. I am just changing elements that are repeated in the list already.

Another example would be

[0, 1, 2, 7, 3, 4, 3, 2, 1]

Would become

[0, 5, 6, 7, 8, 4, 3, 2, 1]

So what happened here was I saw the number 1 at index 1, and realized that from the range of 0 to 8, I am missing the number 5. So 1 got replaced with 5. Similarly I did that for index 2, which I replaced with 6. Lastly I did this for index 4, which was originally a 3 but got replaced with an 8 because it's a duplicate.

What I was thinking was incrementing the duplicate number, then checking if its a duplicate, and repeating until every element in the list was unique. However, I can't think of a way to do this.

Possible duplicate of [Get unique values from a list in python](https://stackoverflow.com/questions/12897374/get-unique-values-from-a-list-in-python) — Shawn Mehan, May 25 '17 at 18:57
Will the number of elements in the list always be the same as the max value? What happens if there are multiple dupes? Do you always want to replace the first or the last element? My point is that this is a very poorly specified problem and you should really put more thought into your question before asking. You might come up with an answer on your own if you take the time to spec out the requirements precisely. — Mad Physicist, May 25 '17 at 18:58
The requirements are not properly specified. Please add more detail. — juanpa.arrivillaga, May 25 '17 at 19:00
How big is that list? (i.e. should we think in terms of conserving memory/processing) — zwer, May 25 '17 at 19:10
I added a few more requirements for it and added another example. I hope it is better to understand now. The list would be at max around 100 elements. — Muhammad Rehman, May 25 '17 at 19:12

gold_cy · Accepted Answer · 2017-05-25T19:55:37.980

2

So unless I am mistaken there is no real methodology behind the number you are picking to replace, simply one you see as missing. If that is the case then this produces the desired result.

l = [0, 1, 2, 7, 3, 4, 3, 2, 1]
missing = [n for n in range(len(l)) if n not in set(l)]

for num in l:
    if l.count(num) > 1:
        ix = l.index(num)
        try:
            l[ix] = missing.pop(0)
        except IndexError:
            break

print l
>>[0, 5, 6, 7, 8, 4, 3, 2, 1]

edited May 25 '17 at 19:55

answered May 25 '17 at 19:31

gold_cy

13,648
3
23
45

1

Why are you rebuilding your `missing` list on every loop? Sounds terribly inefficient. Also, reverse the list generation, `list.pop()` is significantly faster than `list.pop(0)`. – zwer May 25 '17 at 19:51
yea whoops, `missing` doesn't belong in the loop, good catch – gold_cy May 25 '17 at 19:54

pylang · Answer 2 · 2017-05-29T23:12:17.503

UPDATE: This is possible with more_itertools.locate, which finds indices given a condition.

import itertools as it
import collections as ct

from more_itertools import locate 


def replace_repeated(lst):
    """Return a list of unique values with a range ~ list size."""

    def excluded_indices():
        """Yield indices of all but the index of the last repeated element."""
        for i, n in repeated.items():
            yield from it.islice(locate(lst, pred=lambda x: x == i), 0, n-1)

    repeated = {k:v for k, v in ct.Counter(lst).items() if v > 1}
    missing = (i for i, _ in enumerate(lst) if i not in set(lst))
    for i in excluded_indices():
        lst[i] = next(missing)
    return lst

Steps

Find the value (k) and frequency (v) of all repeated elements in lst.
Find all missing numbers within a range proportional to the size of lst.
Find excluded_indices - indices of repeated elements (except the final occurrence of each repeated value).
For each excluded index, overwrite the element in lst with the next missing number.

Tests

import nose.tools as nt

def test_repeated(f):
    """Verify repeated terms are replaced with values."""
    nt.eq_(f([0, 1, 6, 7, 5, 4, 3, 2, 1]),              # given
             [0, 8, 6, 7, 5, 4, 3, 2, 1])
    nt.eq_(f([0, 1, 2, 7, 3, 4, 3, 2, 1]),              # given
             [0, 5, 6, 7, 8, 4, 3, 2, 1])
    nt.eq_(f([0, 1, 6, 7, 1, 4, 1, 2, 1]),              # multiples (not only duplicates)
             [0, 3, 6, 7, 5, 4, 8, 2, 1])
    nt.eq_(f([0, 1, 6, 2, 5, 3, 3, 2, 1]),              # multiples of any element
             [0, 4, 6, 7, 5, 8, 3, 2, 1])

test_repeated(f=replace_repeated)

Optionally, here is the code for locate, which can be implemented directly and obviate pip installation if desired:

def locate(iterable, pred=bool):
    return it.compress(it.count(), map(pred, iterable))

Very interesting approach... Quite inefficient and requires an additional module to be installed, tho... — zwer, May 25 '17 at 19:54
@zwer Thanks for your comment. I added the implementation for those not interested in `pip install more_itertools`, so it is not dependent on any installation. As for efficiency, I believe your example is no faster, although an interesting variation. Therefore, I'm not sure what you mean. — pylang, May 25 '17 at 20:35
Quite inefficient == will run in a non-optimal fashion. Besides, it doesn't produce the desired output - the first output is about the only one that matches the OP's requirement. — zwer, May 25 '17 at 20:38
Still a bit vague, but I improved the clear points you spotted. Thanks. — pylang, May 26 '17 at 00:14

zwer · Answer 3 · 2017-05-25T20:21:06.530

You need to do this in two stages (or let something else do it for you in two stages) - first find what values are missing, and then go through your list replacing the duplicates with the missing values. Something like:

def normalize_list(data):
    # data = list(data)  # uncomment if you don't want to modify the source list
    # use xrange() instead of range() on Python 2.x
    missing_keys = [i for i in range(len(data)-1, -1, -1) if i not in set(data)]
    for index, value in enumerate(data):
        try:
            if data.index(value, index+1):
                data[index] = missing_keys.pop()
        except ValueError:
            pass
    return data

print(normalize_list([0, 1, 6, 7, 5, 4, 3, 2, 1]))
# prints: [0, 8, 6, 7, 5, 4, 3, 2, 1]
print(normalize_list([0, 1, 2, 7, 3, 4, 3, 2, 1]))
# prints: [0, 5, 6, 7, 8, 4, 3, 2, 1]

This should work on a list of any size.

UPDATE

Considering the slowness of list.index(), here's a version without it - it looks counter-intuitive (one more loop) but it's nearly 4 times faster (and the longer the list is, the faster it is relative to the first example) with a slightly more memory use:

def normalize_list(data):
    # data = list(data)  # uncomment if you don't want to modify the source list
    key_lookup = dict.fromkeys(data, -1)
    # use xrange() instead of range() on Python 2.x
    missing_keys = [i for i in range(len(data)-1, -1, -1) if i not in key_lookup]
    for i in data:
        key_lookup[i] += 1
    for index, value in enumerate(data):
        try:
            if key_lookup[value]:
                data[index] = missing_keys.pop()
                key_lookup[value] -= 1
        except ValueError:
            pass
    return data

hilberts_drinking_problem · Answer 4 · 2017-05-25T20:37:12.780

Here is another approach:

Compute a dictionary storing the last index of occurrence for each unique value in the list.
Create a generator of missing values.
Use a list comprehension to replace all but the last value for each unique element of your list.

e.g.

xs = [0, 1, 6, 7, 5, 4, 3, 2, 1]
fixed  = {x: i for i, x in enumerate(xs)}
spares = iter(x for x in range(len(xs)) if x not in fixed)
res    = [x if i == fixed[x] else next(spares)
            for i, x in enumerate(xs)]

score 0 · Answer 5 · answered May 25 '17 at 21:31

def returnModifiedList(x):

 #find indices where there is a repeat
 storage=[]; indices = []
 for i in range(len(x)-1,-1,-1):
     if x[i] in storage:
         indices.append(i)
     else:
         storage.append(x[i])

 #find values that are missing
 s = sorted(list(set(x)))
 missing = list(reversed([i for i in range(len(x)) if (i not in s)]))

 #fill in repeats with missing values
 for i in range(len(indices)):
     x[indices[i]] = missing[i]

 return x



x = [0, 1, 2, 7, 3, 4, 3, 2, 1]
results = returnModifiedList(x)
print results

Finding duplicate numbers in a list and making them unique

5 Answers5