Python: Iterate through list and remove duplicates (without using Set())

Question

So I have a list:

s = ['cat','dog','cat','mouse','dog']

And I want to be able to iterate through the list and remove the duplicates, WITHOUT using the set() function! So for example it should remove 'cat' at position s[2] BUT keep 'cat' at position s[0]. It then needs to do the same thing for 'dog', i.e. keep 'dog' at position s[1] but remove 'dog' from position s[4].

So the output is then:

s = ['cat','dog','mouse']

I have tried to use i and j as index positions in the list, and check whether the element at position i is equal to the element at position j. If so, it will remove it and increment the value of j by 1, if not then it will leave it and just increment the value of j. After the whole list has been iterated through, it will increment the value of i and then check the whole list again, for the new element. Below:

i = 0
j = 1
for a in range(len(s)):
    for b in range(len(s)):
        if s[i] == s[j]:
            s.remove(s[j])
            j = j + 1
        else:
            j = j + 1
    i = i + 1

What am I doing wrong here?

Use `in` instead of `==`. Store them in a separate array if that value doesn't exist in that array. — almost a beginner, Apr 26 '17 at 01:06
Is there any reason you need to update the list in place vs. create a new list? `l = []; for e in s: if e not in l: l.append(e)` — AChampion, Apr 26 '17 at 01:07
Another problem is that you are iterating through all the indexes up to the end of the list. If there actually are any duplicates, the list will be shorter by the time you get there, and you will get an IndexError. — zondo, Apr 26 '17 at 01:09
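The problems raised in these comments can be reproduced with a small sketch. It wraps the loop from the question in a function (the name `original_attempt` is made up for illustration) and catches the error. Two things go wrong: `list.remove(x)` deletes the first element equal to `x`, not the one at index `j`, and `j` is never reset, so it eventually runs past the end of the shrinking list.

```python
def original_attempt(s):
    # The loop from the question, unchanged apart from being wrapped in a function.
    i = 0
    j = 1
    for a in range(len(s)):
        for b in range(len(s)):
            if s[i] == s[j]:
                # remove() deletes the FIRST matching value, so the earlier copy is lost
                s.remove(s[j])
                j = j + 1
            else:
                j = j + 1
        i = i + 1


s = ['cat', 'dog', 'cat', 'mouse', 'dog']
try:
    original_attempt(s)
    outcome = 'finished'
except IndexError:
    # j keeps growing while the list shrinks, so s[j] goes out of range
    outcome = 'IndexError'

print(outcome, s)
```

With this input the loop stops with an IndexError, and the list is left as ['cat', 'mouse', 'dog'] because the first copies were the ones removed.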

score 7 · Accepted Answer · answered Apr 26 '17 at 01:14

The issue is with "automatic" for loops - you have to be careful about using them when modifying that which you are iterating through. Here's the proper solution:

def remove_dup(a):
   i = 0
   while i < len(a):
      j = i + 1
      while j < len(a):
         if a[i] == a[j]:
            del a[j]
         else:
            j += 1
      i += 1

s = ['cat','dog','cat','mouse','dog']
remove_dup(s)
print(s)

Output: ['cat', 'dog', 'mouse']

This solution is in-place, modifying the original array rather than creating a new one. It also doesn't use any extra data structures.

Apollys supports Monica

Thanks mate! Was thinking about using a while loop, missed the part about the j, after the first while, and the list.remove() just wasn't working for me! Thanks! – Liam G Apr 26 '17 at 06:03
Returns None for me. `List = ['94. / Date: 16 Feb, 2022 ', '95. / Date: 16 Feb, 2022 ', '96. / Date: 16 Feb, 2022 ', '97. / Date: 16 Feb, 2022 ', '98. / Date: 16 Feb, 2022 ', '99. / Date: 16 Feb, 2022 ', '100. / Date: 16 Feb, 2022 ', '101. / Date: 16 Feb, 2022 ', '102. / Date: 18 Feb, 2022 ', '103. / Date: 18 Feb, 2022 ', '103. / Date: 18 Feb, 2022 ', '103. / Date: 18 Feb, 2022 ', '103. / Date: 18 Feb, 2022 ', '103. / Date: 18 Feb, 2022 ', '103. / Date: 18 Feb, 2022 ']` – AnonymousUser Feb 21 '22 at 05:46
Please check that you have copied the code exactly, and are using it as I have demonstrated. Yes, the function returns `None`: note that the function is a mutator function that modifies the list in-place, as described above. Additionally, I would recommend against using a variable name like `List`, as that is dangerously close to the builtin name `list` (and `List` is itself a name exported by the typing library). – Apollys supports Monica Feb 22 '22 at 06:46
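To illustrate the point about `None`, here is a minimal sketch. It repeats `remove_dup` from the answer and adds a wrapper, `deduped_copy`, which is a hypothetical helper (not part of the answer) for callers who would rather get a new list back and leave the original alone.

```python
def remove_dup(a):
    # In-place version from the answer: mutates `a` and returns None.
    i = 0
    while i < len(a):
        j = i + 1
        while j < len(a):
            if a[i] == a[j]:
                del a[j]      # the next element slides into position j, so j stays put
            else:
                j += 1
        i += 1


def deduped_copy(a):
    # Hypothetical helper: work on a shallow copy and return it.
    b = list(a)
    remove_dup(b)
    return b


s = ['cat', 'dog', 'cat', 'mouse', 'dog']
result = remove_dup(s)
print(result)   # None: the function has no return statement
print(s)        # the list itself was changed

original = ['cat', 'dog', 'cat', 'mouse', 'dog']
copy = deduped_copy(original)
print(copy, original)
```

Printing the return value of `remove_dup` shows `None`; the deduplicated data is in the list that was passed in.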

Robbie · Answer 2 · 2017-04-26T01:11:22.477

6

You can loop through the list and check if the animal has already been added.

s = ['cat','dog','mouse','cat','horse','bird','dog','mouse']

sNew = []
for animal in s:
    if animal not in sNew:
        sNew.append(animal)

s = sNew

edited Apr 26 '17 at 01:11

answered Apr 26 '17 at 01:08

Robbie

Is there an alternative to "not in"? The purpose of my task is to understand how the sorting algorithms work, so "not in" is kind of cheating I guess. – Liam G Apr 26 '17 at 01:18
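One possible way to avoid `not in`, sketched under the assumption that only plain loops and `==` are allowed: write the membership test by hand as a linear search. The function names `contains` and `dedupe` are made up for this example.

```python
def contains(items, target):
    # Hand-written replacement for `target in items`: a linear search.
    for item in items:
        if item == target:
            return True
    return False


def dedupe(items):
    # Keep the first occurrence of each value, preserving order.
    result = []
    for item in items:
        if not contains(result, item):
            result.append(item)
    return result


s = ['cat', 'dog', 'mouse', 'cat', 'horse', 'bird', 'dog', 'mouse']
s_new = dedupe(s)
print(s_new)
```

This does the same work that `not in` does on a list, just spelled out, so the overall cost is still quadratic in the worst case.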

MSeifert · Answer 3 · 2017-04-26T01:28:20.793

4

You shouldn't alter the list while you iterate over it; you'll likely either skip elements or get an IndexError. If you just can't use set, use collections.OrderedDict:

>>> from collections import OrderedDict

>>> s = ['cat','dog','cat','mouse','dog']

>>> list(OrderedDict.fromkeys(s).keys())
['cat', 'dog', 'mouse']

edited Apr 26 '17 at 01:28

answered Apr 26 '17 at 01:10

MSeifert

Not OP but thank you for the answer. I'm trying to understand your code. So you can use `OrderedDict` because the list can be regarded as a dictionary with just keys but no values? What does 'fromkeys' do? Thanks. – Bowen Liu Sep 20 '18 at 13:33
@BowenLiu `fromkeys` creates a dictionary with the specified keys. I used an ordered dictionary because it's ordered and removes duplicate keys, we don't care about the values of the dictionary - but there's no builtin ordered set... – MSeifert Sep 24 '18 at 09:38
Thanks a lot for your explanation and introducing this new method. I didn't know that you can create dictionaries without value corresponding to each key. Thanks. – Bowen Liu Sep 24 '18 at 13:15
Note that in modern Python (CPython/PyPy 3.6, and any Python 3.7+) you can just use `dict.fromkeys` and it will run faster to boot (`dict` is insertion ordered, and `OrderedDict` isn't necessary unless you're relying on the order-changing methods or order-sensitive comparisons). There's also no need to call `.keys()` (on any version of Python); `dict`s are already iterables of their keys, so `list(dict.fromkeys(s))` is sufficient. – ShadowRanger Jul 02 '21 at 00:07
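A short sketch of the variant described in the last comment, assuming Python 3.7 or later (where `dict` preserves insertion order):

```python
s = ['cat', 'dog', 'cat', 'mouse', 'dog']

# dict.fromkeys builds a dict whose keys are the list items (values default to None);
# duplicate keys collapse, and iteration follows first-insertion order.
unique = list(dict.fromkeys(s))
print(unique)
```

Like `set`, this needs the list elements to be hashable.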

Remolten · Answer 4 · 2017-04-26T01:11:19.797

1

I am not sure why you wouldn't use a set, but here is an alternative. Iterate over your original list, placing each element into a new list if it is not already in the new list. Example:

l = []
s = ['dog', 'cat', 'cat', 'mouse', 'dog']

for i in range(len(s)):
    if s[i] not in l:
        l.append(s[i])

Now:

>>> l
['dog', 'cat', 'mouse']

edited Apr 26 '17 at 01:11

answered Apr 26 '17 at 01:09

Remolten

More canonical would be just to iterate over the list `s` vs. indices. – AChampion Apr 26 '17 at 01:11
Very true. It would likely be more Pythonic. – Remolten Apr 26 '17 at 01:12
Is there an alternative to "not in"? The purpose of my task is to understand how the sorting algorithms work, so "not in" is kind of cheating I guess. – Liam G Apr 26 '17 at 01:18

score 1 · Answer 5 · answered Apr 26 '17 at 02:23

Here's a one line solution:

s = ['dog', 'cat', 'cat', 'mouse', 'dog']

answer = [animal for idx, animal in enumerate(s) if animal not in s[:idx]]

And you'll see:

>>> answer
['dog', 'cat', 'mouse']

Hazzles

score -1 · Answer 6 · edited Aug 01 '20 at 21:13

s = ['cat','dog','cat','mouse','dog']
duplicates = []

for animal in s:
  if s.count(animal) > 1:
    if animal not in duplicates:
      duplicates.append(animal)
print(duplicates)

edited Aug 01 '20 at 21:13

Shishir Pandey

answered Aug 01 '20 at 15:28

Sujeet Toppo

This would be a better answer if you explained how the code you provided answers the question. – pppery Aug 01 '20 at 21:26
This adds nothing that the other answers didn't already cover, and is actually wrong, as it doesn't keep anything *unless* it's duplicated (it does reduce duplicates to a single copy, but eliminating the non-duplicates is wrong). For the OP's case for instance, they expect to see `'mouse'` as the final element in the result, but you exclude it because it only appears once. – ShadowRanger Jul 02 '21 at 00:11
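The difference described in this comment can be seen in a small sketch that runs the answer's logic next to a version without the `count()` filter. The variable name `unique` is introduced here for illustration.

```python
s = ['cat', 'dog', 'cat', 'mouse', 'dog']

# Logic from the answer above: collects only values that occur more than once.
duplicates = []
for animal in s:
    if s.count(animal) > 1 and animal not in duplicates:
        duplicates.append(animal)

# Without the count() filter, every distinct value is kept in first-seen order.
unique = []
for animal in s:
    if animal not in unique:
        unique.append(animal)

print(duplicates)  # 'mouse' is missing because it appears only once
print(unique)
```

For the list in the question, the first loop yields ['cat', 'dog'] while the second yields the expected ['cat', 'dog', 'mouse'].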

score -2 · Answer 7 · edited Jul 01 '21 at 23:24

Here is a solution that uses only type casting:

s = ['cat','dog','cat','mouse','dog']

l = list(set(s)) 

print(l)

edited Jul 01 '21 at 23:24

Sven Eberth

answered Jul 01 '21 at 20:54

Hemant Aryan

The OP's question specifically excluded the use of `set`; if they hadn't, this would be fine, but it would also be incredibly obvious (the restriction in the question is why no one else posted it). – ShadowRanger Jul 02 '21 at 00:08
