Loop problem while iterating through a list and removing recurring elements

Question

I want to iterate through a list, and remove the items that count more than once, so they don't get printed repeatedly by the for loop.

However, some items appearing only one time in the list seem to get affected too by this, and I can't figure out why.

Any input would be greatly appreciated.

Example Output:

listy = [2,2,1,3,4,2,1,2,3,4,5]
for i in listy:
  if listy.count(i)>1:
    print i, listy.count(i)
    while i in listy: listy.remove(i)
  else:
    print i, listy.count(i)

Outputs:

 2 4
 3 2
 1 2

thus ignoring completely 4 and 5.

Do you want only to print, or to get the list slimmed down also ? — eyquem, Aug 20 '11 at 11:46
Do you want the prints to be done in the same order than the elements have in **listy** ? — eyquem, Aug 20 '11 at 13:09

score 5 · Accepted Answer · answered Aug 20 '11 at 11:35

5

You should not modify a list while iterating over it. This one should work:

listy = [2,2,1,3,4,2,1,2,3,4,5]
found = set()
for i in listy:
    if not i in found:
        print i, listy.count(i)
        found.add(i)

The result is:

answered Aug 20 '11 at 11:35

Achim

15,415
15
80
144

your solution is much nicer than mine. :) I didn't think of the set and that you can test on the fly if the element was already in there... – Arne Aug 20 '11 at 11:36
Can I use a list in place of the set? Why exactly is a set preferred? – Louis93 Aug 20 '11 at 12:05
1

Yes, you can. But sets are faster. It's not important for such small lists, but if your data grows. – Achim Aug 20 '11 at 12:13
1

Your solution is: "maintain a set of numbers seen and print if not in the set." You may as well just build the set from the list and print the set. `for x in set(listy): print x` – hughdbrown Aug 20 '11 at 13:23
I'd just like to reiterate what hughdbrown said. The set() solution is a good one, but it should basically be a one-liner. – Vorticity Aug 20 '11 at 21:22
Thanks! And thanks to everyone else here as well. I learned a lot from all the answers, they were all really well written. – Louis93 Aug 21 '11 at 13:34

score 3 · Answer 2 · 2011-08-20T12:02:29.710

The reason for your problems is that you modify the list while you are iterating over it.

If you don't care about the order in which items appear in the output and don't care about the count, you can simply use use a set:

>>> listy = [2,2,1,3,4,2,1,2,3,4,5]
>>> print set(listy)
set([1, 2, 3, 4, 5])

If you do care about the count, use the Counter class from the collections module in the Standard Library:

>>> import collections
>>> collections.Counter(listy)
Counter({2: 4, 1: 2, 3: 2, 4: 2, 5: 1})
>>> c = collections.Counter(listy)
>>> for item in c.iteritems():
...     print "%i has a count of %i" % item
... 
1 has a count of 2
2 has a count of 4
3 has a count of 2
4 has a count of 2
5 has a count of 1

If you do care about both the order and the count, you have to build a second list:

>>> checked = []
>>> counts = []
>>> for item in listy: 
>>>     if item not in checked: 
>>>         checked.append(item) 
>>>         counts.append(listy.count(item))
>>> print zip(checked, counts)
... [(2, 4), (1, 2), (3, 2), (4, 2), (5, 1)]

This is the least efficient solution, of course.

If you don't want to keep the counts for later, you don't need the counts list:

listy = [2,2,1,3,4,2,1,2,3,4,5]
checked = set()
for item in listy: 
    # "continue early" looks better when there is lots of code for
    # handling the other case
    if item in checked:     
        continue

    checked.add(item) 
    print item, listy.count(item)

agf · Answer 3 · 2011-08-20T12:01:19.427

Don't modify a list while iterating over it, it will mess you up every time:

listy = [2,2,1,3,4,2,1,2,3,4,5]
#        *     *     * Get hit
for i in listy:
    print i
    if listy.count(i) > 1:
        print i, listy.count(i), 'item and occurences'
        while i in listy: listy.remove(i)
    else:
        print i, listy.count(i)

First, you remove four 2s. Two are right at the beginning, so that puts you at the first 1.
Then you advance one when you get the next i from listy, putting you at the first 3.
Then you remove two 3s. The first is right there, so that puts you at the first 4.
Then you advance one again. The 2 is gone already, so this puts you at the second 1.
You then delete both 1s; this moves you forward two spaces. The 2 and 3 are gone, so this puts you at the 5.
You advance one, this moves you off the end of the list so the loop is over.

If what you want is to print each item only once, you can use the simple set method, or you could use the itertools unique_everseen recipe:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Which extends the basic set version to allow you to specify a special way to compare items.

If you want to know which items are only in the list once:

listy2 = filter(lambda i: listy.count(i) == 1, listy)

listy2 now has all the single occurrences.

If you don't like the lambda, just do:

def getsingles(listy):
    def singles(i):
        return listy.count(i) == 1
    return singles

then:

listy2 = filter(getsingles(listy), listy)

This makes a special function that will tell you which items are in listy only once.

-1 From his example and the text it is quite clear that this is not what OP wants. He wants to loop the list and print each element exactly once. — Howard, Aug 20 '11 at 11:34
I'm sorry, I haven't gotten to learning lambda functions yet. Do you know how to do it without using lambda? I ran it and I'm not sure it's doing what I want it to do. Howard's got the idea, no need to -1, I'm sure he misunderstood the question — Louis93, Aug 20 '11 at 11:35
@agf I understand, but please change the "You want ..." part and I am happy to undo the -1. — Howard, Aug 20 '11 at 11:41
@Howard @Louis93 See my edit. It has some more options for `listy` and the `filter` method. — agf, Aug 20 '11 at 11:44
Ahh, I misunderstood. I thought he just wanted the single elements, and the rest was a side-effect of finding them. Updated my answer. — agf, Aug 20 '11 at 12:02
agf still digesting your answer, does this mean that the list removes elements from right to left? Since we always wind up at the first occurence of the element that we've removed? — Louis93, Aug 20 '11 at 12:11
No, it goes left to right. You start at index 0. then you remove indexes 0, 1, 5, and 7 -- all the `2`s. So the new index 0 is the original index 2. Then you move to index 1, the original index 3. Then you remove indexes 1 and 4, the original indexes 3 and 8 -- all the `3`s. Then you move forward to index 2, the original index 7. Then you remove indexes 1 and 2, the original indexes 2 and 6 -- all the `1`s. At this point, the list is `[4, 4, 5]`, and you are at index 2, which is `5`. So when you move forward the last time, you're past the end of the list, and iteration stops. — agf, Aug 20 '11 at 12:30

eyquem · Answer 4 · 2011-08-20T12:34:53.240

The reason of the behavior you get is here, in the note:

http://docs.python.org/reference/compound_stmts.html#index-811

Update 1

agf's solution isn't a good one for performance reason: the list is filtered according to the count of each element. The counting is done for each element, that is to say the counting process that consists to run through the entire list to count, is done as many times as there are elements in list: it's overconsuming time, imagine if your list is 1000 length

A better solution I think is to use an instance of Counter:

import random
from collections import Counter

li = [ random.randint(0,20) for i in xrange(30)]

c = Counter(li)

print c
print type(c)

res = [ k for k in c if c[k]==1]
print res

result

Counter({8: 5, 0: 3, 4: 3, 9: 3, 2: 2, 5: 2, 11: 2, 3: 1, 6: 1, 10: 1, 12: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1})
<class 'collections.Counter'>
[3, 6, 10, 12, 15, 16, 17, 18, 19, 20]

Another solution would be to add the read elements in a set in order that the program avoids to make a count for an already seen element.

Update 2

errrr.... my solution is stupid, you don't want to select the element appearing only one time in the list....

Then the following code is the right one , I think:

import random
from collections import Counter

listy = [ random.randint(0,20) for i in xrange(30)]
print 'listy==',listy
print

c = Counter(listy)
print c
print type(c)
print

slimmed_listy = []
for el in listy:
    if el in c:
        slimmed_listy.append(el)
        print 'element',el,'  count ==',c[el]
        del c[el] 
print

print 'slimmed_listy==',slimmed_listy

result

listy== [13, 10, 1, 1, 13, 11, 18, 15, 3, 15, 12, 11, 15, 18, 11, 10, 14, 10, 20, 3, 18, 9, 11, 2, 19, 15, 5, 14, 1, 1]

Counter({1: 4, 11: 4, 15: 4, 10: 3, 18: 3, 3: 2, 13: 2, 14: 2, 2: 1, 5: 1, 9: 1, 12: 1, 19: 1, 20: 1})
<class 'collections.Counter'>

element 13   count == 2
element 10   count == 3
element 1   count == 4
element 11   count == 4
element 18   count == 3
element 15   count == 4
element 3   count == 2
element 12   count == 1
element 14   count == 2
element 20   count == 1
element 9   count == 1
element 2   count == 1
element 19   count == 1
element 5   count == 1

slimmed_listy== [13, 10, 1, 11, 18, 15, 3, 12, 14, 20, 9, 2, 19, 5]

In case you wouldn't want the result in the order of listy, the code would be even simpler

Update 3

If you want only to print, then I propose:

import random
from collections import Counter

listy = [ random.randint(0,20) for i in xrange(30)]
print 'listy==',listy
print


def gener(li):
    c = Counter(li)
    for el in li:
        if el in c:
            yield el,c[el]
            del c[el] 


print '\n'.join('element %4s   count %4s' % x for x in gener(listy))

result

listy== [16, 2, 4, 9, 15, 19, 1, 1, 3, 5, 12, 15, 12, 3, 17, 13, 8, 11, 4, 6, 15, 1, 0, 1, 3, 3, 6, 5, 0, 8]

element   16   count    1
element    2   count    1
element    4   count    2
element    9   count    1
element   15   count    3
element   19   count    1
element    1   count    4
element    3   count    4
element    5   count    2
element   12   count    2
element   17   count    1
element   13   count    1
element    8   count    2
element   11   count    1
element    6   count    2
element    0   count    2

@agf To what set method do you allude ? I don't see a method using Counter() being the same as another method that doesn't use it. I've read somewhere that Counter()'s instance is an optimized tool: running it only one time is better execution than counting the occurences of an element one element after another (if it is the method with set to which you allude) — eyquem, Aug 20 '11 at 13:34
@agf Thank you. So it's in the Achim's answer. I asked because there is also the use of a set in your unique_everseen recipe, in hop's answer, and in hughdbrown's answer; and the same method with a list instead of a set in other answers too. But all these methods are obliged to count separately from the usage of set, while in my code it's the same Counter's instance that counts and has its elements progressively deleted one after the other: there is no need of two different objects. That's why it seems that my algorithm (that I don't judge the better) isn't quite the same as the set based alg. — eyquem, Aug 20 '11 at 17:38
@agf I have kind of a doubt: you pretend that you originally proposed a code only printing, that is to say precisely what Howard reproached that you didn't ? See his first comment after your answer : _"-1 From his example and the text it is quite clear that this is not what OP wants. He wants to loop the list and print each element exactly once."_ And why disappeared your comment between this first Howard's comment and his second one : _"@agf I understand, but please change the "You want ..." part and I am happy to undo the -1."_ ? I recall to have read a comment from you between them, though — eyquem, Aug 20 '11 at 21:16
@agf English isn't my mother language, and sometimes I do confusion of meanings. "to pretend" is a confusioning word for a French. "prétendre" , in french, means "you say that, but it is uneasy for me to believe it", without being affirmative. I didn't employ "to pretend" in the sense of "feign, make believe", because I'm not sure, but in the sense "claim". However I have a strange feeling about your words. — eyquem, Aug 20 '11 at 22:32
@agf The fact that you often delete your comments, as you recognize, doesn't help to clear up uncertain comprehension, for a reason that appear dubious for me. Do you mean that my comments and yours are of "discussion" category ? By the way, when a comment disappears from a thread, it also disappears from the history. — eyquem, Aug 20 '11 at 22:33
@agf I agree with the fact that Counter is uneeded if one doesn't need counts. But why do you think it applies to the question ? He wrote: _"I want (...) remove the items that count more than once, so they don't get printed" **repeatedly** "by the for loop."_ What would bother him is to have several same repeated printings concerning the same element. That doesn't implies that he doesn't want one printing for each element. His code and the ouput he gave are clear on this point. Or will you claim that the post must be kneaded to extract a different meaning from it than the first obvious one ? — eyquem, Aug 20 '11 at 22:33
@agf You also claim that I mentioned that your original thought was that he didn't _"need the counts, just the entries once each"_ but it's another distorsion of your own. I just took what YOU claimed, that is to say that your _"reading of the original post was that he just wanted to PRINT each one once"_ . To PRINT. The discussion was about print vs print+modification of the list , not about entries vs entries+counts, and , on the contrary, I expressed that I am far to be convinced by this affirmation. I find that you twist things a lot. — eyquem, Aug 20 '11 at 22:34
I'm going to delete my comments from your post now, as nothing useful (as relating to the original question or your answer) has come from this discussion. Comments are meant to relate to the topic of the question, and our comments don't, they are just directed at each other. — agf, Aug 20 '11 at 22:35

hughdbrown · Answer 5 · 2011-08-20T13:12:08.360

Modifying a list while you iterate over it is a bad idea in every language I have encountered. My suggestion: don't do that. Here are some better ideas.

Use a set to find single occurrences

source = [2,2,1,3,4,2,1,2,3,4,5]
for s in set(source):
    print s

And you get this:

>>> source = [2,2,1,3,4,2,1,2,3,4,5]
>>> for s in set(source):
...     print s
... 
1
2
3
4
5

If you want the counts, use defaultdict

from collections import defaultdict
d = defaultdict(int)
source = [2,2,1,3,4,2,1,2,3,4,5]
for s in source:
    d[s] += 1

for k, v in d.iteritems():
    print k, v

You'll get this:

>>> for k, v in d.iteritems():
...     print k, v
... 
1 2
2 4
3 2
4 2
5 1

If you want your results sorted, use sort and operator

import operator
for k, v in sorted(d.iteritems(), key=operator.itemgetter(1)):
    print k, v

You'll get this:

>>> import operator
>>> for k, v in sorted(d.iteritems(), key=operator.itemgetter(1)):
...     print k, v
... 
5 1
1 2
3 2
4 2
2 4

score 0 · Answer 6 · answered Aug 20 '11 at 11:35

0

I am not sure if it is a good idea to iterate the list and remove elements at the same time. If you really just want to output all items and their number of occurrences, I would do it like this:

listy = [2,2,1,3,4,2,1,2,3,4,5]
listx = []
listc = []
for i in listy:
    if not i in listx:
        listx += [i]
        listc += [listy.count(i)]
for x, c in zip(listx, listc):
    print x, c

answered Aug 20 '11 at 11:35

Arne

2,624
3
24
45

If you're going to do it this way, you're better off using a `set`. – agf Aug 20 '11 at 11:43
I know. That's why i +1'ed the other answer. – Arne Aug 20 '11 at 11:47
Agree with @hop, just use `append` if that's what you're really doing. – agf Aug 20 '11 at 12:35

Pit · Answer 7 · 2011-08-20T11:52:37.547

0

Like agf said, modifying a list while you iterate it will cause problems. You could solve your code by using while and pop:

single_occurrences = []
while listy:
    i = listy.pop(0)
    count = listy.count(i)+1
    if count > 1:
        print i, count
        while i in listy: listy.remove(i)
    else:
        print i, count
    single_occurrences.append(i)

Output:

edited Aug 20 '11 at 11:52

answered Aug 20 '11 at 11:38

Pit

3,606
1
25
34

This doesn't leave you with a list of the single occurrences when you're done, though. I don't know if that matters. – agf Aug 20 '11 at 11:42
No, this completely clears the list, like Louis93's original code did. I also have no idea if he wants the single occurrences to be saved or not. – Pit Aug 20 '11 at 11:49
? His original code only removes items from the list of they have a count > 1. – agf Aug 20 '11 at 11:50
Oh, you are right! My bad, correcting my code right now! – Pit Aug 20 '11 at 11:51

score 0 · Answer 8 · answered Aug 20 '11 at 14:34

One way to do that would be to create a result list and test whether the tested value is in it :

res=[]
listy = [2,2,1,3,4,2,1,2,3,4,5]

for i in listy:
    if listy.count(i)>1 and i not in res:
        res.append(i)

for i in res:
    print i, listy.count(i)

Result :

Loop problem while iterating through a list and removing recurring elements

8 Answers8

Update 1

Update 2

Update 3

Linked

Related