1

This is what I do:

for word in doc:
    if len(word) < 3:
        doc.remove(word)

But, if I do this:

for word in doc:
    if len(word) < 3:
        print(word)

The results returned are: 'O,' 'Of' '30' '4.' 'I.' 'IF' and more.

Most two-character items are removed, but a few still remain. Am I doing something wrong?

yamen
Carlll
  • What language is this? What is 'doc'? – yamen Apr 14 '12 at 03:24
  • Sorry, this is in python. 'doc' is practically just a list of random words and numbers – Carlll Apr 14 '12 at 03:27
  • Welcome to StackOverflow. None of your tags make any sense. Please edit your question and use meaningful tags, such as the language you're using to start with (and a definition of `doc` and a sample of its content would help). Posting several words that mean absolutely nothing to anyone but you isn't helpful to you (or anyone else). Thanks. :) – Ken White Apr 14 '12 at 03:28
  • I've changed the tag to `python`. Still need more info - it seems likely that some of these words simply have spaces appended to them. – yamen Apr 14 '12 at 03:28
  • Please show us your entire list of input words and the entire list of output words that should have been removed. If the list is long, try paring it down to a shorter list that still demonstrates the problem. – octern Apr 14 '12 at 03:31

6 Answers

5

The problem is that you are removing items from the list while the for loop is still iterating over it.

For example, if you do this:

arr = list(range(1, 10))  # list() so that remove() also works on Python 3
for x in arr:
    print(x)
    arr.remove(x)  # mutates the list that is being iterated

Then you will see that not all items in arr were removed.

In your case, you can build a new list that keeps only the longer words:

newDoc = [ word for word in doc if len(word) >= 3 ]

Welcome to Python.
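
A minimal sketch of the difference between the two approaches. The contents of `sample` are made up for illustration, since the real `doc` was never posted:

```python
# Hypothetical sample data; the real contents of `doc` were not posted.
sample = ["O,", "Of", "the", "30", "4.", "words", "I.", "IF", "end"]

# Buggy approach: removing from the list while iterating over it.
buggy = list(sample)
for word in buggy:
    if len(word) < 3:
        buggy.remove(word)

# Safe approach: build a new list containing only the words to keep.
fixed = [word for word in sample if len(word) >= 3]

print(buggy)  # some short words survive
print(fixed)  # only words with three or more characters
```

With this particular sample, every short word that directly follows a removed word is skipped, which matches the symptom in the question.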

Bang Dao
  • I'm glad to see someone try to explain why the OP's method went wrong, rather than just saying that list comprehensions are cooler and better (: – octern Apr 14 '12 at 03:35
  • @octern Unfortunately, this _doesn't_ explain why, though it does show an example of it happening. – agf Apr 14 '12 at 03:58
  • I couldn't find any documentation about the for loop in Python, but from the output I guess the problem is this: in `for x in arr`, if `x` is at index `i` and you call `arr.remove(x)`, the next `x` is not the original `arr[i+1]` but the original `arr[i+2]`, which means Python skips one member of `arr`. All of the above is just my guess. – Bang Dao Apr 14 '12 at 04:07
  • I think that's right. I've seen many examples of unintuitive things going wrong when you alter a sequence at the same time as you're iterating over it. If you don't want to use list comprehensions, one solution could be to make a copy of the list and iterate over that, while performing the remove operations on the original list. This will work for the remove operation, which removes elements by their value, but would still go crazy if you were using a method that referred to values by numeric indices. – octern Apr 14 '12 at 18:47
3

In order to accurately answer your question, we need to see what the contents of doc are. Preferably in the format it is displayed in the interactive Python interpreter.

That being said, the ideal (read: Pythonic) way to remove items from a list would be to A) use filter:

list(filter(lambda x: len(x) > 2, doc))  # list() is needed on Python 3, where filter() is lazy

or B) use a list comprehension:

[word for word in doc if len(word) > 2]
Joel Cornett
2

You should invert the logic and use a list comprehension:

[ word for word in doc if len(word) >= 3 ]
yazu
1

I suggest using a list comprehension:

doc = [w.strip() for w in doc if len(w.strip()) >= 3]

The strip() call removes leading and trailing whitespace, so padding does not count toward a word's length.
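
To illustrate why stripping can matter, here is a small sketch. The `padded` list is invented for the example; whether the real `doc` contains stray whitespace is only a guess raised in the comments on the question:

```python
# Hypothetical sample data with stray whitespace around some words.
padded = ["Of ", " I.", "the", "  30", "word\n"]

# Without strip(), the padding counts toward the length,
# so "Of " (three characters) passes the length test.
without_strip = [w for w in padded if len(w) >= 3]

# With strip(), the length check sees only the visible characters.
with_strip = [w.strip() for w in padded if len(w.strip()) >= 3]

print(without_strip)
print(with_strip)
```

Here nothing is filtered out without strip(), while the stripped version keeps only "the" and "word".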

ChrisP
0

Iterate over a copy of the list instead, and remove the elements from the original.

for word in doc[:]:
    if len(word) < 3:
        doc.remove(word)

In general, it's not good practice to modify a list while you're iterating over it. You run into issues like the one you just hit here.
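
As a rough sketch of when in-place removal is worth it: it matters if other names refer to the same list object. The sample data and the names `alias`, `doc2` and `alias2` are hypothetical. Slice assignment is shown as one possible alternative that also keeps the same list object:

```python
# Hypothetical sample data; `alias` is a second name for the same list object.
doc = ["O,", "Of", "the", "30", "4.", "words"]
alias = doc

# Iterate over a shallow copy (doc[:]) while removing from the original.
for word in doc[:]:
    if len(word) < 3:
        doc.remove(word)

# Alternative: slice assignment replaces the contents in place in one step.
doc2 = ["O,", "Of", "the", "30", "4.", "words"]
alias2 = doc2
doc2[:] = [word for word in doc2 if len(word) >= 3]

print(doc, alias is doc)
print(doc2, alias2 is doc2)
```

Both variants leave the aliases pointing at the filtered list. Note that each `remove()` call searches the list from the start, so the copy-and-remove loop can get slow on long lists.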

Makoto
0

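When you remove an element, every later element shifts one position to the left, but the loop still moves on to the next index. The element that slid into the current position is therefore skipped.

To test it, run this in the interpreter:

l = list(range(5))  # list() so that remove() also works on Python 3
for i in l:
    l.remove(i)
    print(i, l)  # Python 3 print; shows the current item and what is left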

Results:

0 [1, 2, 3, 4]
2 [1, 3, 4]
4 [1, 3]
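
The skipping can be made explicit by writing the loop with a manual index. This is only a rough emulation of how the for loop walks a list, not the interpreter's actual implementation, and the names `index` and `visited` are introduced just for illustration:

```python
# Rough emulation of the for loop: keep an index, fetch the item at that
# index, then advance, re-checking the list's current length each time.
l = list(range(5))
index = 0
visited = []

while index < len(l):
    i = l[index]
    visited.append(i)
    l.remove(i)   # everything after position `index` shifts left by one
    index += 1    # so this step jumps over the item that just moved in

print(visited)
print(l)
```

The loop visits 0, 2 and 4 and leaves 1 and 3 behind, the same pattern as in the results above.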