Python 2.7 delete items form list while looping

Question

I have a list of object f, and f has a score attribute and an inter(f) (intersect) method. I want to have a list of non intersecting f objects, and in case of an intersection, I remove the one with low score.

I tried to solve this by two for loops and create a temporary tmp list for all items except the one that I want to remove, then I put tmp in the original list (lst) that I'm already working on.

for f1 in lst: 
    for f2 in lst: 
        if f1!=f2:
            if f1.intersect(f2):
                if f1.score>=f2.score:
                    tmp=[f for f in lst if f!=f2]
                    lst=[]
                    lst.extend(tmp)
                else:
                    tmp=[f for f in lst if f!=f1]
                    lst=[]
                    lst.extend(tmp)

Problem: sometimes it works but sometimes the lst at the end is null. Why does this happen and how do I fix it? It works for me if there is another way to do it, rather than what I currently have.

By "sometimes it works" do you mean that running exactly the same code in exactly the same conditions/environment with the same input will randomly succeed/fail? Things like that are normally threading issues but that doesn't apply to Python... — Basic, Feb 26 '15 at 19:36
What defines an intersection on `f` objects and can the intersecting pair be anywhere in the two lists? (As in different indices). Probably this has been set up to be hard because the wrong data structures are being used. — Two-Bit Alchemist, Feb 26 '15 at 19:36
@Basic I am sure she means "depending on the content of the two lists at hand" — Two-Bit Alchemist, Feb 26 '15 at 19:37
@Two-BitAlchemist Quite possibly but in that case, providing an example that fails and an example that succeeds would be useful, if not essential... — Basic, Feb 26 '15 at 19:38
@basic Yes as Two bit alchemist said depends on the data in the list — Mariya, Feb 26 '15 at 19:38
@Mariya in that case, can you please provide an example of it working/failing so we can reproduce? Thanks — Basic, Feb 26 '15 at 19:41
@Two-BitAlchemist f also has start and stop so I check for overlap in the interval of the 2 f. — Mariya, Feb 26 '15 at 19:41
I'm assuming `intersect()` is like saying the two objects are equal? Like if f were an object that had an `intersect()` method and an `int` object, then intersect would compare the two integers? — Ben Schwabe, Feb 26 '15 at 19:42
What should happen if an object intersects with two or more others? — Janne Karila, Feb 26 '15 at 19:43
@JanneKarila I assume that's why there is the `for` loop over the entire list for each object. — Ben Schwabe, Feb 26 '15 at 19:45
@JanneKarila at the end I want a list of non intersecting objects — Mariya, Feb 26 '15 at 19:46
Suppose score 1 intersects with score 2 intersects with score 3, and deleting 2 would leave you non-intersecting objects 1 and 3. Or, you can first delete 1 and then 2, leaving only 3. Do you accept either result? — Janne Karila, Feb 26 '15 at 19:54
I still think this is an XY problem. Your solution to this got off several steps ago like when you set up the data in the two lists in the first place and now the only way to "solve" the problem is O(n^2). FWIW I think you should ask a new question about the problem you started with and what you have tried so far (succinctly). Like, to start, where did you get the data for how you put this all together? — Two-Bit Alchemist, Feb 26 '15 at 20:09
@Two-BitAlchemist I think data are quite complicated. I just need to Know the right way to delete from list while looping — Mariya, Feb 26 '15 at 20:17
@Mariya There is no "right way to delete from a list while looping" because as others are telling you, that's a very brittle thing to do. — Two-Bit Alchemist, Feb 26 '15 at 20:19

score 2 · Accepted Answer · edited May 23 '17 at 12:28

I'm ignoring the semantics of your intersect function. It should not matter for the purposes of your question if it is a question about loops in python. If this is a question about your intersect function's semantics in this particular use case, you have not provided enough information.

In general modifying an iterable object (like a list) while looping over it is dangerous and discouraged. For example, if we write this loop

xs = [ 1 ]
for x in xs:
    xs.append(x+1)

python will actually loop infinitely. The iterator for list objects will continue to grab the newly appended elements.

You can solve this by not modifying lst until you are done iterating over it:

to_remove = []
for f1 in lst:
    # because lst is not being modified, we have to manually skip
    # elements which we will remove later
    # the performance difference is negligible on small lists
    if f1 in to_remove:
        continue
    for f2 in lst:
        # also skip f2s which we removed
        if f2 in to_remove:
            continue
        # note that I collapsed two of your conditions here for brevity
        # this is functionally the same as what you wrote, but looks neater
        if f1 != f2 and f1.intersect(f2):
            if f1.score >= f2.score:
                to_remove.append(f2)
            else:
                to_remove.append(f1)
lst = [x for x in lst if x not in to_remove]

Note that this solution is far from perfect. There are two major issues I still have with it: using a list instead of a set for to_remove, which better express your meaning, and repeating comparisons by doing a naieve nested loop.

The next step in improving this would be to replace the to_remove with a set object, and to cut down on the excessive looping. We can do this easily enough using list slicing and the handy enumerate function.

So, part 1 is switching over to sets:

to_remove = set()
for f1 in lst:
    if f1 in to_remove:
        continue
    for f2 in lst:
        if f2 in to_remove:
            continue
        if f1 != f2 and f1.intersect(f2):
            if f1.score >= f2.score:
                to_remove.add(f2)
            else:
                to_remove.add(f1)
lst = [x for x in lst if x not in to_remove]

The second component, using enumerate, relies on knowledge of the slice notation. If you are not familiar with it, I recommend reading up on it. A good SO post on it: Explain Python's slice notation

Anyway, here we go:

to_remove = set()
# with enumerate, we walk over index, element pairs
for index,f1 in enumerate(lst):
    if f1 in to_remove:
        continue
    # parens in slicing aren't required, but add clarity
    for f2 in lst[(index+1):]:
        if f2 in to_remove:
            continue
        # no need to check for f1 == f2, since that's now impossible
        # unless elements are duplicated in your list, which I assume
        # is not the case
        if f1.intersect(f2):
            if f1.score >= f2.score:
                to_remove.add(f2)
            else:
                to_remove.add(f1)
# still probably the clearest/easiest way of trimming lst
lst = [x for x in lst if x not in to_remove]

If you don't actually need lst to be a list, you can go a step further and make it a set as well. That opens up the possibility of exploiting the built-in set difference operation, but that makes the looping somewhat harder.

to_remove = set()
# still iterate over it as a list, since we need that to be able to slice it
# if you replace it with a set at the outset, you can always listify it
# by doing `list(lst_as_set)`
for index,f1 in enumerate(lst):
    if f1 in to_remove:
        continue
    # parens in slicing aren't required, but add clarity
    for f2 in lst[(index+1):]:
        if f2 in to_remove:
            continue
        # no need to check for f1 == f2, since that's now impossible
        if f1.intersect(f2):
            if f1.score >= f2.score:
                to_remove.add(f2)
            else:
                to_remove.add(f1)

# yep, we can turn the set into a list more or less trivially
# (usually, duplicate elements make things complicated)
keep = set(lst)
# set difference can be done with the minus sign:
# https://docs.python.org/2/library/stdtypes.html#set
keep = keep - to_remove

EDIT: In my initial answer, I did not remove elements from consideration after they were added to to_remove

The problem with this answer is that you do not actually remove the elements from consideration once they've been marked for removal. Suppose f1 intersects with f2 and f2 intersects with f3 but f1 does not intersect with f3, and also suppose that f2 has a lower score than f1 and f3 has a lower score than f2. So we find the intersecting pair (f1, f2) and mark f2 for removal. But we don't actually remove it, so later on we find the intersecting pair (f2, f3), and mark f3 for removal. But that only leaves f1 in the output even though f1 and f3 do not intersect. — rici, Feb 27 '15 at 02:04
@rici You are entirely correct. I've fixed it. Thanks for catching that. — sirosen, Feb 27 '15 at 21:09

Python 2.7 delete items form list while looping

1 Answers1