
I'm getting some confusing behaviour when I remove entries from a list inside a for loop (I'm cleaning out invalid URLs):

urls = ['http://a.com/?mail=a@b.com','mailto:a@a.com', 'mailto:a@b.com', 'mailto:a@c.com', 'mailto:a@d.com']

for s in urls:
    if '@' in s and '?' not in s:
        urls.remove(s)

print(urls)

The output is:

['http://a.com/?mail=a@b.com', 'mailto:a@b.com', 'mailto:a@d.com']

It consistently skips every other entry, so I assume my understanding of Python is wrong.

I looked into list comprehensions in Python and ended up with:

urls = [s for s in urls if not ('?' not in s and '@' in s)]

This does what I want it to.
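The double negative in that comprehension can be removed with De Morgan's law: not (A and B) is the same as (not A) or (not B). So the filter keeps any URL that contains '?' or does not contain '@'. A minimal sketch comparing both forms on the sample list (the variable names kept_original and kept_simplified are just for illustration):

```python
urls = ['http://a.com/?mail=a@b.com', 'mailto:a@a.com', 'mailto:a@b.com',
        'mailto:a@c.com', 'mailto:a@d.com']

# Filter from the question: keep s unless it has '@' and no '?'
kept_original = [s for s in urls if not ('?' not in s and '@' in s)]

# Same filter after De Morgan: not (A and B) == (not A) or (not B)
kept_simplified = [s for s in urls if '?' in s or '@' not in s]

print(kept_simplified)  # ['http://a.com/?mail=a@b.com']
```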

Is that the best way? And can someone explain the behaviour? I don't get it.

Thanks

user2390182

2 Answers


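The problem with your first solution is that you delete entries from the list while you are iterating over it. A for loop over a list keeps track of the current index; when you remove the item at index i, every later item shifts one position to the left, so the next step (index i + 1) lands on the item that was originally at i + 2, and the item that moved into slot i is never checked. That is why every other entry is skipped. The topic is discussed here, for example: How to remove items from a list while iterating?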
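To see the skipping directly, here is a small sketch that runs the question's loop unchanged but records every element the loop actually visits (the visited list is an illustrative addition):

```python
urls = ['http://a.com/?mail=a@b.com', 'mailto:a@a.com', 'mailto:a@b.com',
        'mailto:a@c.com', 'mailto:a@d.com']

visited = []  # every element the for loop actually sees
for s in urls:
    visited.append(s)
    if '@' in s and '?' not in s:
        urls.remove(s)  # shifts every later element one slot to the left

print(visited)  # 'mailto:a@b.com' and 'mailto:a@d.com' are never visited
print(urls)     # so those two are never removed
```

Only three of the five elements are visited: after each removal the loop's index moves past the element that just slid into the current slot.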

Gregor

If you need to remove items from a list while iterating over it, iterate over a copy instead. urls[:] makes a shallow copy of urls; the loop walks that copy while remove() deletes from the original, so removals never shift the items the loop is about to visit and nothing is skipped:

urls = ['http://a.com/?mail=a@b.com','mailto:a@a.com', 'mailto:a@b.com', 'mailto:a@c.com', 'mailto:a@d.com']

for s in urls[:]:
    if '@' in s and '?' not in s:
        urls.remove(s)

print(urls)
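Another common approach, which avoids making a copy, is to walk the indices from the end of the list toward the start and delete with del. A sketch, assuming the same list and condition as above:

```python
urls = ['http://a.com/?mail=a@b.com', 'mailto:a@a.com', 'mailto:a@b.com',
        'mailto:a@c.com', 'mailto:a@d.com']

# Deleting index i only shifts elements at positions greater than i,
# which this backwards loop has already visited, so nothing is skipped.
for i in range(len(urls) - 1, -1, -1):
    s = urls[i]
    if '@' in s and '?' not in s:
        del urls[i]

print(urls)  # ['http://a.com/?mail=a@b.com']
```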

That said, I would prefer your list-comprehension version; it is more concise and more Pythonic.
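One detail worth knowing: urls = [...] binds the name to a brand-new list, so any other variable still pointing at the old list keeps the unfiltered data. If that matters, slice assignment (urls[:] = [...]) replaces the contents of the existing list in place. A sketch, where the helper is_invalid and the second name alias are hypothetical and only for illustration:

```python
urls = ['http://a.com/?mail=a@b.com', 'mailto:a@a.com', 'mailto:a@b.com',
        'mailto:a@c.com', 'mailto:a@d.com']
alias = urls  # a second name bound to the same list object


def is_invalid(s):
    # Hypothetical helper: same test as the loop in the question.
    return '@' in s and '?' not in s


# Slice assignment mutates the existing list, so alias sees the result too.
urls[:] = [s for s in urls if not is_invalid(s)]

print(alias)          # ['http://a.com/?mail=a@b.com']
print(alias is urls)  # True
```

A named predicate like this also keeps the comprehension readable once the validity check grows beyond a couple of character tests.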

Austin
  • I believe this is a good solution but you should explain it. – Cole Nov 04 '18 at 16:23
  • @Cole, How about now? I was in the process of writing explanation. :) – Austin Nov 04 '18 at 16:28
  • That's a cool tip, and useful when I need to do a bit more than just remove based on characters - and still having code I can actually read (regardless of how pythonic it might be) :) – Stuart Grierson Nov 04 '18 at 16:52