I am trying to remove all elements of a list that do not match a given regular expression. I am using the following code:

import json
import re

skus = [u'12', u'344', u'56', u'PSJAI12345', u'57']
pattern = re.compile(r'([A-Z]{5})(\d{5})')
for sku in skus:
    if pattern.match(sku):
        print("skip")
    else:
        skus.remove(sku)

print(json.dumps(skus))

The output is:

["344", "PSJAI12345"]

The expected output was:

["PSJAI12345"]

It seems that items at odd indexes are being skipped during iteration ("skip" is not printed even though PSJAI12345 matches the regular expression). I can't understand why. Can someone please explain what's going on here?

falsetru
nish
    You're modifying your list within the loop. This is not recommended and, if it is really necessary, requires careful debugging. – simonzack Sep 20 '14 at 11:06

1 Answer

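Don't modify a sequence or mapping while you're iterating over it. Removing an item shifts every following item one position to the left, but the loop's internal index still advances, so the item immediately after each removed one is never visited.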
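
The skipping can be made visible by recording which items the loop actually visits. This is only a sketch of the question's loop; the `visited` list is an illustrative addition, not part of the original code:

```python
import re

skus = [u'12', u'344', u'56', u'PSJAI12345', u'57']
pattern = re.compile(r'([A-Z]{5})(\d{5})')

visited = []  # illustrative helper: every item the loop body sees
for sku in skus:
    visited.append(sku)
    if not pattern.match(sku):
        # Removing shifts the remaining items left, so the item that
        # slides into the current position is skipped on the next step.
        skus.remove(sku)

print(visited)  # the items that were examined
print(skus)     # the items that were left behind
```

Only '12', '56' and '57' are ever examined; '344' and 'PSJAI12345' slide into positions the loop has already passed, which is why "skip" is never printed.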

Here's an alternative using a list comprehension, which builds a new list instead of modifying the existing one:

import re
import json

skus = [u'12', u'344', u'56', u'PSJAI12345', u'57']
pattern = re.compile(r'([A-Z]{5})(\d{5})')
skus = [sku for sku in skus if pattern.match(sku)]  # OR skus[:] = ...
print(json.dumps(skus))

Output:

["PSJAI12345"]
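
The two spellings, `skus = ...` and `skus[:] = ...`, differ when something else holds a reference to the same list. A minimal sketch of that difference, assuming a second reference exists (the names `alias_a` and `alias_b` are hypothetical and only there for illustration):

```python
import re

pattern = re.compile(r'([A-Z]{5})(\d{5})')

# Rebinding: the name points at a brand-new list; other references
# still see the old, unfiltered one.
skus_a = [u'12', u'344', u'56', u'PSJAI12345', u'57']
alias_a = skus_a
skus_a = [sku for sku in skus_a if pattern.match(sku)]

# Slice assignment: the contents of the existing list object are
# replaced, so every reference sees the filtered result.
skus_b = [u'12', u'344', u'56', u'PSJAI12345', u'57']
alias_b = skus_b
skus_b[:] = [sku for sku in skus_b if pattern.match(sku)]

print(len(alias_a), alias_a is skus_a)
print(len(alias_b), alias_b is skus_b)
```

If the list is only used through one name, either form works; slice assignment matters when, for example, the list was passed in by a caller that expects to see the change.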

Alternatively, iterate over a copy of the original list. This is not recommended, though: it is slow because each call to remove searches for the element from the beginning of the list.

for sku in skus[:]:
    if pattern.match(sku):
        print("skip")
    else:
        skus.remove(sku)
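
If the list really has to be modified in place, one possible variation (not part of the original answer) is to walk the indexes from the end and delete by position. This is a sketch under that assumption; it avoids the value search that remove performs, although each deletion still has to shift the items after it:

```python
import re

skus = [u'12', u'344', u'56', u'PSJAI12345', u'57']
pattern = re.compile(r'([A-Z]{5})(\d{5})')

# Going backwards means a deletion only shifts items that were
# already visited, so nothing is skipped.
for i in range(len(skus) - 1, -1, -1):
    if not pattern.match(skus[i]):
        del skus[i]  # delete by index; no search for the value

print(skus)
```

For most cases the list comprehension remains the simpler choice.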
falsetru