
I am currently filtering out all non-alphanumeric characters from this list.

import re

cleanlist = []
for s in dirtylist:
    s = re.sub("[^A-Za-z0-9]", "", str(s))  # drop everything except ASCII letters and digits
    cleanlist.append(s)

What would be the most efficient way to also filter out whitespace from this list?

Benihana
  • That regex already does filter out whitespace (after all, whitespace *is* non-alphanumeric); please show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of your problem. – Tim Pietzcker Jan 12 '16 at 18:04
  • @TimPietzcker I think you're being a little too persnickety. It is fairly obvious what the asker is having problems with and what he wants to achieve. – Ben Glasser Jan 12 '16 at 18:11
  • Should the emphasis be on _also_? The pattern doesn't match one single character; it matches every character that is not in the class. That is what a negation is. –  Jan 12 '16 at 18:11
  • It's not at all obvious to me. "Whitespace" is something else entirely than "empty strings". – Tim Pietzcker Jan 12 '16 at 18:12
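To make the point from the comments concrete: a negated class like [^A-Za-z0-9] matches every character outside those ranges, including spaces, tabs and newlines. The sketch below runs the question's loop on a hypothetical sample dirtylist (the question does not show the real data). Items that held only whitespace or punctuation come out as empty strings, which may be what looks like leftover "whitespace".

```python
import re

# hypothetical sample input; the question does not show dirtylist
dirtylist = ["foo bar", "\t", "a-b_c", 42, "  "]

cleanlist = []
for s in dirtylist:
    # [^A-Za-z0-9] matches any character outside those ranges,
    # so spaces, tabs and newlines are removed as well
    cleanlist.append(re.sub("[^A-Za-z0-9]", "", str(s)))

print(cleanlist)  # ['foobar', '', 'abc', '42', '']
```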

3 Answers


This will strip whitespace from the strings and won't add empty strings to your cleanlist:

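import re

cleanlist = []
for s in dirtylist:
    # .strip() is redundant here: the pattern already removes whitespace
    s = re.sub("[^A-Za-z0-9]", "", str(s).strip())
    if s:  # skip items that end up empty
        cleanlist.append(s)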
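The same clean-then-drop-empties idea can be wrapped in a function. In this sketch the helper name clean is hypothetical, and the explicit loop is swapped for a generator plus filter(None, ...), which discards falsy values such as empty strings. The .strip() call is left out because the pattern already removes whitespace.

```python
import re

# compiled once; matches any single character that is not an ASCII letter or digit
NON_ALNUM = re.compile("[^A-Za-z0-9]")

def clean(items):
    """Hypothetical helper: strip non-alphanumerics and drop items that end up empty."""
    cleaned = (NON_ALNUM.sub("", str(s)) for s in items)
    # filter(None, ...) keeps only truthy values, so empty strings are dropped
    return list(filter(None, cleaned))

print(clean(["foo bar", "   ", "x-y", 7]))  # ['foobar', 'xy', '7']
```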
Ben Glasser
  • The regex already strips out all the whitespace. The question is unanswerable in its current form. Good idea about removing empty strings from the result list (except for the syntax and indentation errors), though. – Tim Pietzcker Jan 12 '16 at 18:10

I'd use a list comprehension with a precompiled pattern for this, but your code is already efficient.

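import re
pattern = re.compile("[^A-Za-z0-9]")  # compile once, reuse for every item
cleaned = (pattern.sub('', str(s)) for s in dirtylist)  # str() handles non-string items
cleanlist = [c for c in cleaned if c]  # drop items that end up empty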

Also, this is a duplicate: Stripping everything but alphanumeric chars from a string in Python
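Whether the precompiled pattern and comprehension are actually faster depends on the Python version and the data (the re module also caches recently compiled patterns on its own), so it is worth measuring. This is a minimal timeit sketch on a made-up test list; the function names are hypothetical and no timings are claimed here.

```python
import re
import timeit

# made-up benchmark data; substitute a sample of the real dirtylist
dirtylist = ["foo bar!", "  ", "x-y_z", 123, "hello world"] * 2000
PATTERN = re.compile("[^A-Za-z0-9]")

def loop_version():
    # the question's approach: module-level re.sub() inside a for loop
    cleanlist = []
    for s in dirtylist:
        cleanlist.append(re.sub("[^A-Za-z0-9]", "", str(s)))
    return cleanlist

def comprehension_version():
    # precompiled pattern inside a list comprehension
    return [PATTERN.sub("", str(s)) for s in dirtylist]

# both approaches must produce identical output before timing them
assert loop_version() == comprehension_version()

for fn in (loop_version, comprehension_version):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.3f} s")
```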

tglaria

The largest efficiency gain comes from using the full power of regular-expression processing: don't iterate through the list. Second, don't convert individual characters from string to string. Very simply:

import re
cleanlist = re.sub("[^A-Za-z0-9]+", "", dirtylist)  # dirtylist must be a str here; re.sub() raises TypeError for a list

Just to be sure, I tested this against a couple of list comprehension and string replacement methods; the above is the fastest by at least 20%.
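Since re.sub() only accepts a single string, applying this one-call idea to an actual list means joining it first. The sketch below assumes no item contains a newline: it joins with "\n", leaves "\n" out of the negated class so it survives as a separator, then splits the result back into a list. The helper name clean_joined is hypothetical, items containing "\n" would be split apart, and its speed has not been measured here.

```python
import re

def clean_joined(items):
    """Hypothetical helper: one re.sub() call over all items; assumes no item contains '\\n'."""
    if not items:
        return []  # "".split("\n") would return [''], not []
    joined = "\n".join(str(s) for s in items)
    # "\n" is excluded from the negated class so it survives as the item separator
    return re.sub("[^A-Za-z0-9\n]+", "", joined).split("\n")

print(clean_joined(["foo bar", "a-b", 42, "  "]))  # ['foobar', 'ab', '42', '']
```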

Prune