I just read a brilliant reply from Sloth on the question "Remove lines that contain certain string" while searching for a way to filter garbage lines out of a txt/csv file. The gist is: take a list of words or strings, then go through the input file line by line, writing only the lines that contain none of them.

The code he posted was:

bad_words = ['bad', 'naughty']

with open('oldfile.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)

My question is: Would someone explain the line if not any(bad_word in line for bad_word in bad_words): ?

I tried just putting in if not any(bad_word in line): but it gave me an error.

I am trying to understand why. A cursory search of the Python docs didn't help me (I'm new to Python and programming, and might not be too bright to boot :-) ).

Any references for me to read are appreciated.

Thanks!

Arkham Angel
  • Possible duplicate of [How do Python's any and all functions work?](http://stackoverflow.com/questions/19389490/how-do-pythons-any-and-all-functions-work) – Celeo Sep 06 '16 at 19:46
  • Thanks, I'm checking that page. You must be a quick reader, you replied almost immediately after I posted. – Arkham Angel Sep 06 '16 at 19:47

1 Answer


Would someone explain the line if not any(bad_word in line for bad_word in bad_words)

Sure.

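bad_word in line for bad_word in bad_words is what's called a generator expression. It is very similar to a list comprehension, but more memory efficient: instead of building a whole list first, it produces one True/False value at a time, and any stops asking for values as soon as it sees a True.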
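To make the difference concrete, here is a minimal sketch comparing the two forms. The sample line is made up for illustration; only bad_words comes from the question.

```python
bad_words = ['bad', 'naughty']
line = 'this is a naughty line\n'  # hypothetical sample line, not from the question

# List comprehension: builds the full list of booleans up front.
as_list = [bad_word in line for bad_word in bad_words]

# Generator expression: yields the same booleans one at a time, on demand.
as_gen = (bad_word in line for bad_word in bad_words)

print(as_list)                 # [False, True]
print(type(as_gen).__name__)   # generator

# any() pulls values from the generator until it finds a True one.
found = any(as_gen)
print(found)                   # True
```

When a generator expression is the only argument to a function call, as in any(...), the extra pair of parentheses can be left out, which is why the original line has just one set.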

if not any(bad_word in line for bad_word in bad_words):
    newfile.write(line)

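is basically equivalent to the following (the only real difference is that this version checks every bad word before calling any, whereas the generator version can stop at the first match):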

list1 = []
for bad_word in bad_words:
    if bad_word in line:
        list1.append(True)
    else:
        list1.append(False)

if not any(list1):
    newfile.write(line)
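If the early stop matters, a closer hand-written equivalent might look like the sketch below. The helper name contains_bad_word and the sample lines are invented for this example.

```python
def contains_bad_word(line, bad_words):
    """Return True as soon as any bad word is found in line (hypothetical helper)."""
    for bad_word in bad_words:
        if bad_word in line:
            return True   # stop at the first match, as any() does
    return False          # no bad word matched


bad_words = ['bad', 'naughty']
sample_lines = ['a bad line', 'a good line']  # made-up test data

# Keep only the lines that contain none of the bad words.
kept = [line for line in sample_lines if not contains_bad_word(line, bad_words)]
print(kept)  # ['a good line']
```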

I tried just putting in if not any(bad_word in line): but it gave me an error

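Yeah, because any takes an iterable as input, and you have provided a single boolean: bad_word in line evaluates to True or False, and you can't iterate over that, so Python raises a TypeError. (If bad_word isn't defined anywhere at that point, you would get a NameError instead, since the for clause is what gives bad_word its value.)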

Try providing something you can iterate over, such as a list: if not any([True, False, True]):
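A quick way to see both cases side by side; the values of line and bad_word here are made up so the snippet runs on its own:

```python
line = 'a naughty line'   # hypothetical sample values
bad_word = 'naughty'

try:
    any(bad_word in line)             # same as any(True): a bool is not iterable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

wrapped = any([bad_word in line])     # a one-element list is iterable
print(wrapped)                        # True

print(not any([True, False, True]))   # False
```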

Gillespie