I have a text file with 1000+ lines, each one representing a news article about a topic that I'm researching. Several hundred lines/articles in this dataset are not about the topic, however, and I need to remove these.

I've used grep to remove many of them (grep -vwE "(wordA|wordB)" test8.txt > test9.txt), but I now need to go through the rest manually.

I have working code that finds every line that does not contain a certain word, prints that line, and asks whether it should be removed. It works well, but I'd like to check for several words instead of one. E.g. let's say my research topic is meat eating trends. I hope to write a script that prints the lines that contain none of 'chicken', 'pork' or 'beef', so I can manually verify whether those lines/articles are about the relevant topic.

I know I can do this with elif, but I wonder if there is a better and simpler way? E.g. I tried if "chicken" or "beef" not in line: but it did not work.

Here's the code I have:

orgfile = 'test9.txt'
newfile = 'test10.txt'
# Open the input and output once; both are closed when the block ends.
with open(orgfile) as f, open(newfile, 'w') as out:
    for num, line in enumerate(f, 1):
        if "chicken" not in line:
            # Show the first comma-separated field and the line number.
            print "{} {}".format(line.split(',')[0], num)
            answer = raw_input("1 = delete, enter = skip.")
            if answer.strip() == '1':
                # Delete: do not copy this line to the new file.
                continue
        # Lines that contain the word, or that were skipped, are kept.
        out.write(line)

Edit: I tried Pieter's answer to this question, but it does not work here, presumably because I am not working with integers.

Isak

1 Answer

You can use any or all with a generator expression. For example:

>>> key_words={"chicken","beef"}
>>> test_texts=["the price of beef is too high", "the chicken farm now open","tomorrow there is a lunar eclipse","bla"]
>>> for title in test_texts:
    if any(key in title for key in key_words):
        print title


the price of beef is too high
the chicken farm now open
>>> 
>>> for title in test_texts:
    if not any(key in title for key in key_words):
        print title


tomorrow there is a lunar eclipse
bla
>>> 
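
Applied to the loop in the question, the same idea could look roughly like the sketch below. The names KEY_WORDS and needs_review are made up for illustration, and the match is a plain case-sensitive substring test, as in the examples above. The sketch also shows why the attempt from the question did not work: Python reads it as "chicken" or ("beef" not in line), and a non-empty string such as "chicken" always counts as true. It is written so that it should run on both Python 2 and Python 3.

```python
# Hypothetical names; the key words are taken from the question's example.
KEY_WORDS = ("chicken", "pork", "beef")


def needs_review(line, key_words=KEY_WORDS):
    """Return True if the line contains none of the key words."""
    return not any(key in line for key in key_words)


# The attempt from the question parses as: "chicken" or ("beef" not in line).
# "chicken" is a non-empty string, so the condition is true for every line.
line = "the price of beef is too high"
always_true = bool("chicken" or "beef" not in line)

lines = [
    "the price of beef is too high",
    "the chicken farm now open",
    "tomorrow there is a lunar eclipse",
]
# Keep only the lines that would have to be checked by hand.
to_check = [text for text in lines if needs_review(text)]
print(to_check)
```

In the question's loop, the test if "chicken" not in line: would then become if needs_review(line):, and the rest of the loop can stay as it is.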
Copperfield