
I need to implement a function that takes an array of words as a parameter (the number of words is not known in advance). There is a source file, output.txt, that contains a large number of lines. The function must delete every line that does not contain at least one word from the array.

This is what I have at the moment, but it only filters by a single keyword, and I need it to work with several:

import re
with open('output.txt') as f:
    lines = f.readlines()
str = "bmw" # Keyword
pattern = re.compile(re.escape(str))
with open('output.txt', 'w') as f:
    for line in lines:
        result = pattern.search(line)
        if result is not None:
            f.write(line)

SnoopFrog
Cufee
  • If you want to match multiple words, use a pattern like `word1|word2|word3` – Barmar Dec 27 '22 at 06:33
  • Okay, but how to cram all this into a loop that will go through an array of words? – Cufee Dec 27 '22 at 06:33
  • See the linked question. – Barmar Dec 27 '22 at 06:35
  • You don't need the `result` variable. Just `if pattern.search(line):` – Barmar Dec 27 '22 at 06:36
  • @Barmar, the similar question is not quite what I need. It searches for one word in an array of words, but my case is the other way around: I need to search for the words from the array in a text file that contains a lot of words (a database). I need to delete all lines that do not contain at least one of the words in the array – Cufee Dec 27 '22 at 06:39
  • The other question is exactly on point. It shows how to match a set of words against a string. You do that when you create `pattern`. Then the rest of your code should work. – Barmar Dec 27 '22 at 06:40
  • `pattern = '|'.join(re.escape(word) for word in list_of_words)` – Barmar Dec 27 '22 at 06:41
  • BTW, don't use `str` as a variable name. It's the name of a built-in type. – Barmar Dec 27 '22 at 06:43
  • @Barmar Sorry, but could you provide a fully working code for this function, I do not understand anything in this case :( – Cufee Dec 27 '22 at 06:47
  • Sorry, the question is already closed, answers can't be added. – Barmar Dec 27 '22 at 06:49
  • Just replace the line `pattern = re.compile(re.escape(str))` with that. `list_of_words` is the list of strings that you want to match instead of the single string `bmw`, e.g. `list_of_words = ["bmw", "audi", "honda"]` – Barmar Dec 27 '22 at 06:49
  • Here's what I got - https://i.imgur.com/wFJxlO2, but this code completely deletes all the lines for some reason, and even gives an error – Cufee Dec 27 '22 at 06:57
  • Sorry, I forgot `re.compile()`. `pattern = re.compile('|'.join(re.escape(word) for word in list_of_words))` – Barmar Dec 27 '22 at 06:58
  • https://imgur.com/a/GxvKAnt – Cufee Dec 27 '22 at 07:04
  • This method removes all - https://imgur.com/a/FULRBdr – Cufee Dec 27 '22 at 07:07
  • Why do you call `re.compile` twice? – Barmar Dec 27 '22 at 07:07
  • The statement should just be `pattern = re.compile('|'.join(re.escape(word) for word in list_of_words))` – Barmar Dec 27 '22 at 07:08
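
Putting the comments above together, here is a minimal sketch of the whole thing wrapped in a function. The function name `keep_lines_with_any`, its parameters, and the choice to raise an error for an empty word list are assumptions for illustration, not something from the thread. The pattern itself is the one Barmar suggested: the escaped words joined with `|`, compiled once.

```python
import re


def keep_lines_with_any(path, words):
    """Rewrite the file at `path`, keeping only the lines that contain
    at least one of `words`. Returns the number of lines removed.

    Hypothetical helper: the name and signature are illustrative only.
    """
    words = list(words)
    if not words:
        # Assumption: an empty list is treated as a mistake. An empty
        # pattern would match every line, so nothing would be removed.
        raise ValueError("words must contain at least one word")

    # Build one alternation such as "bmw|audi|honda". re.escape makes sure
    # that characters like "." or "+" in a word are matched literally.
    pattern = re.compile("|".join(re.escape(word) for word in words))

    # Read everything first, because the same file is overwritten below.
    with open(path) as f:
        lines = f.readlines()

    kept = [line for line in lines if pattern.search(line)]

    with open(path, "w") as f:
        f.writelines(kept)

    return len(lines) - len(kept)
```

It could be called as `keep_lines_with_any('output.txt', ["bmw", "audi", "honda"])`. Note that this matches substrings and is case-sensitive, like the original code: `bmw` would also match inside a longer word, and `BMW` would not match. If case should be ignored, passing `re.IGNORECASE` as the second argument to `re.compile` is one option.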
