
So I'm writing a program that loops through multiple .txt files and searches them for any number of pre-specified keywords. I'm having trouble finding a way to pass in the list of keywords to search for.

The code below currently returns the following error:

TypeError: 'in <string>' requires string as left operand, not list

I'm aware that the error is caused by the keyword list, but I have no idea how to pass in a large list of keywords without triggering this error.

Current code:

from os import listdir

keywords=['Example', 'Use', 'Of', 'Keywords']
 
with open("/home/user/folder/project/result.txt", "w") as f:
    for filename in listdir("/home/user/folder/project/data"):
        with open('/home/user/folder/project/data/' + filename) as currentFile:
            text = currentFile.read()
            #Error Below
            if (keywords in text):
                f.write('Keyword found in ' + filename[:-4] + '\n')
            else:
                f.write('No keyword in ' + filename[:-4] + '\n')

The error is raised at line 10 of the code above, under the `#Error Below` comment. I'm unsure why I can't use a list to search for the keywords. Any help is appreciated, thanks!

Y4RD13
    What part of the error message do you not understand? You cannot use the `in` operator to see if a `list` is in a `str`. That is not a defined operation. Rather, you want to check whether any of the strings in your list is in the string. – juanpa.arrivillaga Mar 08 '21 at 02:23
  • Does this answer your question? [Check if multiple strings exist in another string](https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string) – DarrylG Mar 08 '21 at 03:28
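A minimal sketch of what the comments describe, using the question's keyword list and a made-up sample string: testing the whole list raises the same `TypeError`, while testing each keyword on its own works.

```python
keywords = ['Example', 'Use', 'Of', 'Keywords']
text = 'An Example sentence'  # made-up sample text

# A list on the left of `in` with a str on the right is not defined
error_message = ''
try:
    keywords in text
except TypeError as exc:
    error_message = str(exc)
print('TypeError:', error_message)

# Checking each keyword separately is well defined
found = [keyword for keyword in keywords if keyword in text]
print(found)  # ['Example']
```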

3 Answers


You could replace

if (keywords in text):
   ...

with

if any(keyword in text for keyword in keywords):
   ...
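Dropped into the question's loop, a sketch might look like this. It swaps `listdir` and string concatenation for `pathlib` and only reads `*.txt` files; the function name `scan_files` and the temporary-directory demo are hypothetical stand-ins for the real `/home/user/folder/project/...` paths.

```python
import tempfile
from pathlib import Path


def scan_files(data_dir, result_path, keywords):
    """Hypothetical helper: write one line per .txt file saying whether any keyword occurs in it."""
    with open(result_path, 'w') as result:
        for path in sorted(Path(data_dir).glob('*.txt')):
            text = path.read_text()
            # any() stops at the first keyword found in the text
            if any(keyword in text for keyword in keywords):
                result.write('Keyword found in ' + path.stem + '\n')
            else:
                result.write('No keyword in ' + path.stem + '\n')


# Demo with throwaway files instead of the real project folders
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / 'data'
    data_dir.mkdir()
    (data_dir / 'a.txt').write_text('An Example sentence')
    (data_dir / 'b.txt').write_text('nothing relevant here')
    result_path = Path(tmp) / 'result.txt'
    scan_files(data_dir, result_path, ['Example', 'Use', 'Of', 'Keywords'])
    report = result_path.read_text()

print(report)
```

For `.txt` names, `path.stem` gives the same result as `filename[:-4]` in the original code.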
Alain T.

Try looping through the list and checking whether each element is in the text:

for keyword in keywords:
    if keyword in text:
        f.write('Keyword found in ' + filename[:-4] + '\n')
        break
else:
    # runs only if no keyword triggered the break
    f.write('No keyword in ' + filename[:-4] + '\n')

You cannot use `in` to see if a list is in a string.
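Plain `in` is case-sensitive, so `'Of'` will not match `of`. If case should not matter, one option is to `casefold()` both sides before testing. This is a sketch; the helper name `contains_keyword` is hypothetical.

```python
def contains_keyword(text, keywords):
    """Hypothetical helper: case-insensitive check that any keyword occurs in text."""
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


keywords = ['Example', 'Use', 'Of', 'Keywords']
print(contains_keyword('a list of words', keywords))   # True ('of' matches 'Of')
print(contains_keyword('nothing relevant', keywords))  # False
```

This is still substring matching, so `'Of'` would also match inside a word such as `often`.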

TheSavageTeddy

I would use regular expressions as they are purpose-built for searching text for substrings.

You only need the re.search block. I added examples of findall and finditer to demystify them.

import re

# let's pretend these 4 sentences in `text` are 4 different files
text = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum'''.split(sep='. ')

# add more keywords
keywords = [r'publishing', r'industry']
regex = '|'.join(keywords)
for t in text:
    lst = re.findall(regex, t, re.I)  # re.I makes the match case-insensitive
    for el in lst:
        print(el)

    iterator = re.finditer(regex, t, re.I)
    for el in iterator:
        print(el.span())

    if re.search(regex, t, re.I):
        print('Keyword found in `' + t + '`\n')
    else:
        print('No keyword in `' + t + '`\n')

Output:

industry
(65, 73)
Keyword found in `Lorem Ipsum is simply dummy text of the printing and typesetting industry`

industry
(25, 33)
Keyword found in `Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book`

No keyword in `It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged`

publishing
(132, 142)
Keyword found in `It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum`
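One caveat with `'|'.join(keywords)`: each keyword is interpreted as a regular expression, so a keyword containing a metacharacter such as `.` can match more than intended, and any keyword can also match inside a longer word. A sketch that hardens the pattern with `re.escape` and `\b` word boundaries (the keyword `'U.S'` and the sample sentences are made up for illustration):

```python
import re

keywords = ['publishing', 'industry', 'U.S']

# re.escape makes '.' match a literal dot; \b...\b rejects matches inside longer words
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b',
    re.IGNORECASE,
)

texts = [
    'Desktop publishing took off in the U.S',
    'A UPS truck joined an industrywide strike',
]
results = []
for t in texts:
    found = pattern.findall(t)
    results.append(found)
    if found:
        print('Keywords ' + ', '.join(found) + ' found in `' + t + '`')
    else:
        print('No keyword in `' + t + '`')
```

Without `re.escape`, `'U.S'` would also match `UPS`, and without `\b`, `industry` would match inside `industrywide`. Note that `\b` only behaves this way when a keyword starts and ends with a word character.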
Razzle Shazl
  • I've run into conflicts when using regex in my code in the past, but I'll give it another shot. Thanks for a solution! – Stephen Flynn Mar 11 '21 at 17:41
  • @StephenFlynn Hopefully using regex again is now possible for you? Give the `re.search` a gander and let me know results. – Razzle Shazl Mar 11 '21 at 18:07