Trying to search case insensitive keywords from a log text (.txt) file

Question

I have a log file of a conversation. I want to search the file for certain keywords that I have defined, but the log file may contain those keywords in upper case, lower case or title case.

I can pull out lines which have the keyword in lower case but can't get the upper-case or title-case versions of the word. How can I solve this?

I have tried using

if (words.title() and words.lower()) in line:
     print (searchInLines[i])

but that doesn't seem to work.

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()
    f.close()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if (words.title() and words.lower()) in line:
            print (searchInLines[i])

For example, the log file contains the following sentence:

"Manchester United played Barcelona yesterday, however, the manchester club lost"

I have "manchester" in my keywords so it will pick up the second one but not the first one.

How can I recognise both?

Thanks in Advance!

Possible duplicate of [How to make string check case insensitive in Python 3.2?](https://stackoverflow.com/questions/5889944/how-to-make-string-check-case-insensitive-in-python-3-2) — Georgy, Apr 11 '19 at 14:29

Franix · Accepted Answer · 2019-04-11T11:34:11.090

I was not entirely sure what you were trying to do, but I assume you want to filter out the messages (lines) that contain one of the words in keywords. Here is a simple way of doing it:

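keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

# The with block closes the file on exit, so no explicit f.close() is needed.
with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    lowered = line.lower()  # lower-case once per line; the keywords are already lower case
    for keyword in keywords:
        if keyword in lowered:
            print(line)
            break  # print each matching line once, even if several keywords match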
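
If the line number of each hit is also useful, the same lower-casing idea can be wrapped in a small helper. This is only a minimal sketch and not part of the answer above: the function name `find_keyword_lines` and the sample lines are made up for illustration.

```python
def find_keyword_lines(lines, keywords):
    """Return (line_number, line) pairs for lines containing any keyword, ignoring case."""
    lowered_keywords = [k.lower() for k in keywords]
    matches = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        # any() stops at the first keyword found, so each line is reported once.
        if any(k in lowered for k in lowered_keywords):
            matches.append((number, line.rstrip("\n")))
    return matches


# Hypothetical sample data standing in for the contents of the log file.
sample = [
    "Manchester United played Barcelona yesterday\n",
    "nothing relevant here\n",
    "however, the manchester club lost\n",
]
for number, line in find_keyword_lines(sample, ["manchester", "excel"]):
    print(number, line)
```

Returning the matches instead of printing them directly makes the helper easier to reuse, for example to count hits or write them to another file.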

Rakesh · Answer 2 · score 2 · 2019-04-11T11:13:51.670

Using a regex with the re.IGNORECASE flag, for example:

import re

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

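# Earlier version without word boundaries; it would also match "excel" inside "excellent":
# pattern = re.compile("(" + "|".join(keywords) + ")", flags=re.IGNORECASE)
# \b on both sides limits each keyword to whole-word matches; re.escape guards against regex metacharacters.
pattern = re.compile("(" + "|".join(r"\b{}\b".format(re.escape(i)) for i in keywords) + ")", flags=re.IGNORECASE)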
for line in searchInLines:
    if pattern.search(line):
        print(line)

Not sure why this got downvoted +1. You might want to put word boundaries around the alternation. – Tim Biegeleisen Apr 11 '19 at 11:11
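
To make the point about word boundaries concrete, here is a small sketch comparing a plain alternation with one wrapped in `\b`. The names `loose` and `strict` and the sample strings are assumptions chosen for illustration, not taken from the answer.

```python
import re

keywords = ["excel", "manchester"]

# Plain alternation: matches a keyword anywhere, even inside a longer word.
loose = re.compile("|".join(map(re.escape, keywords)), flags=re.IGNORECASE)
# Alternation wrapped in \b...\b: matches whole words only.
strict = re.compile(
    r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", flags=re.IGNORECASE
)

for text in ["An excellent result", "Open it in Excel", "Manchester United won"]:
    print(text, "| loose:", bool(loose.search(text)), "| strict:", bool(strict.search(text)))
```

Whether partial-word matches count as hits depends on what the log search is for, so neither pattern is the right one in every case.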

score 1 · Answer 3 · answered Apr 11 '19 at 11:28

First of all, you don't need f.close() when you are working with a context manager.

As for the solution, I recommend using a regular expression in this case:

import re
keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
# Compile a regex pattern from the keyword list (the keywords are lower case,
# so each line is lower-cased before searching)
pattern = re.compile('|'.join(keywords))

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    # if we get a match
    if re.search(pattern, line.lower()):
        print(line)
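
A variation on the same idea, sketched under a few assumptions: it passes `re.IGNORECASE` instead of lower-casing each line, and it iterates over the file object directly rather than calling `readlines()`, which avoids loading the whole log into memory. The temporary file and its contents are hypothetical and exist only to make the example self-contained.

```python
import os
import re
import tempfile

keywords = ["excel", "manchester"]
pattern = re.compile("|".join(map(re.escape, keywords)), flags=re.IGNORECASE)

# Write a small throwaway log so the example runs on its own (made-up content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf8"
) as tmp:
    tmp.write("Manchester United played Barcelona yesterday\n")
    tmp.write("nothing to see here\n")
    tmp.write("the EXCEL sheet was updated\n")
    log_path = tmp.name

matched = []
# Iterating over the file object reads one line at a time;
# the with block closes the file, so no f.close() is needed.
with open(log_path, "r", encoding="utf8") as f:
    for line in f:
        if pattern.search(line):
            matched.append(line.rstrip("\n"))

os.remove(log_path)  # clean up the throwaway file
print(matched)
```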

score 0 · Answer 4 · answered Apr 11 '19 at 11:14

You can convert both the line and the keywords to upper or to lower case and compare them.

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("test.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for words in keywords:
    for line in searchInLines:
        if words.upper() in line.upper():
            print(line)
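
For plain ASCII keywords like the ones above, `upper()` and `lower()` are enough. If the log might contain non-ASCII text, Python's `str.casefold()` is designed for caseless comparison and handles a few cases that `lower()` misses. The sketch below is an assumption-laden illustration: the helper name `contains_keyword` and the German sample sentence are not from the original post.

```python
def contains_keyword(line, keyword):
    """Case-insensitive substring test using str.casefold()."""
    return keyword.casefold() in line.casefold()


# Ordinary ASCII case differences work as expected.
print(contains_keyword("Manchester United played Barcelona", "manchester"))

# lower() leaves the German sharp s unchanged, so "strasse" is not found...
print("strasse" in "Die Straße ist lang".lower())
# ...while casefold() maps it to "ss" and the keyword is found.
print(contains_keyword("Die Straße ist lang", "STRASSE"))
```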

score 0 · Answer 5 · answered Apr 11 '19 at 11:27

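(1) Your keywords are already in lower case, so "words.lower()" has no effect. (2) The parenthesised expression "(words.title() and words.lower())" is evaluated before the "in" test, and because both strings are non-empty, "and" returns the second one. The condition therefore only checks whether the lower-case keyword is in the line and never tests the title-case form. (3) What you want, I believe, is: "if words in line.lower():"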
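
The behaviour of the original condition can be checked in a few lines. This is a small illustrative sketch; the variable names `word`, `line` and `combined` are chosen here and do not appear in the question.

```python
word = "manchester"
line = "Manchester United played Barcelona yesterday"

# `and` returns its last operand when every operand is truthy,
# so the parenthesised expression is just the lower-case string.
combined = word.title() and word.lower()
print(combined)  # manchester

# Only "manchester" is tested, and this line contains "Manchester".
print(combined in line)  # False

# Testing each form separately, or lower-casing the line, finds the keyword.
print(word.title() in line or word.lower() in line)  # True
print(word in line.lower())  # True
```

Lower-casing the line is the more general of the two fixes, since it also covers forms such as "MANCHESTER" that neither `title()` nor `lower()` of the keyword would match.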
