0

I have a log file of a conversation. I want to search it for a list of keywords, but the log may contain those keywords in uppercase, lowercase, or title case.

I can pull out the lines which have the keyword in lower case, but I can't get the uppercase or title case versions of the word. How can I solve this?

I have tried using

if (words.title() and words.lower()) in line:
     print (searchInLines[i])

but that doesn't seem to work.

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()
    f.close()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if (words.title() and words.lower()) in line:
            print (searchInLines[i])

For example, the log file contains the following sentence:

"Manchester United played Barcelona yesterday, however, the manchester club lost"

I have "manchester" in my keywords so it will pick up the second one but not the first one.

How can I recognise both?

Thanks in Advance!

Jlingz14
  • Possible duplicate of [How to make string check case insensitive in Python 3.2?](https://stackoverflow.com/questions/5889944/how-to-make-string-check-case-insensitive-in-python-3-2) – Georgy Apr 11 '19 at 14:29

5 Answers

3

I am not entirely sure what you are trying to do, but I assume you want to filter out the messages (lines) that contain one of the words in keywords. Here is a simple way of doing it:

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("recognition_log.txt", "r", encoding="utf8") as f:
    # the with block closes the file on exit, so no explicit f.close() is needed
    searchInLines = f.readlines()

for line in searchInLines:
    for keyword in keywords:
        if keyword in line.lower():
            print(line)
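
One caveat with the loop above: a line that contains several keywords is printed once per keyword. Below is a minimal sketch that reports each line at most once, using `any()` together with `str.casefold()` (a slightly more aggressive variant of `lower()`). The helper name `matching_lines` and the sample lines are made up for illustration and are not part of the original code.

```python
keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

def matching_lines(lines, keywords):
    """Return the lines that contain at least one keyword, ignoring case."""
    folded_keywords = [k.casefold() for k in keywords]
    result = []
    for line in lines:
        folded = line.casefold()
        # any() stops at the first hit, so each line is collected only once
        if any(k in folded for k in folded_keywords):
            result.append(line.rstrip("\n"))
    return result

# Hypothetical sample data standing in for the contents of the log file
sample = [
    "Manchester United played Barcelona yesterday, however, the manchester club lost\n",
    "Nothing to see here\n",
    "She is QUALIFIED and knows Excel\n",
]

for line in matching_lines(sample, keywords):
    print(line)
```

With a real file, the list returned by `f.readlines()` can be passed in place of `sample`.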
Franix
2

Use a regular expression with the re.IGNORECASE flag.

Example:

import re

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

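# Substring match (would also hit "excel" inside "excellent"):
#pattern = re.compile("(" + "|".join(keywords) + ")", flags=re.IGNORECASE)
# Whole-word match: wrap each keyword in \b word boundaries
pattern = re.compile("(" + "|".join(r"\b{}\b".format(i) for i in keywords) + ")", flags=re.IGNORECASE)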
for line in searchInLines:
    if pattern.search(line):
        print(line)
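
To see what the `\b` word boundaries change, here is a small sketch on in-memory strings (the sample lines are invented). It also passes each keyword through `re.escape`, which is not needed for these particular keywords but is a sensible precaution if a keyword ever contains a regex metacharacter such as `.` or `+`.

```python
import re

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
alternatives = "|".join(re.escape(k) for k in keywords)

# Substring match: a keyword may appear inside a longer word
loose = re.compile(alternatives, flags=re.IGNORECASE)
# Whole-word match: the keyword must be delimited by word boundaries
strict = re.compile(r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)

# Hypothetical sample data standing in for the log file
lines = [
    "Manchester United played Barcelona yesterday",
    "That was an excellent game",
    "Export the table to EXCEL",
]

loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]

print(loose_hits)   # all three lines, because "excellent" contains "excel"
print(strict_hits)  # only the first and third lines
```

Which of the two is right depends on whether partial-word hits count as matches for your log.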
Rakesh
1

First of all, you don't need f.close() when you are working with a context manager.
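
As a quick check of that claim, the sketch below inspects `f.closed` inside and after a `with` block. It writes a throwaway temporary file so that it runs on its own; the file content and variable names are only for illustration.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
    tmp.write("Manchester United\n")
    path = tmp.name

with open(path, "r", encoding="utf8") as f:
    lines = f.readlines()
    inside = f.closed   # the file is still open inside the block

after = f.closed        # the file was closed when the block ended

print(inside, after)    # False True
os.remove(path)         # clean up the temporary file
```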

As for the solution, I recommend using a regular expression in this case:

import re
keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
# Compile a regex pattern from the (lowercase) keyword list
pattern = re.compile('|'.join(keywords))

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    # if we get a match
    if re.search(pattern, line.lower()):
        print(line)
0

You can convert both the line and the keywords to upper or to lower case and compare them.

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("test.txt", "r", encoding="utf8") as f:
    # the with block closes the file on exit, so no explicit f.close() is needed
    searchInLines = f.readlines()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if words.upper() in line.upper():
            print(searchInLines[i])
DobromirM
0

(1) Your keywords are already in lower case, so "words.lower()" has no effect. (2) "(words.title() and words.lower()) in line" does not check both forms: "and" evaluates to its second operand when the first one is truthy, so the test reduces to "words.lower() in line", which is why only the lowercase occurrence is found. (3) What you want, I believe, is: "if words in line.lower():"
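
To make the behaviour of the original condition concrete, here is a short sketch. In Python, `a and b` evaluates to `b` whenever `a` is truthy, so the parenthesised expression collapses to the lowercase keyword before `in` is applied. The variable names are illustrative only.

```python
word = "manchester"
line = "Manchester United played Barcelona yesterday"

# `and` returns its second operand when the first is truthy
combined = word.title() and word.lower()
print(combined)                     # manchester

# so the original test only ever looks for the lowercase form
original_test = (word.title() and word.lower()) in line
print(original_test)                # False

# Checking each form explicitly works, but lowering the line is simpler
# and also covers forms such as "MANCHESTER"
explicit_test = word.title() in line or word.lower() in line
lowered_test = word in line.lower()
print(explicit_test, lowered_test)  # True True
```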