0

I am learning Python and am struggling with finding an exact word in each string in a list of strings. Apologies if this question has already been asked for this situation.

This is what my code looks like so far:

with open('text.txt') as f:
    lines = [line.rstrip('\n') for line in f]


keyword = input("Enter a keyword: ")

matching = [x for x in lines if keyword.lower() in x.lower()]

match_count = len(matching)

print('\nNumber of matches: ', match_count, '\n')
print(*matching, sep='\n')

Right now, matching returns all strings containing the keyword as a substring, not just strings containing the exact word. For example, if I enter 'local' as the keyword, strings with 'locally' and 'localized' are returned in addition to those with 'local', when I only want instances of 'local' returned.

I have tried:

match_test = re.compile(r"\b" + keyword+ r"\b")

match_test = ('\b' + keyword + '\b')

match_test = re.compile('?:^|\s|$){0}'.format(keyword))


matching = [x for x in lines if keyword.lower() == x.lower()]

matching = [x for x in lines if keyword.lower() == x.lower().strip()]

And none of them have worked, so I'm a bit stuck. How do I take the keyword entered by the user and then return all strings in a list that contain that exact keyword?

Thanks

ksm
  • what do you think `keyword.lower() in x.lower()` does? – njzk2 Oct 06 '19 at 05:39
  • "all strings containing the word, not strings containing the exact word." It's unclear what this is supposed to mean... but when you give the example, it's clear that "containing" has nothing to do with what you want; you're looking for strings that are **equal to** the word. And once you realize this - once you put it in precise enough language - the solution is obvious. – Karl Knechtel Oct 06 '19 at 05:45
  • @njzk2 Sorry I missed adding that I've tried the == operator, but I used that rather than the 'in', but didn't have any results returned. Edited to show how I used the == operator, not sure if I'm using it incorrectly? – ksm Oct 06 '19 at 05:50
  • having an example of the content of `text.txt` and the input would also help understanding what the issue is – njzk2 Oct 06 '19 at 06:41

5 Answers

4

in means "contained in": 'abc' in 'abcd' is True. For an exact match, use ==

matching = [x for x in lines if keyword.lower() == x.lower()]

You might need to remove spaces and newlines as well

matching = [x for x in lines if keyword.lower().strip() == x.lower().strip()]

Edit:

To find a line containing the keyword you can use loops

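matches = []
target = keyword.lower().strip()
for line in lines:
    # split() with no argument splits on any run of whitespace
    for word in line.split():
        if word.lower() == target:
            matches.append(line)
            break  # stop after the first hit so a line is not added twice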
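
A minimal sketch of why == on the whole line returned nothing, and how a word-by-word comparison behaves. The sample lines and the helper name contains_word are made up for illustration, and stripping string.punctuation from each token is an assumption about how punctuation should be treated, not something the question specifies.

```python
import string

def contains_word(line, keyword):
    """Return True if keyword appears in line as a whole whitespace-separated word."""
    target = keyword.lower().strip()
    for token in line.split():
        # Drop leading/trailing punctuation so 'local!' still counts as 'local'
        if token.strip(string.punctuation).lower() == target:
            return True
    return False

# Hypothetical sample data standing in for the contents of text.txt
lines = ['local news', 'I live locally', 'Buy local!', 'local']
keyword = 'local'

# == compares the whole line, so only a line that is exactly the keyword matches
exact_lines = [x for x in lines if x.lower().strip() == keyword.lower()]

# Word-by-word comparison keeps every line that contains the keyword as a word
word_lines = [x for x in lines if contains_word(x, keyword)]

print(exact_lines)
print(word_lines)
```

With this sample, only the last line survives the == filter, while the word-by-word version also keeps 'local news' and 'Buy local!'.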
Guy
  • Sorry, I forgot to include that in the list of things I've tried (will edit), but using the == operator didn't return any results. Not sure if I'm utilizing it incorrectly? – ksm Oct 06 '19 at 05:47
  • @ksm Try with `strip()` to remove blanks `keyword.lower() == x.lower().strip()`. – Guy Oct 06 '19 at 05:51
  • `matching = [x for x in lines if keyword.lower() == x.lower().strip()]` Tried it out, but still not coming up with any results. – ksm Oct 06 '19 at 05:55
  • @ksm try on both words, keyword as well. – Guy Oct 06 '19 at 06:00
  • `lines` is a list of strings so `x` is a string of a single line. This will work if the line is exactly the one word, but the OP wants lines containing the exact word – jmullercuber Oct 06 '19 at 06:12
1

This method avoids having to read the whole file into memory. It also deals with cases like "LocaL" or "LOCAL" assuming you want to capture all such variants. There is a bit of performance overhead on making the temp string each time the line is read, however:

import re 

def reader(filename, target):
    # This regexp matches a word at the front, end or in the middle of a line
    # stripped of all punctuation and other non-alpha, non-whitespace characters.
    # re.escape keeps any special characters in the target literal.
    regexp = re.compile(r'(^| )' + re.escape(target.lower()) + r'($| )')
    with open(filename) as fin:
        matching = []
        # read lines one at a time:
        for line in fin:
            line = line.rstrip('\n')
            # generates a line of lowercase and whitespace to test against
            temp = ''.join([x.lower() for x in line if x.isalpha() or x == ' '])
            if regexp.search(temp):
                matching.append(line)  # store unaltered line
        return matching

Given the following tests:

locally local! localized

locally locale nonlocal localized

the magic word is Local.

Localized or nonlocal or LOCAL

This is returned:

['locally local! localized',
 'the magic word is Local.',
 'Localized or nonlocal or LOCAL']
neutrino_logic
0

Your first test seems to be on the right track.

Using input:

import re
lines = [
  'local student',
  'i live locally',
  'keyboard localization',
  'what if local was in middle',
  'end with local',
]
keyword = 'local'

Try this:

pattern = re.compile(r'.*\b{}\b'.format(keyword.lower()))
matching = [x for x in lines if pattern.match(x.lower())]
print(matching)

Output:

['local student', 'what if local was in middle', 'end with local']

pattern.match returns a match object if the pattern matches starting at the beginning of the string, or None otherwise. Using this as your if condition filters for strings that contain the whole keyword somewhere. This works because \b matches the beginning or end of a word. The leading .* consumes any characters that occur on the line before your keyword shows up.
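
To make the difference between match and search concrete, here is a small sketch. The sample strings are taken from the input above; using re.escape and re.IGNORECASE instead of lowercasing is a substitution on my part, not part of the code above.

```python
import re

keyword = 'local'
# re.escape keeps any regex metacharacters in the keyword literal
word = re.compile(r'\b{}\b'.format(re.escape(keyword)), re.IGNORECASE)

line = 'what if local was in middle'

# match() only tries at position 0, so it fails here without a leading .*
at_start = word.match(line)
# search() scans the whole string for the first place the pattern matches
anywhere = word.search(line)

print(at_start)                        # None
print(anywhere.group())                # local
print(word.search('i live locally'))   # None
```

So the leading .* and pattern.match can be swapped for pattern.search with the bare \bkeyword\b pattern.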

For more info about using Python's re, check out the docs here: https://docs.python.org/3.8/library/re.html

jmullercuber
  • This is really close! This is returning results, which is awesome, but it is only returning results in which the keyword is the first word in the sentence. – ksm Oct 06 '19 at 14:36
  • You're right, I didn't consider that case! Adding a `.*` at the beginning of the expression may fix it. Seems you already got your answer but I'll update my response for completeness – jmullercuber Nov 03 '19 at 04:10
0

Here is my solution, which should match only 'local' among the lines of the text file shown below. I used re.search to find the lines that end with 'local'; other strings that merely contain 'local' are not matched.

Contents of the text file:

local
localized
locally
local
local diwakar
       local
   local@#!

Code to find only instances of 'local' in the text file:

import re

with open('C:/path_to_file.txt') as f:
    for line in f:
        # 'local' followed by one non-word character (here the newline) at the end of the line
        a = re.search(r'local\W$', line)
        if a:
            print(line)

Output

local

local

       local

Let me know if this is what you were looking for
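
A short sketch of what the local\W$ pattern accepts and rejects, using in-memory strings in place of the file. The samples list is illustrative; the last two entries are extra cases I added to show the edges of the pattern and are not in the text file above.

```python
import re

pattern = re.compile(r'local\W$')

# Lines as they come from iterating over a file: each keeps its trailing '\n'
samples = [
    'local\n',
    'localized\n',
    'local diwakar\n',
    '       local\n',
    '   local@#!\n',
    'nonlocal\n',   # extra case: ends in 'local', so it also matches
    'local',        # extra case: no trailing newline, so \W has nothing to match
]

matched = [s for s in samples if pattern.search(s)]
print(matched)
```

The pattern only checks how the line ends, so it would miss 'local' in the middle of a line and would accept words such as 'nonlocal'.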

Diwakar SHARMA
-1

You can try

pattern = re.compile(r"\b{}\b".format(keyword))
match_test = pattern.search(line)

as shown in "Python - Concat two raw strings with an user name"
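
For completeness, a self-contained sketch of this approach applied to a list of lines. The function name find_exact and the sample data are hypothetical, and re.escape and re.IGNORECASE are additions on top of the two lines above.

```python
import re

# In a normal string literal '\b' is a backspace character, not a word boundary,
# so the pattern has to be built from raw strings.
assert '\b' == '\x08'
assert r'\b' == '\\b'

def find_exact(lines, keyword):
    """Return the lines that contain keyword as a whole word, ignoring case."""
    # re.escape treats the user's input as literal text rather than regex syntax
    pattern = re.compile(r'\b{}\b'.format(re.escape(keyword)), re.IGNORECASE)
    # search() looks anywhere in the line; match() would only look at the start
    return [x for x in lines if pattern.search(x)]

# Hypothetical sample data standing in for the lines read from text.txt
sample = [
    'locally local! localized',
    'locally locale nonlocal localized',
    'the magic word is Local.',
    'Localized or nonlocal or LOCAL',
]

matching = find_exact(sample, 'local')
print('\nNumber of matches: ', len(matching), '\n')
print(*matching, sep='\n')
```

One caveat: \b is defined in terms of word characters, so a keyword that starts or ends with punctuation (for example 'c++') would need a different boundary rule.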

abhilb
  • Why is this downvoted? I think using `pattern = re.compile(r"\b{}\b".format(keyword.lower()))` with `matching = [x for x in lines if pattern.match(x.lower())]` should work – jmullercuber Oct 06 '19 at 06:07