1

Again, apologies for being a noob here. I am trying the code below to search for multiple strings read from a keywords file, look for them in f, and print each matching line. It works if I have only one keyword, but not if I have more than one.

keywords = input("Please Enter keywords path as c:/example/ \n :")
keys = open((keywords), "r").readline()
with open("c:/saad/saad.txt") as f:
    for line in f:
        if (keys) in line:
            print(line)
martineau
Saadi381

3 Answers

3

One of the challenges of looking for keywords is defining what you mean by a keyword and how a file's contents should be parsed to find the full set of keywords. If "aa" is a keyword, should it match "aaa" or maybe "aa()"? Can a keyword have numbers in it?
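To make the difference concrete, here is a small sketch contrasting a plain substring test with a whole-word test. The helper names `substring_match` and `whole_word_match` are made up for illustration, and the whole-word version assumes that regex word boundaries (`\b`) are an acceptable definition of where a keyword starts and ends.

```python
import re

def substring_match(key, line):
    # True if key appears anywhere in line, even inside a longer word
    return key in line

def whole_word_match(key, line):
    # True only if key stands alone as a word; assumes \b boundaries
    # are a good enough definition of "word" for your data
    return re.search(r'\b' + re.escape(key) + r'\b', line) is not None

print(substring_match("aa", "aaa is here"))     # True
print(whole_word_match("aa", "aaa is here"))    # False
print(whole_word_match("aa", "call aa() now"))  # True
```

Which of these behaviours is "right" depends entirely on what you want a keyword to be.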

A simple solution is to say that keywords are alphabetic only and should match contiguous alphabetic strings exactly, ignoring case. Further, matches should be considered line by line, not sentence by sentence. We can use a regex to find alphabetic sequences and sets to check containment like so:

keys.txt

aa bb 

test.txt

aa is good
AA is good
bb is good
cc is not good
aaa is not good

test.py

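import re

keyfile = "keys.txt"
testfile = "test.txt"

# Read the keywords from the first line of the key file, lowercased.
# Note: \w+ also matches digits and underscores, not only letters.
with open(keyfile, "r") as kf:
    keys = set(key.lower() for key in re.findall(r'\w+', kf.readline()))

with open(testfile) as f:
    for line in f:
        # Collect the lowercased words found on this line
        words = set(word.lower() for word in re.findall(r'\w+', line))
        # A non-empty intersection means at least one keyword is on the line
        if keys & words:
            print(line, end='')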

Result:

aa is good
AA is good
bb is good

Add more rules for what you mean by a match and it gets more complicated.

EDIT

If you have one keyword per line and you just want a substring match (that is, "aa" matches "aaa") instead of a whole-word keyword search, you could do:

keyfile = "keys.txt"
testfile = "test.txt"

keys = [key for key in (line.strip() for line in open(keyfile)) if key]

with open(testfile) as f:
    for line in f:
        for key in keys:
            if key in line:
                print(line, end='')
                break

But I'm just guessing what your criteria are.
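If the substring search should also ignore case, as the first version does, one possible sketch looks like this. The function name `matching_lines` and the in-memory sample list are assumptions made for illustration; in practice the lines would come from the open file.

```python
def matching_lines(keys, lines):
    # Lowercase the keys once; skip empty keys, which would match every line
    lowered = [key.lower() for key in keys if key]
    result = []
    for line in lines:
        text = line.lower()
        # any() stops at the first key found, so each line is kept at most once
        if any(key in text for key in lowered):
            result.append(line)
    return result

sample = [
    "aa is good\n",
    "AA is good\n",
    "bb is good\n",
    "cc is not good\n",
    "aaa is not good\n",
]
for line in matching_lines(["aa", "bb"], sample):
    print(line, end='')
```

With substring matching, the "aaa is not good" line is printed too, since "aa" occurs inside "aaa".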

tdelaney
  • I have tried this but am having a couple of issues: 1. For some reason it does not find anything if the keywords are entered one per line, like aa bb cc; if I put the keywords on one line, then it only returns bb. 2. What if I want to return 'aaa' as well when the keyword 'aa' is searched? – Saadi381 Jun 19 '16 at 00:15
  • Then you need a different regular expression for the `re.findall()` function. – A-y Jun 19 '16 at 00:47
  • Input files can have many formats and it's impossible to cover all of the possibilities. You could put samples in your question like I did in my answer. For one key per line, you could read the file line by line, strip out whitespace, then filter empties like `keys = [key for key in (line.strip() for line in open(keyfile)) if key]`. To match `"aaa"`, you do a substring search instead of a regex. – tdelaney Jun 19 '16 at 14:15
  • Thanks @tdelaney, it finally worked. Just one more request: could you break this line down? I am having a bit of trouble decoding it: `keys = [key for key in (line.strip() for line in open(keyfile)) if key]` – Saadi381 Jun 20 '16 at 22:38
  • You can read and strip each line with `[key.strip() for key in open(keyfile)]`. But if the file has an empty line, one of the keys will be an empty string. So, you could check each key with `[key.strip() for key in open(keyfile) if key.strip()]` or you could add a generator `(key.strip() for key in open(keyfile))` to only do the strip once. – tdelaney Jun 21 '16 at 13:51
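For readers who find the one-liner hard to decode, the same logic can be written as an explicit loop. This is only a sketch: the function name `read_keys` is hypothetical, and it takes any iterable of lines (a list here, an open file in practice).

```python
def read_keys(lines):
    keys = []
    for line in lines:      # same role as: for line in open(keyfile)
        key = line.strip()  # the inner generator: line.strip()
        if key:             # the trailing filter: if key (drops empty lines)
            keys.append(key)
    return keys

print(read_keys(["aa\n", "\n", "  bb  \n"]))  # ['aa', 'bb']
```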
0
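keywords = input("Please Enter keywords path as c:/example/ \n :")
with open(keywords, "r") as key_file:
    # Split the first line on commas, strip whitespace, drop empty keys
    keys = [key.strip() for key in key_file.readline().split(',') if key.strip()]
with open("c:/saad/saad.txt") as f:
    for line in f:
        for key in keys:
            if key in line:
                print(line)
                break  # print each matching line only once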

You are reading the line in as one string. You need to make a list of the comma-separated strings, then test each key against each line (removing whitespace around the key).

This is assuming your keyword file is something like: aa is good, bb is good, spam, eggs
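As a quick check of what the split produces for a line in that format, here is a small sketch. The helper name `parse_keys` is made up for illustration, and dropping empty entries (for example after a trailing comma) is an added assumption about what you would want.

```python
def parse_keys(first_line):
    # Split on commas, trim surrounding whitespace (including the newline),
    # and skip empty entries such as the one left by a trailing comma
    return [part.strip() for part in first_line.split(',') if part.strip()]

print(parse_keys("aa is good, bb is good, spam, eggs\n"))
# ['aa is good', 'bb is good', 'spam', 'eggs']
```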

joel goldstick
0
# The easiest one...
def strsearch():
    # Strings to look for in each line
    targets = ('Product Name', 'Problem Description', 'Resolution Summary')
    # Read-only mode is enough; the with block closes the file
    with open('logfile.txt', mode='r') as fopen:
        for line in fopen:
            # Print the line once if it contains any of the targets
            if any(target in line for target in targets):
                print(line)

strsearch()

Harshan Gowda