
I have two files: one with some keywords and the other with plain text, i.e. myfile.txt. I need to open myfile.txt and extract each block of text that starts with a keyword (listed in the keyword file) and ends with "!". Example:

keyword file :
vrf-a
vrf-b

myfile.txt:

hello
how are you
!
x vrf-a
number 1
!
hi
howa are you
!
x vrf-b
number 2
!

Output should be:

x vrf-a
number 1
!
x vrf-b
number 2
!

I tried the code below:

import re  
crazy = open("keyword.txt","r+")  
lines  = crazy.readlines()  
for word in lines:  
    #print(word)  
    with open('mytext.txt', 'r') as fh:  
        result = re.findall(r'word[^!]+', fh.read(), re.M)  
        print(result)  
fh.close()  
crazy.close()  

The output I get is [] [], which means no match.

Eric Duminil
  • Hello @RomanPerekhrest, I have made an effort and this is not a duplicate; it's a continuation of my last question. I tried to iterate over the keyword file, but I can't get the correct result when I put the keyword in the regular expression. In C++ we prepend & to get the value and then the corresponding value. Here I run a for loop over the keyword file and the plain text, but I can't use the list value in the regular expression: it matches the literal text, not the list item. – Anurudh Dubey Apr 01 '17 at 20:23
  • Just curious to know: if we want to use the values of a list in a regular expression, how do we go about it? Here I am using word (which holds a keyword value from the keyword.txt file). – Anurudh Dubey Apr 01 '17 at 20:58
  • @Eric, just updated the question. – Anurudh Dubey Apr 01 '17 at 21:01
  • @Eric, yes, you are correct. The main reason for posting this question is the same; I mentioned it in my last comment. I'm just curious how we can use the string inside the variable word in a regular expression. I tried Google too but didn't succeed. – Anurudh Dubey Apr 01 '17 at 21:12

2 Answers


r'word[^!]+' looks for the literal substring "word" followed by one or more characters that aren't "!". It doesn't look for the string stored in the word variable.
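To see the difference without the two files, here is a quick self-contained check. The sample string and the variable names are assumptions standing in for myfile.txt and one line of keyword.txt:

```python
import re

# Hypothetical in-memory copy of the sample myfile.txt from the question
text = "hello\nhow are you\n!\nx vrf-a\nnumber 1\n!\nhi\nhowa are you\n!\nx vrf-b\nnumber 2\n!\n"
word = "vrf-a"

# Raw string literal: the pattern contains the four letters "word", not the variable
literal = re.findall(r'word[^!]+', text)

# Pattern built from the variable's value (escaped so it is matched literally)
from_variable = re.findall(re.escape(word) + r'[^!]+!', text)

print(literal)        # []
print(from_variable)  # ['vrf-a\nnumber 1\n!']
```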

Here's working code:

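import re

# Read the text file once, instead of once per keyword
with open('mytext.txt') as fh:
    mytext = fh.read()

with open("keyword.txt") as crazy:
    for word in crazy:
        word = word.strip()  # drop the trailing newline
        if not word:  # skip blank lines, which would otherwise match everything
            continue
        # Build the pattern from the variable's value: keyword, non-"!" characters, then "!"
        results = re.findall(re.escape(word) + r'[^!]+!', mytext)
        for result in results:
            print(result)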

It outputs:

vrf-a
number 1
!
vrf-b
number 2
!
Eric Duminil
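The output of the first answer starts at the keyword itself, so the leading "x " shown in the expected output in the question is not included. If the whole line containing the keyword is wanted, one option is to anchor the pattern at the start of that line with re.M. A minimal sketch, assuming the keywords and the text are already in memory (extract_blocks and sample_text are hypothetical names):

```python
import re


def extract_blocks(keywords, text):
    """Return every block that starts on a line containing a keyword and ends at the next "!"."""
    blocks = []
    for word in keywords:
        word = word.strip()  # lines read from a file keep their trailing newline
        if not word:
            continue  # skip blank keyword lines
        # With re.M, ^ matches at each line start; [^\n]* then covers any text
        # before the keyword on that same line (the "x " in "x vrf-a").
        pattern = r'^[^\n]*' + re.escape(word) + r'[^!]*!'
        blocks.extend(re.findall(pattern, text, re.M))
    return blocks


# Sample data copied from the question
sample_text = "hello\nhow are you\n!\nx vrf-a\nnumber 1\n!\nhi\nhowa are you\n!\nx vrf-b\nnumber 2\n!\n"
for block in extract_blocks(["vrf-a", "vrf-b"], sample_text):
    print(block)
```

With the real files, the open keyword.txt file object can be passed directly as keywords (the function strips each line), together with the string returned by reading mytext.txt.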

You need to use word as a variable, not as a literal string. With a little help from the question linked below:

How to use a variable inside a regular expression?

I made a small change to your code and it works fine now. You just have to make sure the output is in the format you want:

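import re

with open("keyword.txt") as crazy:
    for word in crazy:
        word = word.strip()  # drop the trailing newline before building the pattern
        if not word:  # skip blank lines, which would otherwise match everything
            continue
        with open('mytext.txt') as fh:
            # re.escape() inserts the keyword's value as literal text
            result = re.findall(re.escape(word) + r'[^!]+', fh.read(), re.M)
            print(''.join(result))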

Best

ida
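The second answer passes the keyword through re.escape() before inserting it into the pattern. For the keywords in the question this makes no difference, because "-" has no special meaning outside a character class, but it matters as soon as a keyword contains a metacharacter such as ".". A small illustration with a hypothetical keyword vrf.a and made-up sample text:

```python
import re

# Hypothetical keyword containing a regex metacharacter (".")
word = "vrf.a"
text = "x vrfXa\nnumber 1\n!\nx vrf.a\nnumber 2\n!\n"

# Unescaped: "." matches any character, so "vrfXa" is matched too
unescaped = re.findall(word + r'[^!]+!', text)

# Escaped: re.escape() turns "." into "\.", so only the literal "vrf.a" matches
escaped = re.findall(re.escape(word) + r'[^!]+!', text)

print(unescaped)  # ['vrfXa\nnumber 1\n!', 'vrf.a\nnumber 2\n!']
print(escaped)    # ['vrf.a\nnumber 2\n!']
```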