I'm searching through a text file for a certain string and then looking for another string that follows it. The second string could be on the next line or further down the document.

An example of the text would look like this:

there is a word1. then there is some more text. 
then we are looking for word2 = apple. 

I'm looking to return the word 'apple' together with word1. However, word2 = can be on the next line or further down the document. I've managed to write the code below, but it only works if word2 is on the next line, not if it is on line 3, 4, 5, etc. Can anyone help?

if 'word1' in line and 'word2' not in line:        
    nextLine = next(f)
    pattern = re.match('(?:word2=|word2 =)([a-z0-9_])+',nextLine) 
    if pattern:    
        print('word1', pattern)
lr53
  • It looks as if `with open(filepath, 'r') as f: print(re.findall(r'word2 ?= ?(\w+)', f.read()))` will be simpler. Are you looking for a single match or multiple matches? – Wiktor Stribiżew Mar 01 '21 at 11:42
  • Apologies, I should add that I need to combine word1 and the pattern match together. – lr53 Mar 01 '21 at 11:44
  • If you need a single match, you might want to try `with open(filepath, 'r') as f: m=re.search(r'word1.*(?:\n.*)*?word2 ?= ?(\w+)', f.read())` and then `if m: print(m.group(1))` – Wiktor Stribiżew Mar 01 '21 at 11:45
  • You need a while loop that scans the next lines until it finds word2. – pippo1980 Mar 01 '21 at 11:53

3 Answers

If I understand the question correctly, here is an example:

string = """

there is a word1. then there is some more text. 
then we are looking for word2 = apple. 


there is a word1. then there is some more text. 
then we are looking for word2 = orange. 



there is a word1. then there is some more text. 
then there is some more text. 
then there is some more text. 
then we are looking for word2= peer. 
"""


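import re
# Raw string so that \s and \S reach the regex engine unchanged;
# [\s\S]*? lazily spans line breaks up to the nearest word2
result = re.findall(r".*?(word1)[\s\S]*?word2 *=.*?([a-z0-9_]+)", string)
print(result)
# prints [('word1', 'apple'), ('word1', 'orange'), ('word1', 'peer')]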

Note: because I match against the whole string at once, this example may not be suitable for large files.
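
For larger files, one alternative is to scan line by line and remember whether word1 has been seen, so the whole file never has to be held in memory. The following is a minimal sketch, not part of the original answer: the function name `find_pairs` is hypothetical, and `io.StringIO` is only a stand-in for an open file object.

```python
import io
import re

# Matches "word2=value" or "word2 = value"; the value is captured in group 1.
WORD2_RE = re.compile(r"word2 *= *([a-z0-9_]+)")


def find_pairs(lines):
    """Yield ('word1', value) for each word1 that is later followed by word2 = value."""
    seen_word1 = False
    for line in lines:
        if not seen_word1:
            idx = line.find("word1")
            if idx == -1:
                continue
            seen_word1 = True
            # Only look at the text that comes after word1 on this line.
            line = line[idx + len("word1"):]
        match = WORD2_RE.search(line)
        if match:
            yield ("word1", match.group(1))
            # Start looking for the next word1 from the following line.
            seen_word1 = False


# Stand-in for a real file; any iterable of lines works, e.g. open(path).
sample = io.StringIO(
    "there is a word1. then there is some more text.\n"
    "then there is some more text.\n"
    "then we are looking for word2= peer.\n"
)
pairs = list(find_pairs(sample))
print(pairs)
```

Because `find_pairs` is a generator over lines, it can be given an open file directly. Note that this sketch ignores a second word1 that appears on the same line as a matched word2.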

Darcy
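if 'word1' in line and 'word2' not in line:
    while True:
        # Stop cleanly at end of file instead of raising StopIteration
        nextLine = next(f, None)
        if nextLine is None:
            break
        # re.search finds word2 anywhere in the line; the + is inside
        # the group so the whole value is captured, not just its last character
        pattern = re.search(r'word2 ?= ?([a-z0-9_]+)', nextLine)
        if pattern:
            print('word1', pattern.group(1))
            break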

I'm not sure this will work, as I don't have access to a PC right now. Let me know; if it doesn't work, I'll delete it.

Beware, though:

Are all infinite loops bad?

Is while (true) with break bad programming practice?

pippo1980

You should read the complete file into one string and then try the pattern below. It captures word1 and whatever word2 is set to, using capturing groups:

(word1)(?:.*[\n\r]?)+word2 ?= ?(\w+)

It is not clear from your question whether we should match word2 = apple or word2=apple (maybe the last time you mentioned word2= it was a typo?), so I included the ? character, which will make the spaces optional.

If you want your answer in the format apple + word1, you can do the following, where pattern is the match object returned by re.search:

print(pattern.group(2) + " + " + pattern.group(1))
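
Putting the pieces together, a minimal end-to-end sketch might look like the following. The sample text stands in for the file contents, and the pattern uses a lazy `[\s\S]*?` in place of the greedy group above, so that word1 pairs with the nearest following word2. Both choices are assumptions made for illustration, not part of the original answer.

```python
import re

# Stand-in for the file contents; with a real file, read it with f.read().
text = (
    "there is a word1. then there is some more text.\n"
    "then there is some more text.\n"
    "then we are looking for word2 = apple.\n"
)

# Lazy [\s\S]*? crosses line breaks and stops at the first word2 after word1.
pattern = re.search(r"(word1)[\s\S]*?word2 ?= ?(\w+)", text)

if pattern:
    # group(1) is word1, group(2) is the value assigned to word2
    result = pattern.group(2) + " + " + pattern.group(1)
    print(result)  # apple + word1
```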
cochaviz