text = """ Pratap
pandey
age
25
student
"""
keyword = "age"

re_compile = re.compile('((.*\n+){2})keyword((.*\n+){2})')
re_result = re.findall(re_compile, text)

I want to write a regex that extracts the two lines before and the two lines after the keyword whenever the keyword matches, with the keyword supplied through a variable.

Sven-Eric Krüger
Pratap
  • Would you please add an example? – ndrwnaguib Jun 28 '18 at 09:34
  • If I try re_compile = re.compile('((.*\n+){2})age((.*\n+){2})'), with age in place of keyword, it works, but I want to extract using a variable. – Pratap Jun 28 '18 at 09:35
  • I mean, an example of input and expected output. – ndrwnaguib Jun 28 '18 at 09:36
  • I don't think this is possible: once compiled, a regex can't be modified. Instead, you can use the regex without compiling it first and rebuild the pattern string each time the keyword changes. This topic could help you: https://stackoverflow.com/questions/6930982/how-to-use-a-variable-inside-a-regular-expression – Thibault D. Jun 28 '18 at 09:45
  • @andrew input: Pratap pandey age 25 student; if the keyword age is found, then print the lines – Pratap Jun 28 '18 at 09:53
  • I think your answer could be found in here: [SO Question](https://stackoverflow.com/questions/6930982/how-to-use-a-variable-inside-a-regular-expression) – ndrwnaguib Jun 28 '18 at 09:55
  • @Pratap Were any of the answers helpful to you? Feel free to [upvote&accept](http://stackoverflow.com/tour). – wp78de Jul 01 '18 at 18:47

3 Answers


I'm not completely sure what you are asking, but I think you want to know how to insert the value of the variable keyword into the pattern.

This is how you would do that:

re.compile(f"(((.*\n+){{2}})\\s*{keyword}\\s*\n((.*\n+){{2}}))")

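If you define keyword = <some value> beforehand, the code above will work. Note that f-strings require Python 3.6 or later.

By the way, use group 1 when extracting: it spans the whole match, i.e. the two lines before the keyword, the keyword line, and the two lines after it.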
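
As a minimal sketch of the idea above (not the answerer's exact pattern): the grouping is rearranged so that the lines before and after the keyword land in separate groups, and re.escape() is added as an assumption in case the keyword ever contains regex metacharacters.

```python
import re

text = " Pratap\npandey\nage\n25\nstudent\n"
keyword = "age"

# The pattern from the question contains the literal characters "keyword",
# so it can never match this text.
literal = re.compile('((.*\n+){2})keyword((.*\n+){2})')
print(literal.search(text))  # None

# Interpolate the variable instead. Doubled braces {{2}} produce a literal {2}
# inside an f-string; re.escape() protects keywords such as "a.b" or "c++".
pattern = re.compile(f"((?:.*\n+){{2}}){re.escape(keyword)}\n+((?:.*\n+){{2}})")

match = pattern.search(text)
before, after = match.group(1), match.group(2)
print(repr(before))
print(repr(after))
```

Here group 1 holds the two lines before the keyword and group 2 the two lines after it, each still carrying its trailing newlines.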

C.Holloway

Possible Solution in Python 2.7

You can use the regular expression uncompiled and insert the keyword with string formatting.

from __future__ import print_function

import re

text = """ Pratap
pandey
age
25
student
"""
keywords = ("age", "else")

for key in keywords:
    # Build a fresh pattern per keyword; re.escape() guards against regex metacharacters.
    print(re.findall(r'(.*\n+)(.*\n+){}\n+(.*\n+)(.*\n+)'.format(re.escape(key)), text))

Output:

[(' Pratap\n', 'pandey\n', '25\n', 'student\n')]
[]

(*) Edited regular expression.
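
The same approach can be wrapped in a small helper. This is only a sketch in Python 3: the function name find_context and its lines parameter are hypothetical, and the keyword is assumed to occupy a whole line, as it does in the sample text.

```python
import re

def find_context(text, key, lines=2):
    """Return (lines_before, lines_after) for every line consisting of `key`."""
    # One "block" is a line plus its newline(s); repeat it `lines` times.
    block = r"(?:.*\n+){%d}" % lines
    # A new pattern is built and compiled for each keyword.
    pattern = re.compile(r"(%s)%s\n+(%s)" % (block, re.escape(key), block))
    return [(m.group(1).splitlines(), m.group(2).splitlines())
            for m in pattern.finditer(text)]

text = " Pratap\npandey\nage\n25\nstudent\n"

for key in ("age", "else"):
    print(key, find_context(text, key))
```

Compiling inside the function sidesteps the concern that a compiled pattern cannot be changed afterwards: each keyword simply gets its own pattern.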

Sven-Eric Krüger
  • Thanks, but it didn't work. My main problem is to extract the two lines before the keyword and the two lines after it when the keyword matches. – Pratap Jun 29 '18 at 04:31
  • @Pratap Then your regular expression may be incorrect... This will be the output of the whole: Two _identical_ lines before a keyword followed by _another two_ identical lines, e.g. `"abcdef\nabcdef\nKEY\nghijk\nghijk\n"`... if and only if `keyword = "KEY\n"` – Sven-Eric Krüger Jun 29 '18 at 06:16
  • @Pratap Have a look at my changes. – Sven-Eric Krüger Jun 29 '18 at 07:56

To match two lines before and after the keyword use a regex like this:

(?:.*(?:\r?\n)+){2}age(?:.*(?:\r?\n|$)+){3}


Explanation:

  • (?:.*(?:\r?\n|$)+){3}: three of these blocks are needed, not two. The first repetition only consumes the newline directly after the keyword (age), the second matches line 4 (25) and its newline, and the third matches line 5 (student).

However, since the last line could be the end of the string, I've added $ as an alternative to the newline. I've also added an optional \r before each \n, which comes in handy if your strings may contain Windows line endings; otherwise, remove it.
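
To tie this back to the question, the pattern can also be built from a variable. The following is a hedged sketch: the helper name context_regex is hypothetical, and re.escape() is an addition that the pattern above does not need for a plain word like age.

```python
import re

def context_regex(keyword):
    # Same shape as the pattern above, with the keyword escaped and concatenated in.
    return re.compile(
        r"(?:.*(?:\r?\n)+){2}" + re.escape(keyword) + r"(?:.*(?:\r?\n|$)+){3}",
        re.MULTILINE,
    )

with_newline = " Pratap\npandey\nage\n25\nstudent\n"
without_newline = " Pratap\npandey\nage\n25\nstudent"

# The $ alternative lets the last block match whether or not the text
# ends with a newline.
for sample in (with_newline, without_newline):
    match = context_regex("age").search(sample)
    print(repr(match.group()))
```

Plain concatenation is used here instead of str.format or an f-string so that the {2} and {3} quantifiers do not need their braces doubled.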

Sample code:

import re
regex = r"(?:.*(?:\r?\n)+){2}age(?:.*(?:\r?\n|$)+){3}"
test_str = (" Pratap\n"
    "pandey\n"
    "age\n"
    "25\n"
    "student")

# finditer yields one match object per block found in test_str
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    print(match.group())
wp78de