extracting word before character

Question

I am trying to extract every word that comes right before a standalone `Y` (one separated by word boundaries). I treat each line as a separate record using the `(?m)` flag and capture `\w+` followed by a lookahead for `\s+Y`, but I can only print the first match, not the second one (IMP1).

print(foo)
this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important

Current fruitless attempt:

>>> import re
>>> m = re.search(r'(?m).*?(\w+)(?=\s+Y)', foo)
>>> m.groups()
('IMP',)
>>> m = re.search(r'(?m)(?<=\s)(\w+)(?=\s+Y)', foo)
>>> m.groups()
('IMP',)

Expected result:

('IMP','IMP1')

[re.search Multiple lines Python](https://stackoverflow.com/questions/18521319/re-search-multiple-lines-python) can't be used to close this question as `re.M` is already used in the above code as an inline modifier `(?m)`. Just using `re.findall` won't help either, it will extract `text`, which is not expected. — Wiktor Stribiżew, Sep 17 '20 at 08:56

score 1 · Accepted Answer · answered Sep 16 '20 at 18:59

You can use

\w+(?=[^\S\r\n]+Y\b)

See the regex demo. Details:

- \w+ - one or more letters/digits/underscores
- (?=[^\S\r\n]+Y\b) - immediately followed by one or more whitespace characters other than CR and LF, and then Y as a whole word (\b is a word boundary).

See a Python demo:

import re
foo = "this is IMP Y text\nand this is also IMP1 Y text\nthis is not so IMP2 N text\nY is not important"
print(re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo))
# => ['IMP', 'IMP1']
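
To see why the lookahead uses `[^\S\r\n]+` rather than the `\s+` from the question, the two can be compared side by side. This is only a minimal sketch that reuses `foo` from the demo above; the names `with_any_ws` and `with_horizontal_ws` are made up for illustration. Because `\s` also matches a line break, `\s+Y` can reach the `Y` at the start of the next line.

```python
import re

foo = "this is IMP Y text\nand this is also IMP1 Y text\nthis is not so IMP2 N text\nY is not important"

# \s+ also matches "\n", so "text" at the end of line 3 is "followed by" the Y on line 4
with_any_ws = re.findall(r'\w+(?=\s+Y)', foo)

# [^\S\r\n]+ matches whitespace except CR and LF, so the lookahead stays on the same line
with_horizontal_ws = re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo)

print(with_any_ws)         # ['IMP', 'IMP1', 'text']
print(with_horizontal_ws)  # ['IMP', 'IMP1']
```

Note that the `(?m)` flag would not change either result here: it only affects `^` and `$`, and neither pattern uses them.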

@monk `re.search` finds the first match only. — Wiktor Stribiżew, Sep 16 '20 at 19:06
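
The difference between the `re` functions can be sketched as follows, assuming the same `foo` string as in the answer. The names `first`, `all_words` and `spans` are illustrative only. `search` stops at the first hit, `findall` collects every non-overlapping match as strings, and `finditer` yields match objects in case positions are needed as well.

```python
import re

foo = "this is IMP Y text\nand this is also IMP1 Y text\nthis is not so IMP2 N text\nY is not important"
pattern = re.compile(r'\w+(?=[^\S\r\n]+Y\b)')

# search returns one match object (or None), so only the first word is available
first = pattern.search(foo)
print(first.group())   # IMP

# findall returns every non-overlapping match as a list of strings
all_words = pattern.findall(foo)
print(all_words)       # ['IMP', 'IMP1']

# finditer yields match objects, which also carry the match position
spans = [(m.group(), m.start()) for m in pattern.finditer(foo)]
print(spans)           # [('IMP', 8), ('IMP1', 36)]
```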

score 0 · Answer 2 · answered Sep 16 '20 at 19:06

Try using:

(\w+)(?=.Y)

You can test here

So, complete code would be:

import re

a = """this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important"""

print(re.findall(r"(\w+)(?=.Y)", a))

Output:

['IMP', 'IMP1']
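
One caveat worth checking before relying on `(\w+)(?=.Y)`: the `.` matches any character except a newline, not just whitespace, and nothing requires `Y` to be a whole word. The sketch below compares it with the pattern from the accepted answer on two made-up inputs (`sample_inside_word` and `sample_prefix` are hypothetical, not from the question), neither of which contains a standalone `Y`.

```python
import re

loose = r'(\w+)(?=.Y)'             # pattern from this answer
strict = r'\w+(?=[^\S\r\n]+Y\b)'   # pattern from the accepted answer

# Hypothetical inputs: neither contains Y as a separate word
sample_inside_word = "MAYBE this"
sample_prefix = "IMP Yes"

# "M" is followed by "AY", which satisfies the ".Y" lookahead
print(re.findall(loose, sample_inside_word))   # ['M']
print(re.findall(strict, sample_inside_word))  # []

# "IMP" is followed by " Y", but that Y is only the start of "Yes"
print(re.findall(loose, sample_prefix))        # ['IMP']
print(re.findall(strict, sample_prefix))       # []
```

On the data from the question both patterns give the same result, so the difference only matters if such inputs can occur.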

answered Sep 16 '20 at 19:06 by Homer
