Match the last occurrence of a name from a list before a quoted text

Question

I am trying to get the quotes and their respective authors in a long text.

Example : Paul […] Jane says G_quoted text_R

How can I get Jane and her quoted text in two groups, but not Paul etc.?

I tried a positive lookahead like this, but I get all names, not just Jane. Many thanks for your help.

i?(Paul|Jane|Robert|John)(?=[^.]*?G_(.*)_R)

https://regex101.com/r/mx0JgV/1

Why lookahead? Are you required to only consume text up to "Jane" and no further, or "Jane" must be the match of the entire regex and not of a group, or some other weird requirement? — ivan_pozdeev, May 30 '17 at 15:29
I can't understand well... If you just need "Jane", why do you add "Paul" and other names? And why your quoted text is not enclosed by (") but "G_" and "_R"? — Sraw, May 30 '17 at 15:31
I want to get all quotes from the listed authors. In this example, it is Jane but it will be Paul, Robert etc. in other parts of the text. "G_" and "_R" were initially HTML tags, but I converted them to text — user3259111, May 30 '17 at 15:39
@ivan_pozdeev : I am not sure I understand your question. I need to get all quotes and the names of their authors. The author is always the name closest to the quote. Thanks. — user3259111, May 30 '17 at 15:45
Interesting. Lookbehind can't be used because Python's engine, like PCRE, [requires it to be of fixed width](https://stackoverflow.com/questions/3796436/whats-the-technical-reason-for-lookbehind-assertion-must-be-fixed-length-in-r). — ivan_pozdeev, May 30 '17 at 16:53

score 0 · Accepted Answer · answered May 30 '17 at 16:49

What's wrong with:

import re

QUOTE_FINDER = re.compile(r"(paul|jane|robert|john).*?G_(.*?)_R", re.IGNORECASE | re.DOTALL)

data = """dfdsf Jane […] Paul […] Jane says G_quoted text_R
and Paul says G_some other text_R while Robert prefers to say G_nothing_R..."""

quotes = QUOTE_FINDER.findall(data)
# [('Jane', 'quoted text'), ('Paul', 'some other text'), ('Robert', 'nothing')]
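One caveat with the pattern above: it anchors on the first listed name it finds, so on the question's own example (`Paul […] Jane says G_quoted text_R`) it would return `Paul` rather than `Jane`. The sample data happens to start with `Jane`, which hides this. A minimal sketch of one possible workaround follows, assuming names should match as whole words and that no listed name may appear between the captured name and `G_`. The helper `find_quotes` and the constants `NAMES` and `LAST_NAME_FINDER` are illustrative names, not part of the original answer.

```python
import re

# Assumption: names are matched as whole words, case-insensitively.
NAMES = r"\b(?:paul|jane|robert|john)\b"

# Capture a name, then lazily consume characters only while no other
# listed name starts at the current position, up to the opening "G_".
LAST_NAME_FINDER = re.compile(
    r"(" + NAMES + r")(?:(?!" + NAMES + r").)*?G_(.*?)_R",
    re.IGNORECASE | re.DOTALL,
)


def find_quotes(text):
    """Return (name, quote) pairs, using the name closest to each quote."""
    return LAST_NAME_FINDER.findall(text)


print(find_quotes("Paul [...] Jane says G_quoted text_R"))
# [('Jane', 'quoted text')]
```

The negative lookahead is rechecked at every character, so this is slower than the plain `.*?` version on very long texts; whether that matters depends on the input size.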

answered May 30 '17 at 16:49

zwer


Many thanks @zwer ! This is exactly what I was looking for. – user3259111 May 30 '17 at 17:00