
I am trying to fetch the last occurrence of a pattern when the string contains repeats of that pattern in between. For example, my string is: "abc abc abc efg 123 abc 123 abc abc xyz 123". I want to capture the text between abc and 123. My desired output is: ['abc efg 123', 'abc 123', 'abc xyz 123']

So I used the regex 'abc.*?123', but it gives: ['abc abc abc efg 123', 'abc 123', 'abc abc xyz 123']

I don't want the match to run from the first occurrence of the first pattern to the second pattern; I need it to run from the last occurrence of the first pattern to the second pattern.

import re
a = "abc abc abc efg 123 abc 123 abc abc xyz 123"
print(a)
b = re.findall(r'abc.*?123', a)
print("Output is: " + str(b))

Output is: ['abc abc abc efg 123', 'abc 123', 'abc abc xyz 123']

I expect the output as: ['abc efg 123', 'abc 123', 'abc xyz 123']
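Why the lazy quantifier alone doesn't help: a minimal check, using only the standard `re` module, of where the first match begins. The engine tries start positions from left to right and reports the first one that succeeds; `.*?` only keeps the end of that match as short as possible, it never moves the start past an earlier `abc`.

```python
import re

a = "abc abc abc efg 123 abc 123 abc abc xyz 123"

# The engine tries start positions left to right and returns the first
# position where a match succeeds. The lazy .*? only keeps the END of that
# match short; it never shifts the START to a later "abc".
m = re.search(r'abc.*?123', a)
print(m.span())   # (0, 19): the match starts at the very first "abc"
print(m.group())  # 'abc abc abc efg 123'
```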

depperm
Phani
  • What do you expect for `abc efg abc 123` Perhaps like this https://regex101.com/r/PXU3dG/1 – The fourth bird Aug 14 '19 at 13:25
  • Do you mean that you want to capture non-repeating patterns? You can easily split on spaces and remove repeated entries in Python. – MonkeyZeus Aug 14 '19 at 13:29
  • **Duplicate of [Find shortest matches between two strings](https://stackoverflow.com/questions/24640154/find-shortest-matches-between-two-strings)** – Wiktor Stribiżew Aug 14 '19 at 19:50
  • The duplicate refers to using a Tempered Greedy Token, which will not match abc efg abc 123. – The fourth bird Aug 14 '19 at 20:43
  • The TGT is exactly what OP asks for in the question: *I need **last occurrence** of first pattern to second pattern*. If the input is `abc efg abc 123`, the output must be `abc 123`. Please re-close. – Wiktor Stribiżew Aug 15 '19 at 20:25
  • The question starts with `fetch last occurrence of the patterns where the string contains similar patterns in between.` Due to the `the similar patterns in between` part I asked the OP if `abc efg abc 123` can be a match and the reply in the comment listed under the [answer](https://stackoverflow.com/a/57495751/5424988) is `Yes, 'abc efg abc 123' can also be a match.` The TGT will not match that. – The fourth bird Aug 16 '19 at 09:46
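MonkeyZeus's suggestion to work on whitespace-separated tokens instead of a regex could look roughly like the sketch below. `last_start_to_end` is a hypothetical helper name; the sketch assumes `abc` and `123` always appear as whole, space-separated words, and it rebuilds each match with single spaces, so the original spacing is not preserved. Because it keeps only the most recent `abc`, it returns `abc 123` for `abc efg abc 123`, which is the tempered-greedy-token behaviour rather than the accepted answer's.

```python
def last_start_to_end(text, start="abc", end="123"):
    """Hypothetical helper: for each `end` token, return the words from the
    most recent `start` token up to and including that `end` token."""
    words = text.split()  # assumes whitespace-separated tokens
    results = []
    last_start = None     # index of the most recent `start` token seen so far
    for i, word in enumerate(words):
        if word == start:
            last_start = i
        elif word == end and last_start is not None:
            results.append(" ".join(words[last_start:i + 1]))
            last_start = None  # each `start` is used for at most one match
    return results


print(last_start_to_end("abc abc abc efg 123 abc 123 abc abc xyz 123"))
# ['abc efg 123', 'abc 123', 'abc xyz 123']
print(last_start_to_end("abc efg abc 123"))
# ['abc 123']
```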

1 Answer


To match the last occurrence of abc in abc abc abc, you could use a negative lookahead (?! abc) to assert that abc is not followed by a space and another abc. Use word boundaries \b to prevent abc from matching as part of a larger word.

If another abc may still occur after efg (so that abc efg abc 123 is matched as a whole), you might use:

\babc\b(?! abc\b).*?\b123\b


import re
a="abc abc abc efg 123 abc 123 abc abc xyz 123"
b=re.findall(r"\babc\b(?! abc\b).*?\b123\b",a)
print(b)

Result

['abc efg 123', 'abc 123', 'abc xyz 123']
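The comments under the question debate this pattern versus a tempered greedy token (the approach in the suggested duplicate). The sketch below runs both on the example string and on `abc efg abc 123`; the tempered variant is written here with the same `\b` boundaries for comparison and is an assumption, not the exact pattern from the linked duplicate. They agree on the example but differ on the second input, so which one fits depends on whether `abc efg abc 123` should be matched whole.

```python
import re

# Pattern from the answer: an "abc" that is not immediately followed by " abc".
lookahead = r"\babc\b(?! abc\b).*?\b123\b"
# Tempered greedy token: no "abc" may appear anywhere between the opening
# "abc" and "123", so the match always starts at the last "abc".
tempered = r"\babc\b(?:(?!\babc\b).)*?\b123\b"

for text in ["abc abc abc efg 123 abc 123 abc abc xyz 123", "abc efg abc 123"]:
    print(re.findall(lookahead, text), re.findall(tempered, text))
# ['abc efg 123', 'abc 123', 'abc xyz 123'] ['abc efg 123', 'abc 123', 'abc xyz 123']
# ['abc efg abc 123'] ['abc 123']
```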

The fourth bird
  • Albeit impressive, it relies on hardcoded patterns. I am going to assume OP is looking for a generic solution which can be used on any string. – MonkeyZeus Aug 14 '19 at 13:32
  • @MonkeyZeus The example data as well as the pattern contains `abc` and `123`. A more generic pattern could be `\b(\w+)\b(?! \1).*?123` https://regex101.com/r/x3qXal/1 or with `\S+` instead of `\w+` – The fourth bird Aug 14 '19 at 13:33
  • Thank you. This is working. My actual case is even a little more complicated, where I need to capture multiple patterns in a single regex. Let me try more and come back. Once again, thanks a lot for this, as it will resolve many of my issues. Great..!!!! – Phani Aug 14 '19 at 15:20
  • @mangipudiphanikishore Can this also be a match? `abc efg abc 123` – The fourth bird Aug 14 '19 at 15:21
  • @Thefourthbird Yes, 'abc efg abc 123' can also be a match. I'm thinking about how to write an expression when there are more than 2 patterns. E.g. pattern1 might be 'abc', pattern2 might be 'efg' and pattern3 might be '123'. Can we write that in one regexp? – Phani Aug 14 '19 at 15:46
  • @Phani Do you mean like this using capturing group and a backreference https://regex101.com/r/MpzAiJ/1 – The fourth bird Aug 14 '19 at 15:51
  • @Thefourthbird Yes, the above is even closer to my solution. I just updated it like this: (abc(?! abc).*?efg)(?! abc|efg).*?123 because 'abc' is followed by 'efg', which is followed by '123', and so on...
  • @Phani Ok, I will remove the duplicate marking, as it refers to using a [Tempered Greedy Token](https://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat/37343088#37343088), which will not [match](https://regex101.com/r/okiJSK/1) `abc efg abc 123` – The fourth bird Aug 14 '19 at 16:24
  • @Thefourthbird Got it. Thank you for the solutions. Working perfectly
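A caveat when trying the generic pattern from the comments in Python: `\b(\w+)\b(?! \1).*?123` needs a capturing group for the backreference `\1`, and when a pattern has exactly one group, `re.findall` returns only that group, not the whole match. Iterating with `re.finditer` and taking `group(0)` returns the full matched text. A minimal sketch on the example string:

```python
import re

a = "abc abc abc efg 123 abc 123 abc abc xyz 123"
# Generic variant from the comments: any word that is not directly followed
# by a space and the same word again, then lazily up to "123".
generic = r"\b(\w+)\b(?! \1).*?123"

# With one capturing group, re.findall returns the group, not the whole match.
print(re.findall(generic, a))  # ['abc', 'abc', 'abc']

# re.finditer yields match objects; group(0) is the entire matched text.
print([m.group(0) for m in re.finditer(generic, a)])
# ['abc efg 123', 'abc 123', 'abc xyz 123']
```

The same applies to the pattern with the capturing group `(abc(?! abc).*?efg)` from the later comment: `re.findall` would return only the text captured by that group.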