How to search for the last occurrence of a regular expression in a string in Python?

Question

In Python, I can easily search for the first occurrence of a regex within a string like this:

import re
re.search("pattern", "target_text")

Now I need to find the last occurrence of the regex in a string, but this doesn't seem to be supported by the `re` module.

I can reverse the string to "search for the first occurrence", but I also need to reverse the regex, which is a much harder problem.

I can also iterate to find all occurrences from left to right, and just keep the last one, but that looks awkward.

Is there a smart way to find the rightmost occurrence?

Depending on what you have, you could modify the regex to *find the first instance of the string just before the end of the string*. — npinti, Oct 20 '15 at 09:22
Possible duplicate of [Find last match with python regular expression](https://stackoverflow.com/questions/2802168/find-last-match-with-python-regular-expression) — anthony sottile, Jul 26 '17 at 11:28

score 25 · Accepted Answer · answered Oct 20 '15 at 10:12

One approach is to prefix the regex with `(?s:.*)`, which forces the engine to try matching at the furthest position first and then gradually back off:

re.search("(?s:.*)pattern", "target_text")

Note that the result of this method may differ from `re.findall("pattern", "target_text")[-1]`, because `findall` searches for non-overlapping matches, so not every substring that can be matched is included in its result.

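For example, when the regex `a.a` is run on `abaca`, `findall` returns `aba` as the only match and so selects it as the last match, while the code above matches at `aca`. The full match there is `abaca`, because it includes the `(?s:.*)` prefix; wrap `pattern` in a capturing group to extract just `aca`.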
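A minimal sketch of the difference described above, assuming Python 3.6 or later (where scoped inline flags such as `(?s:...)` are accepted). The names `text`, `pattern`, `non_overlapping`, and `rightmost` are illustrative only. The pattern is wrapped in a capturing group so the greedy prefix can be separated from the actual match:

```python
import re

text = "abaca"
pattern = "a.a"

# findall scans left to right and never reuses characters, so after
# consuming "aba" there is no room left for a second match.
non_overlapping = re.findall(pattern, text)
print(non_overlapping)  # ['aba']

# The greedy prefix swallows the whole string, then backs off one character
# at a time until the wrapped pattern matches; group 1 is the rightmost hit.
rightmost = re.search("(?s:.*)(" + pattern + ")", text)
print(rightmost.group(1), rightmost.span(1))  # aca (2, 5)
```

Keep in mind that wrapping the pattern this way shifts the numbers of any capturing groups inside it by one.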

Yet another alternative is to use the `regex` package, which supports a REVERSE matching mode.

The result should be more or less the same as with the `(?s:.*)` method in the `re` package described above. However, since I haven't tried the package myself, it's not clear how backreferences work in REVERSE mode; the pattern might require modification in such cases.

nhahtdh


`findall` with `lookahead with a group` will get you overlapping matches – vks Oct 20 '15 at 10:45
@vks: If you mean `(?=pattern)` then yes, but I guess it would be less efficient when there are many matches. The `pattern` has to be fully executed per match found. – nhahtdh Oct 20 '15 at 11:18
It would have been nice to get an explanation for `(?s:.*)` for those who are not too familiar with regex. If someone is as confused as I was: `(?:)` is a non-capturing group, i.e. the content will not be present in the captured result. The `s` modifier changes the behavior of the dot `.` to capture also newline characters. So `(?s:.*)` matches literally everything (including empty string) but will not be in the result. Due to the greedy nature of regex it first matches as much as it can (everything) and then reduces the length of the group so the match ends in `pattern`. Great answer though! – Joschua Jun 16 '21 at 09:50
In addition: due to the way non-capturing groups work in python, to get the desired match, one needs to group the pattern actually `re.search("(?s:.*)(pattern)", "target_text").group(1)`. See also: https://stackoverflow.com/questions/2703029/why-isnt-the-regular-expressions-non-capturing-group-working – Joschua Jun 16 '21 at 10:01
Check the performance of the `(?s:.*)` method vs re.findall() for any sizeable target text. MUCH SLOWER when a pattern is not found. – sql_knievel Apr 14 '23 at 14:13

vks · Answer 2 · 2015-10-20T09:35:20.357

14

import re
re.search("pattern(?!.*pattern)", "target_text")

or

import re
re.findall("pattern", "target_text")[-1]

You can use these 2 approaches.
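One caveat with the lookahead variant, shown as a small sketch: by default `.` does not match a newline, so on multi-line text the lookahead cannot see later lines. The sample string and the variable names `naive` and `dotall` are assumptions made for illustration:

```python
import re

text = "abc\nabc"

# Without DOTALL, `.` stops at the newline, so the lookahead never sees the
# second "abc" and the first occurrence is reported as the last one.
naive = re.search(r"abc(?!.*abc)", text)

# Scoping the `s` flag to the lookahead lets `.*` cross line boundaries.
dotall = re.search(r"abc(?!(?s:.*)abc)", text)

print(naive.start(), dotall.start())  # 0 4
```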

If you want positions use

x="abc abc abc"
print([(i.start(), i.end(), i.group()) for i in re.finditer(r"abc", x)][-1])
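If the pattern cannot be modified (for example, it arrives as an already compiled regex object), one possible sketch is to walk `finditer` and keep only the final match object, which carries the position information. The helper name `last_match` is hypothetical, and like `findall` it only considers non-overlapping matches:

```python
import re

def last_match(pattern, text):
    """Return the last non-overlapping match object, or None if there is none."""
    # re.compile accepts either a pattern string or an already compiled pattern.
    compiled = re.compile(pattern)
    last = None
    for last in compiled.finditer(text):
        pass  # keep overwriting; only the final match survives the loop
    return last

m = last_match(re.compile(r"abc"), "abc abc abc")
print(m.start(), m.end(), m.group())  # 8 11 abc
```

Unlike indexing a list with `[-1]`, this returns `None` instead of raising `IndexError` when nothing matches, and it does not build a list of every match.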

edited Oct 20 '15 at 09:35

answered Oct 20 '15 at 09:22

vks


`findall` returns a list of strings without position info, which is what I need. The first solution is smart, but what if the regex is not modifiable (suppose `pattern` is provided as a regex object), is there still a way? – NeoWang Oct 20 '15 at 09:32
Just checked that this solution has a quadratic complexity, @nhahtdh answer is linear from the end, thus very fast if searched expression is near the end. – macieksk Jan 09 '20 at 00:55

score 1 · Answer 3 · answered Sep 15 '20 at 18:23

One approach is to use `split`. For example, if you want to get the last group after ':' in this sample string:

mystr = 'dafdsaf:ewrewre:cvdsfad:ewrerae'
':'.join(mystr.split(':')[-1:])
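When the separator is a literal string rather than a regex, the same result can probably be had more directly with `str.rsplit` or `str.rpartition`, which work from the right-hand end. A short sketch reusing the sample string above (the names `last_field`, `head`, `sep`, and `tail` are illustrative):

```python
mystr = 'dafdsaf:ewrewre:cvdsfad:ewrerae'

# Split once, starting from the right, and take the final piece.
last_field = mystr.rsplit(':', 1)[-1]
print(last_field)  # ewrerae

# rpartition also returns everything before the last separator.
head, sep, tail = mystr.rpartition(':')
print(head)  # dafdsaf:ewrewre:cvdsfad
print(tail)  # ewrerae
```

Neither call understands regular expressions, so this only covers the fixed-delimiter case.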

sparrow


Not the best solution, but +1 :) It solved another simple issue I had ;) – Marco smdm May 15 '21 at 08:39
