import re

s = "facebook.com/https://www.facebook.com/test/"
print(re.findall(r"facebook\.com/[^?\"'>&\*\n\r\\ <]+?", s))

I only want "facebook.com/test/" as the result, but I'm getting:

facebook.com/h
facebook.com/t

What's wrong with my regex? I added the "?" at the end of the expression, expecting it to stop the greediness, but it seems to be treated as a 0-or-1 quantifier instead.

If I remove the "?" I get:

facebook.com/https://www.facebook.com/test/
Gary
  • Did you just want to match any instance that's not at the beginning of the string? If so, get rid of the "?" and see [Regex: match pattern as long as it's not in the beginning](https://stackoverflow.com/questions/15669557/regex-match-pattern-as-long-as-its-not-in-the-beginning) – SuperStormer Jul 29 '22 at 01:06
  • It doesn't work. When I do this, I get the entire string. – Gary Jul 29 '22 at 01:06
  • Did you follow the linked post? – SuperStormer Jul 29 '22 at 01:07
  • Yes, thank you for your help, but that post isn't the solution I need. – Gary Jul 29 '22 at 01:09
  • I need the last match in a string, even if it's at the beginning (i.e. it is also the only match), e.g. facebook.com/hello/ – Gary Jul 29 '22 at 01:09

1 Answer


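The non-greedy modifier works forwards but not backwards: it only controls how far a match extends, not where it starts. The regex engine scans the string from left to right and returns the leftmost position where the pattern can match, so the first instance of facebook.com/ (at position 0) is never discarded unless the rest of the pattern fails to match there, even if the quantifier is non-greedy. With +? the match then stops after a single character, giving facebook.com/h; without the ?, the character class also accepts : and /, so the match runs to the end of the string.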
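A minimal Python sketch of this behaviour, using the question's string and character class (written as raw strings; the names lazy and greedy are just illustrative). re.search reports where the leftmost match starts and ends, and both patterns start at position 0:

```python
import re

s = "facebook.com/https://www.facebook.com/test/"
lazy = r"facebook\.com/[^?\"'>&\*\n\r\\ <]+?"
greedy = r"facebook\.com/[^?\"'>&\*\n\r\\ <]+"

# Both patterns begin matching at the leftmost facebook.com/ (index 0);
# the quantifier only decides where the match ends.
print(re.search(lazy, s).span())    # (0, 14): stops after one character
print(re.search(greedy, s).group()) # the whole string
print(re.findall(lazy, s))          # ['facebook.com/h', 'facebook.com/t']
```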

To match the last instance of facebook.com/, you can use a negative lookahead instead, which rejects any facebook.com/ that is followed later by another facebook.com/:

facebook\.com/(?!.*facebook\.com/)[^?\"\'>&\*\n\r\\\ <]+

Demo: https://replit.com/@blhsing/WordyAgitatedCallbacks
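A minimal Python sketch applying the lookahead pattern with re.findall, checked against the question's string and the single-URL case from the comments. One assumption to note: . does not match newlines by default, so in multi-line text the lookahead only looks ahead to the end of the current line, which makes this "last match per line".

```python
import re

# Negative lookahead: accept facebook.com/ only if no later facebook.com/
# appears in the rest of the line.
pattern = r"facebook\.com/(?!.*facebook\.com/)[^?\"'>&\*\n\r\\ <]+"

for s in ["facebook.com/https://www.facebook.com/test/", "facebook.com/hello/"]:
    print(re.findall(pattern, s))
# ['facebook.com/test/']
# ['facebook.com/hello/']
```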

blhsing