18

I need to match the string "foo/boo" from a string with this format:

string = "/foo/boo/poo"

I tried this code:

poo = "poo"
foo = re.match('.*(?=/' + re.escape(poo) + ')', string).group(0)

and it gives me /foo/boo as the content of the variable foo (instead of just foo/boo).

I tried this code:

poo = "poo"
foo = re.match('(?=/).*(?=/' + re.escape(poo) + ')', string).group(0)

and I'm getting the same output (/foo/boo instead of foo/boo).

How can I match only the foo/boo part?

John Ellis

2 Answers

19

Try the following regex:

(?<=/).*(?=/poo)
^^^^^^

The lookbehind checks for the leading slash without including it in the result.

Tested on regex101: https://regex101.com/r/yzMkTg/1

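Transform your code in the following way and it should work (note `re.search` instead of `re.match`, because `re.match` only tries to match at the start of the string, where the lookbehind has no preceding slash to find):

poo = "poo"
foo = re.search('(?<=/).*(?=/' + re.escape(poo) + ')', string).group(0)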

See this link for more information about the behavior of positive lookahead and positive lookbehind:

http://www.rexegg.com/regex-quickstart.html

Allan
    With `re.match`, this does not work in Python 3.10: `re.match` only matches at the start of the string, so you need to use `re.search` instead. Compare https://stackoverflow.com/questions/46501800/regex-with-lookbehind-not-working-using-re-match – bers Feb 07 '22 at 18:28
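To illustrate the comment above: `re.match` only attempts a match at index 0, where no character precedes the position, so the lookbehind `(?<=/)` cannot succeed. `re.search` tries later start positions and succeeds at index 1. A minimal sketch on the question's sample string (the names `pattern`, `matched`, and `searched` are only for illustration):

```python
import re

string = "/foo/boo/poo"
poo = "poo"
pattern = '(?<=/).*(?=/' + re.escape(poo) + ')'

# re.match anchors the attempt at index 0; nothing precedes index 0,
# so the lookbehind (?<=/) fails and no match object is returned.
matched = re.match(pattern, string)
print(matched)  # None

# re.search tries every start position; at index 1 the lookbehind sees
# the leading "/" and the greedy .* backtracks until "/poo" follows.
searched = re.search(pattern, string)
print(searched.group(0))  # foo/boo
```

This is also why calling `.group(0)` directly on the `re.match` result raises an `AttributeError`: there is no match object to call it on.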
14

You are missing a < in your lookbehind!

Lookbehinds look like this:

(?<=...)

not like this:

(?=...)

That would be a lookahead!

So the pattern you want is:

(?<=/).*(?=/poo)
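
As a rough sketch of the difference on the question's string: the lookahead `(?=/)` only checks the text to the right of the current position, so the slash is still consumed by `.*`, while the lookbehind `(?<=/)` checks the text to the left and leaves it out of the match. `re.search` is used here as an assumption on my part, since `re.match` only tries index 0, where a lookbehind has nothing to inspect.

```python
import re

string = "/foo/boo/poo"

# Lookahead: asserts that "/" comes next, then .* consumes it.
with_lookahead = re.search(r'(?=/).*(?=/poo)', string).group(0)
print(with_lookahead)   # /foo/boo

# Lookbehind: asserts that "/" came just before, so it is not part of the match.
with_lookbehind = re.search(r'(?<=/).*(?=/poo)', string).group(0)
print(with_lookbehind)  # foo/boo
```
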
Sweeper