
The following regular expression is jumping across [url] tags: one match runs from one [url] tag through to a later [/url].

Regular Expression (generic regular expression)

(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])

String:

[url]blahyoutubeblah[/url] heyya [url]blahblah[/url]    [url]www.youtube.com/blah[/url]
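A quick way to see what happens is to run the pattern through Python's `re` module. Python is only an illustrative choice here, since the question asks about a generic flavor, and the names `ORIGINAL`, `TEXT` and `captures` are made up for this sketch. The second capture swallows an entire `[/url]    [url]` boundary:

```python
import re

# The asker's pattern, unchanged, as a Python raw string.
ORIGINAL = r"(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])"

# The test string from the question (four spaces before the last tag).
TEXT = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url]    "
        "[url]www.youtube.com/blah[/url]")

# findall returns capture group 1 of every non-overlapping match.
captures = re.findall(ORIGINAL, TEXT)
for c in captures:
    print(repr(c))
```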

Help!!


Paolo
zdanman

4 Answers


Your capturing group only requires youtu somewhere inside it, so the substring

[url]blahblah[/url]    [url]www.youtube.com/blah[/url]

matches, because it starts with [url], includes youtu, and ends with [/url].

Simply using a negated character set, excluding [, probably isn't enough, because that wouldn't allow for nested tags to match, such as an input of

[url]foobar youtube[b]BOLD TEXT[/b][/url]

You can require a negative lookahead for [/url] right before each repeated character (a so-called tempered greedy token):

(?:(?!\[\/url\]).)*
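A minimal Python sketch of that token on its own (the `TEMPERED`, `sample` and `m` names are illustrative assumptions): it consumes a character only while the text ahead does not start with `[/url]`, so it stops at the first closing tag.

```python
import re

# Tempered token: before consuming each character, check that the text at
# that position does not begin with "[/url]".
TEMPERED = r"(?:(?!\[/url\]).)*"

sample = "blahblah[/url]    [url]www.youtube.com/blah[/url]"
m = re.match(TEMPERED, sample)

# The repetition stops right before the first "[/url]".
print(repr(m.group()))
```

By default `.` does not match a newline, so if link text can span several lines you would probably want `re.DOTALL` (or your flavor's equivalent flag).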

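Also, make sure that whatever follows [url contains no ] characters before the real closing ], with:

\[url[^\]]*\]

(The ] inside the character class is escaped so the pattern reads the same in every flavor; in JavaScript, for example, an unescaped [^]] would parse as [^] followed by ].)

In full:

\[url[^\]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]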

The quantifiers no longer need to be lazy, because the negative lookahead already stops each repetition before it can run past a [/url].
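To check the full pattern against both inputs mentioned in this answer, here is a small Python sketch. The constant names are illustrative, and Python's `re` is assumed as a stand-in for the regex101 demo. Each capture stays inside one `[url]...[/url]` pair, and the nested `[b]...[/b]` example still matches:

```python
import re

# The full pattern from this answer ("]" escaped inside the character class).
FULL = r"\[url[^\]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]"

TEXT = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url]    "
        "[url]www.youtube.com/blah[/url]")
NESTED = "[url]foobar youtube[b]BOLD TEXT[/b][/url]"

# Each capture stays inside a single [url]...[/url] pair.
print(re.findall(FULL, TEXT))
# Nested tags such as [b]...[/b] are allowed inside the capture.
print(re.findall(FULL, NESTED))
```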

Demo:

https://regex101.com/r/hSAJEp/1

CertainPerformance
  • you're awesome. Why require a negative lookahead for [/url] before repeated characters? – zdanman Sep 09 '18 at 23:35
  • Because otherwise, the substring at the top of the answer (`[url]blahblah[/url] [url]www.youtube.com/blah[/url]`) will match. You don't want any `[/url]`s to be in between the first `[url]` and the final `[/url]`. – CertainPerformance Sep 09 '18 at 23:41

You are matching .*?, which means the match can start at [url], continue past a [/url] up until youtu, and then find the next [/url].

A simple workaround is a negated character class, [^\[], which means it won't match an opening [ bracket before finding youtu:

(?:\[url.*?\])([^\[]*?youtu.*?)(?:\[\/url\])
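A Python sketch of this workaround on the question's string (the names are illustrative only). Printing both the whole match and the captured group shows a subtlety: the captures come out as intended, but the second whole match still starts at `[url]blahblah`, because the lazy `.*?` in the opening-tag part can stretch past a `]` until the rest of the pattern succeeds.

```python
import re

# Keith Nicholas's workaround, as written in the answer.
WORKAROUND = r"(?:\[url.*?\])([^\[]*?youtu.*?)(?:\[\/url\])"

TEXT = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url]    "
        "[url]www.youtube.com/blah[/url]")

matches = list(re.finditer(WORKAROUND, TEXT))
for m in matches:
    # group(0) is the whole match, group(1) the captured link text.
    print(repr(m.group(0)), "->", repr(m.group(1)))
```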
Paolo
Keith Nicholas

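It's lazy, but it will still match if it can: laziness only affects where a match ends, and the engine won't move the left border (the start of the match) forward if a match is possible from there. There are other ways to handle that; one of them is simply to prevent the unwanted match in the regex itself. Just use: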

(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])
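A Python sketch of this version (illustrative names; `TAG_FIRST` is a hypothetical input, not taken from the question). With `[^\]]*?` the opening-tag part can no longer stretch past `]`, so each whole match is a single `[url]...[/url]` pair. Because `[^\[]*?` rejects any `[` before `youtu`, a tag placed in front of the link text prevents a match, which is consistent with zdanman's comment about nested tags.

```python
import re

# Qwertiy's pattern: [^\]]*? keeps the opening tag from stretching past "]".
TIGHTENED = r"(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])"

TEXT = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url]    "
        "[url]www.youtube.com/blah[/url]")
# Hypothetical input with a tag placed before "youtu" inside the link text.
TAG_FIRST = "[url][b]youtube[/b][/url]"

matches = list(re.finditer(TIGHTENED, TEXT))
for m in matches:
    print(repr(m.group(0)), "->", repr(m.group(1)))

# [^\[]*? cannot cross the "[" of [b], so this input has no match.
print(re.findall(TIGHTENED, TAG_FIRST))
```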
Qwertiy
  • you're awesome. this site is awesome. thanks! – zdanman Sep 09 '18 at 23:35
  • @zdanman, if it solves your problem, accept an answer by clicking the check mark to the left of it. – Qwertiy Sep 09 '18 at 23:37
  • THANKS Qwertiy! I ended up using the one by CertainPerformance, because it allows other tags (such as **[B][/B]**, **[U][/U]**) to be included in the group result, and specifically targets the **[URL][/URL]** tag to be excluded. I very much appreciate your help though! Truly! – zdanman Sep 10 '18 at 00:35

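The problem was that your regex requires youtu, but the [url] tag in between contains only blahblah, so the match runs on to the next tag that does contain youtu. Making it generic (matching the contents of every [url] tag, not only YouTube ones) gives: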

(?:\[url.*?\])(.*?)(?:\[\/url\])
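A Python sketch of the generic pattern (illustrative names). Since it captures the text of every `[url]` tag, one possible follow-up, assumed here rather than taken from the answer, is to apply the `youtu` check afterwards in ordinary code:

```python
import re

# The generic pattern from this answer: capture the text of every [url] tag.
GENERIC = r"(?:\[url.*?\])(.*?)(?:\[\/url\])"

TEXT = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url]    "
        "[url]www.youtube.com/blah[/url]")

all_links = re.findall(GENERIC, TEXT)
print(all_links)

# Hypothetical follow-up step: apply the "youtu" test outside the regex.
youtube_links = [link for link in all_links if "youtu" in link]
print(youtube_links)
```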
The Scientific Method