I check my website's articles for YouTube links and auto-convert them into YouTube embed HTML.

The problem is that if someone just wants to link to a YouTube URL, the URL still gets parsed inside the link, producing broken HTML.

This uses a BBCode parser I wrote, with this syntax for URLs:

[url=address]text[/url]

This is the current regex:

~(?:http|https|)(?::\/\/|)(?:www.|)(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[a-z0-9;:@#?&%=+\/\$_.-]*~i

So, I tried adding in this to the start:

(?<!\[url=)

To look like:

~(?<!\[url=)(?:http|https|)(?::\/\/|)(?:www.|)(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[a-z0-9;:@#?&%=+\/\$_.-]*~i

The idea was that if the url BBCode prefix appears right before the link, the link should not be parsed into YouTube HTML, but this doesn't seem to work.

It will act as if my negative lookbehind isn't there, and will process the youtube url as normal.

This is the url in question:

[url=https://www.youtube.com/watch?v=jHnvVX_T1AA]

So it should not pick that up, since it is preceded by the url bbcode.

What am I doing wrong?

NaughtySquid
  • I think you can use [`(?:https?|)(?::\/\/|)(?:www.|)(?<!\[url=http:\/\/www\.|\[url=https:\/\/www\.|\[url=www\.|\[url=)(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[a-z0-9;:@#?&%=+\/\$_.-]*`](https://regex101.com/r/sP6oQ6/2). – Wiktor Stribiżew Sep 14 '15 at 22:31

1 Answer

To summarize, you have a string like `zabcd` and a pattern like `(?<!z)(?:ab)?cd`.

The pattern fails at the position of "a" because of the lookbehind, but since `ab` is optional, the engine moves on to the next positions and the pattern succeeds at the position of "c" (which is not preceded by a "z").
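The behaviour can be reproduced in a few lines. This is a minimal sketch in Python's `re` module (the question never names its host language, so Python is an assumption), and `SIMPLE_YOUTUBE` is a deliberately cut-down, hypothetical stand-in for the full pattern from the question, kept only long enough to show the same effect.

```python
import re

# Toy example from the answer: the lookbehind rejects the position of "a",
# but the optional group lets the match begin later, at "c".
toy = re.compile(r"(?<!z)(?:ab)?cd")
toy_match = toy.search("zabcd")
print(toy_match.group(0), toy_match.start())  # cd 3

# Hypothetical, simplified version of the question's pattern: the scheme
# and "www." are optional, just like "ab" above.
SIMPLE_YOUTUBE = re.compile(
    r"(?<!\[url=)(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([\w-]{11})",
    re.IGNORECASE,
)

text = "[url=https://www.youtube.com/watch?v=jHnvVX_T1AA]"
url_match = SIMPLE_YOUTUBE.search(text)

# The attempt at "https" is rejected by the lookbehind, so the engine
# retries further right and succeeds at "www", where "[url=" no longer
# sits immediately before the current position.
print(url_match.group(0))  # www.youtube.com/watch?v=jHnvVX_T1AA
print(url_match.group(1))  # jHnvVX_T1AA
```

The lookbehind is only checked at the position where a match attempt starts, so any optional prefix gives the engine a later starting point that passes the check.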

Casimir et Hippolyte
  • I don't quite understand what you're saying. The regex will work, but it works as if my negative lookbehind isn't there. – NaughtySquid Sep 14 '15 at 14:56
  • @LiamDawe: Start by making sure you understand this simple example, then take a (new) look at your pattern and you will understand why it doesn't work (since it's exactly what happens). – Casimir et Hippolyte Sep 14 '15 at 15:00
  • I have updated my question, since I don't think you're quite getting it. The https bit does exist. – NaughtySquid Sep 14 '15 at 15:03
  • @LiamDawe: ... as the "ab" exists in my example string. – Casimir et Hippolyte Sep 14 '15 at 15:05
  • So what part in my regex is being captured then to make it carry on as normal? Can you give an example of how to solve it? – NaughtySquid Sep 14 '15 at 15:06
  • @LiamDawe: for your example string with your current pattern, the capture starts with `www.youtube....`. About how to solve the problem, the idea is to consume characters (in other words, match them) of parts enclosed between `[url]` tags *before* trying to match the urls you want. There are a lot of questions on SO about how to skip/ignore parts of a string to find a pattern. – Casimir et Hippolyte Sep 14 '15 at 15:14
  • But it is not enclosed between [url] and [/url] tags, it's included as the url for the tag [url=youtubeurl. I don't want to skip or ignore it, I want it to not run if the [url= part exists? – NaughtySquid Sep 14 '15 at 15:18
  • @LiamDawe: between `[url=` and `]`, it doesn't change anything. And as I said, the way to go *is* to match `[url=...]` *before*. – Casimir et Hippolyte Sep 14 '15 at 15:26
  • I really don't get what that would achieve, have you got an example? – NaughtySquid Sep 14 '15 at 15:28
  • @LiamDawe: you can take a look at this post: http://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex/24535912#24535912 – Casimir et Hippolyte Sep 14 '15 at 15:29
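
The "match `[url=...]` before" idea from the comments can be sketched as follows. The linked post covers PCRE's `(*SKIP)(*F)` verbs; Python's `re` module does not support them, so this sketch swaps in an alternation plus a replacement callback, which gives the same effect. The names `YOUTUBE`, `COMBINED`, `embed` and `convert`, the shortened YouTube pattern, and the iframe markup are all assumptions for illustration, not the asker's actual parser.

```python
import re

# Hypothetical, shortened stand-in for the question's YouTube pattern.
YOUTUBE = (
    r"(?:https?://)?(?:www\.)?"
    r"(?:youtu\.be/|youtube\.com/watch\?v=)"
    r"([\w-]{11})[a-z0-9;:@#?&%=+/$_.-]*"
)

# The first alternative consumes a whole "[url=...]" opening tag, so the
# YouTube alternative never gets a chance to start inside it.
COMBINED = re.compile(r"\[url=[^\]]*\]|" + YOUTUBE, re.IGNORECASE)


def embed(match):
    """Return the tag unchanged, or embed markup for a bare YouTube URL."""
    video_id = match.group(1)
    if video_id is None:
        # The "[url=...]" alternative matched: leave it for the BBCode parser.
        return match.group(0)
    # Assumed embed markup; the real site may use something different.
    return '<iframe src="https://www.youtube.com/embed/{}"></iframe>'.format(video_id)


def convert(text):
    return COMBINED.sub(embed, text)


tagged = "[url=https://www.youtube.com/watch?v=jHnvVX_T1AA]video[/url]"
bare = "see https://youtu.be/jHnvVX_T1AA now"

print(convert(tagged))  # unchanged
print(convert(bare))
```

Because the tag is matched (and returned as-is) rather than merely looked behind, every character of the URL inside it is already consumed by the time the engine looks for the next YouTube link.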