
Although similar questions have been asked here many times already, I've got a request to amend an existing regex line to improve it. I'm fairly sure this will help others in the same situation too.

What I'm trying to achieve is to match valid YouTube video URLs using **ColdFusion** regex.

Here's what I've currently got:

ReMatch('^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)([^##\&\?]*).*',mylink)

This works for the following URL types:

http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg

However, the following URL for whatever reason is getting matched too:

http://www.theguardian.com/media/2013/nov/29/russell-brand-rages-sun-rupert-murdoch

How can I amend the code to be more accurate? Perhaps making the 'youtu' part mandatory would help, as I think the current regex treats it as just one of several optional alternatives. The trouble is that I'm not able to amend this code myself, hence asking for help here.
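One way to see what is going on is to print which alternative of the pattern actually captured. The sketch below is written in Java rather than ColdFusion, an assumption made so that it can run standalone; `java.util.regex` is close to CF's regex engine but not identical to it. The CF-escaped `##` is written as a single `#`, and the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhyItMatches {
    // The pattern from the question; ColdFusion's "##" escape becomes a plain "#".
    static final Pattern ORIGINAL = Pattern.compile(
        "^.*(youtu.be\\/|v\\/|u\\/\\w\\/|embed\\/|watch\\?v=|\\&v=)([^#\\&\\?]*).*");

    // Returns the text captured by group 1 (the alternative that matched),
    // or null when the pattern does not match at all.
    static String matchedBranch(String url) {
        Matcher m = ORIGINAL.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String guardian =
            "http://www.theguardian.com/media/2013/nov/29/russell-brand-rages-sun-rupert-murdoch";
        System.out.println(matchedBranch(guardian));                      // prints "v/" (from "nov/29")
        System.out.println(matchedBranch("http://youtu.be/0zM3nApSvMg")); // prints "youtu.be/"
    }
}
```

None of the alternatives is tied to a YouTube host name, so any URL that happens to contain `v/` satisfies the pattern.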

//////EDITED////////////////

Thanks to Omega's answer below, with a little amendment here's the pattern that worked for my case:

ReMatch('(http:\/\/)(?:www\.)?youtu(?:be\.com\/(?:watch\?|user\/|v\/|embed\/)\S+|\.be\/\S+)',mylink)

Also, it is worth noting I had to strip the lookbehind part from the suggested pattern as ColdFusion does not support it.
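As a rough sanity check, the amended pattern can be run against the URLs listed in the question. This is a sketch in Java, not ColdFusion (an assumption made so that it runs standalone; the two engines may differ in edge cases), and the class and method names are made up for illustration. `find()` is used because, like `ReMatch`, it looks for the pattern anywhere in the string.

```java
import java.util.regex.Pattern;

class AmendedPatternCheck {
    // The amended pattern from the edit above, unchanged apart from Java string escaping.
    static final Pattern AMENDED = Pattern.compile(
        "(http:\\/\\/)(?:www\\.)?youtu(?:be\\.com\\/(?:watch\\?|user\\/|v\\/|embed\\/)\\S+|\\.be\\/\\S+)");

    // True when the pattern is found somewhere in the given string.
    static boolean looksLikeYouTube(String url) {
        return AMENDED.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index",
            "http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/QdK8U-VIH_o",
            "http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0",
            "http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s",
            "http://www.youtube.com/embed/0zM3nApSvMg?rel=0",
            "http://www.youtube.com/watch?v=0zM3nApSvMg",
            "http://youtu.be/0zM3nApSvMg",
            // The false positive from the question; this one should now be rejected.
            "http://www.theguardian.com/media/2013/nov/29/russell-brand-rages-sun-rupert-murdoch"
        };
        for (String url : urls) {
            System.out.println(looksLikeYouTube(url) + "  " + url);
        }
    }
}
```

Note that the pattern requires a literal `http://`, so an `https://` link would not match as written.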

SimonDau
  • Use the debugger @ regex101 to find out why. Visit this link: http://regex101.com/r/lL6iP1/#debugger -- looking at the output you'll see the `v/` is what matches and the result is true. This then tells you that your regex is poorly constructed because you have branches that match things unrelated to youtube links. – Firas Dib Nov 29 '13 at 16:20
  • Instead of trying to reinvent the wheel, use one of the patterns at http://stackoverflow.com/questions/2936467/parse-youtube-video-id-using-preg-match#6382259 or http://stackoverflow.com/questions/5830387/how-to-find-all-youtube-video-ids-in-a-string-using-a-regex#5831191 or [plenty of others](http://stackoverflow.com/questions/tagged/youtube+regex?sort=votes&pageSize=30). – Peter Boughton Nov 29 '13 at 16:38
  • @PeterBoughton, there's a reason why I've written ColdFusion in bold. I've already tried those links and all other ones I could possibly find on the subject, trying to make it work. However since ColdFusion uses some stripped down version of Regex none of those worked so far. Most of those would work flawlessly on PHP or JavaScript. – SimonDau Nov 29 '13 at 17:06
  • The polite thing to do now is upvote the helpful answer and mark it as the answer. – Dan Bracuk Nov 29 '13 at 17:23
  • Don't write things in bold and assume people will infer it to mean "I've tried X other solutions" - explicitly state the things you have tried. – Peter Boughton Nov 29 '13 at 17:25
  • CF's regex implementation has more features than JS, so anything written for JS will work in CF (assuming you're not including the wrapping `/` characters, which are not part of the regex). Similarly, the top-voted answers for those two PHP links will also work with CF, once you remove the wrapping characters and fix the flags (i.e. `%pattern%i` becomes `(?i)pattern` or `~pattern~ix` becomes `(?ix)pattern`). – Peter Boughton Nov 29 '13 at 17:25
  • @PeterBoughton Sorry, did not try to make it an attack like statement, just explained my position. Also, thanks for the suggestions of adjusting the patterns to work in ColdFusion, will definitely use those in the future. – SimonDau Nov 29 '13 at 17:33
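The flag conversion described in the comments can be sketched as follows. The example uses Java's `java.util.regex`, which accepts the same inline `(?i)` syntax; that it behaves identically in CF's own engine is an assumption here. The pattern itself is a made-up illustration, not one taken from the linked answers, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineFlags {
    // A hypothetical PHP-style pattern "~youtu\.be/([\w-]+)~i" with the wrapping
    // "~" delimiters removed and the trailing "i" flag moved inline as "(?i)".
    static final Pattern SHORT_LINK = Pattern.compile("(?i)youtu\\.be/([\\w-]+)");

    // Returns the captured video ID, or null when the text contains no short link.
    static String videoId(String text) {
        Matcher m = SHORT_LINK.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The host is upper-cased on purpose to show that (?i) makes the match case-insensitive.
        System.out.println(videoId("HTTP://YOUTU.BE/0zM3nApSvMg")); // prints "0zM3nApSvMg"
    }
}
```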

1 Answer

(?<=http:\/\/)(?:www\.)?youtu(?:be\.com\/(?:watch\?|user\/|v\/|embed\/)\S+|\.be\/\S+)

See this demo.


Ωmega
  • Hey, thanks for this! Little note: ColdFusion does not support lookahead or lookbehind functions. However once I've stripped the ?<= part from the front of the pattern it all worked out. Not sure what damage that did to the entire integrity of the pattern and what additional URLs will be matched now, but for those I've listed in my question - it works perfectly. – SimonDau Nov 29 '13 at 17:09
  • @SimonDau - That is fine, just remove `(?<=http:\/\/)` or replace it with `\bhttps?:\/\/` if you want to include protocol prefix in url match. – Ωmega Nov 29 '13 at 17:21
  • CF does support lookahead - it's only lookbehind that is missing. You can dip into java.util.regex classes when you need to use lookbehinds. – Peter Boughton Nov 29 '13 at 17:27
  • Also, even when lookbehind is supported, there is no need/benefit to it here - either include the protocol or ignore it; you don't need to confirm it exists without including it. (And it's worth remembering that `//youtube.com/etc` is a valid protocol-agnostic URL.) – Peter Boughton Nov 29 '13 at 17:29
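To make the difference between the two suggestions in these comments concrete, here is a small sketch using `java.util.regex` directly, which does support lookbehind. It compares the original lookbehind pattern with the `\bhttps?:\/\/` variant. The class and helper names are illustrative; from ColdFusion, the same `Pattern` class would presumably be reached through `createObject("java", "java.util.regex.Pattern")`, which is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The answer's pattern with the lookbehind intact: "http://" must precede
    // the match but is not included in it.
    static final Pattern WITH_LOOKBEHIND = Pattern.compile(
        "(?<=http:\\/\\/)(?:www\\.)?youtu(?:be\\.com\\/(?:watch\\?|user\\/|v\\/|embed\\/)\\S+|\\.be\\/\\S+)");

    // The variant from the comments: the protocol is consumed as part of the
    // match, and both http and https are accepted.
    static final Pattern WITH_PROTOCOL = Pattern.compile(
        "\\bhttps?:\\/\\/(?:www\\.)?youtu(?:be\\.com\\/(?:watch\\?|user\\/|v\\/|embed\\/)\\S+|\\.be\\/\\S+)");

    // Returns the first substring matched by the pattern, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "http://www.youtube.com/watch?v=0zM3nApSvMg";
        System.out.println(firstMatch(WITH_LOOKBEHIND, url)); // prints "www.youtube.com/watch?v=0zM3nApSvMg"
        System.out.println(firstMatch(WITH_PROTOCOL, url));   // prints "http://www.youtube.com/watch?v=0zM3nApSvMg"
    }
}
```

Neither pattern covers the protocol-agnostic `//youtube.com/...` form mentioned above; supporting it would need the protocol part to be made optional.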