
I have the following regex from here: https://stackoverflow.com/a/10405818/924999

val regex = """/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com(?:\/embed\/|\/v\/|\/watch\?v=|\/ytscreeningroom\?v=|\/feeds\/api\/videos\/|\/user\S*[^\w\-\s]|\S*[^\w\-\s]))([\w\-]{11})[?=&+%\w-]*/ig;""".r

I'm attempting to extract the video ID from youtube video urls with:

val url = "http://www.youtube.com/watch?v=XrivBjlv6Mw"

url match {
    case regex(result) => result
    case _ => null
}

However, it always seems to return null. Is there something I'm missing or need to do differently?

Thanks in advance for any help, much appreciated :)

jahilldev
  • Please don't ever use `null` in Scala. If the video ID legitimately can or can't exist, then it should be an `Option[String]`, returning `Some(result)` and `None`. If the regex failing is always a hard error, then throw an exception in the default case (or use `Either` if you want to be very functional about it). – Andrzej Doyle Jul 11 '12 at 10:55
  • Thanks for the tip, do you have any thoughts on why the regex match wouldn't be returning a result? – jahilldev Jul 11 '12 at 11:14
  • Afraid not; debugging a 192-char regex which has neither comments nor an explanatory breakdown isn't my cup of tea. Since the output is just a boolean (i.e. "no match"), the only way to approach it is to break the regex down into smaller sections until you find why it's failing - which is mainly just *work*, and doesn't require much knowledge/insight as such. So no thanks. – Andrzej Doyle Jul 11 '12 at 12:04

3 Answers


The regex you have is a PHP-style regex literal, not a Java-style pattern: note the leading `/` delimiter and the `/ig;` flags at the end, which Java treats as literal characters to match.

So you'll just have to edit it a bit:

val youtubeRgx = """https?://(?:[0-9a-zA-Z-]+\.)?(?:youtu\.be/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:[\'"][^<>]*>|</a>))[?=&+%\w-]*""".r

I tested it against a range of YouTube URL formats, and it matched them. Example:

scala> youtubeRgx.pattern.matcher("http://www.youtube.com/watch?v=XrivBjlv6Mw").matches
res23: Boolean = true

And extracting the value:

"http://www.youtube.com/watch?v=XrivBjlv6Mw" match {
  case youtubeRgx(a) => Some(a) 
  case _ => None 
}
res33: Option[String] = Some(XrivBjlv6Mw)
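
A minimal sketch of the same extraction wrapped in a function, so callers always get an `Option[String]`. The names `YoutubeId` and `videoId` are hypothetical, chosen only for illustration; the pattern is the one from this answer, with the ID class written as `[\w\-]` as in the commented version.

```scala
import scala.util.matching.Regex

object YoutubeId {
  // Pattern copied from the answer; the ID class is [\w\-] (no space).
  val youtubeRgx: Regex =
    """https?://(?:[0-9a-zA-Z-]+\.)?(?:youtu\.be/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:[\'"][^<>]*>|</a>))[?=&+%\w-]*""".r

  // Hypothetical helper: Some(id) when the whole string is a recognised
  // YouTube URL, None otherwise.
  def videoId(url: String): Option[String] = url match {
    case youtubeRgx(id) => Some(id)
    case _              => None
  }
}
```

With this, `YoutubeId.videoId("http://youtu.be/XrivBjlv6Mw")` gives `Some(XrivBjlv6Mw)`, and a non-YouTube URL such as `http://example.com/` gives `None`.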

Java can take comments inside a pattern via the `(?x)` flag (`Pattern.COMMENTS`); here I strip the comments and whitespace by hand instead:

val youtubeRgx = """https?://         # Required scheme. Either http or https.
                   |(?:[0-9a-zA-Z-]+\.)? # Optional subdomain.
                   |(?:               # Group host alternatives.
                   |  youtu\.be/      # Either youtu.be,
                   || youtube\.com    # or youtube.com followed by
                   |  \S*             # Allow anything up to VIDEO_ID,
                   |  [^\w\-\s]       # but char before ID is non-ID char.
                   |)                 # End host alternatives.
                   |([\w\-]{11})      # $1: VIDEO_ID is exactly 11 chars.
                   |(?=[^\w\-]|$)     # Assert next char is non-ID or EOS.
                   |(?!               # Assert URL is not pre-linked.
                   |  [?=&+%\w]*      # Allow URL (query) remainder.
                   |  (?:             # Group pre-linked alternatives.
                   |    [\'"][^<>]*>  # Either inside a start tag,
                   |  | </a>          # or inside <a> element text contents.
                   |  )               # End recognized pre-linked alts.
                   |)                 # End negative lookahead assertion.
                   |[?=&+%\w-]*       # Consume any URL (query) remainder.
                   |""".stripMargin.replaceAll("\\s*#.*\n", "").replace(" ","").r

(adapted from @ridgerunner's answer here: find all youtube video ids in string)
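
For comparison, here is a sketch of the same commented pattern that relies on Java's `(?x)` flag (`Pattern.COMMENTS`) rather than stripping comments by hand. In that mode whitespace in the pattern is ignored and `#` starts a comment that runs to the end of the line. The wrapper object `CommentedPattern` is a hypothetical name; this assumes the pattern needs no literal spaces or `#` characters, which holds here.

```scala
object CommentedPattern {
  // (?x) enables comments mode: whitespace is ignored, '#' comments run to end of line.
  val youtubeRgx = """(?x)
    https?://               # Required scheme: http or https.
    (?:[0-9a-zA-Z-]+\.)?    # Optional subdomain.
    (?:                     # Group host alternatives.
      youtu\.be/            # Either youtu.be,
    | youtube\.com          # or youtube.com followed by
      \S*                   # anything up to VIDEO_ID,
      [^\w\-\s]             # but char before ID is a non-ID char.
    )                       # End host alternatives.
    ([\w\-]{11})            # Group 1: VIDEO_ID is exactly 11 chars.
    (?=[^\w\-]|$)           # Assert next char is non-ID or end of string.
    (?!                     # Assert URL is not pre-linked.
      [?=&+%\w]*            # Allow URL (query) remainder.
      (?:                   # Group pre-linked alternatives.
        [\'"][^<>]*>        # Either inside a start tag,
      | </a>                # or inside <a> element text contents.
      )                     # End pre-linked alternatives.
    )                       # End negative lookahead.
    [?=&+%\w-]*             # Consume any URL (query) remainder.
    """.r
}
```

The trade-off is that any literal space or `#` would have to be escaped in this mode, whereas the hand-stripping version removes every space unconditionally.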

Rogach

A much simpler approach:

scala> val url = "http://www.youtube.com/watch?v=XrivBjlv6Mw"
url: java.lang.String = http://www.youtube.com/watch?v=XrivBjlv6Mw

scala> val regex = "v=[\\w]*".r
regex: scala.util.matching.Regex = v=[\w]*

scala> for (x <-  regex findFirstIn url) yield x.replace("v=","")
res3: Option[java.lang.String] = Some(XrivBjlv6Mw)
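
A variant of the same idea, sketched with a capturing group so the `v=` prefix does not need to be removed afterwards. The names `SimpleId`, `vParam` and `videoId` are hypothetical. The character class also admits `-`, since the longer pattern in this thread treats it as a valid ID character. Like the version above, this only handles URLs that carry the ID in a `v` query parameter.

```scala
object SimpleId {
  // Capture the value of the v parameter; '-' is allowed alongside word characters.
  private val vParam = """[?&]v=([\w-]+)""".r

  // Some(id) if a v=... parameter is present anywhere in the URL, else None.
  def videoId(url: String): Option[String] =
    vParam.findFirstMatchIn(url).map(_.group(1))
}
```

Here `SimpleId.videoId("http://www.youtube.com/watch?v=XrivBjlv6Mw")` gives `Some(XrivBjlv6Mw)`, while a short link such as `http://youtu.be/XrivBjlv6Mw` gives `None`.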
pedrofurla
  • As per http://stackoverflow.com/questions/5830387/php-regex-find-all-youtube-video-ids-in-string/10405818#10405818, this wouldn't catch most ids. – Rogach Jul 11 '12 at 12:00

First, Scala accepts Java-style regexes. If you supply slashes, they are part of the pattern, not delimiters. Flags also cannot follow the regex; they are written inline, e.g. `(?i)` at the start of the pattern or of a group.
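
A small sketch of both points, using a toy pattern rather than the YouTube one (the object and value names are hypothetical):

```scala
object SlashesAndFlags {
  // The slashes and the trailing "i" are ordinary characters here,
  // not delimiters or flags.
  val phpStyle = """/abc/i""".r
  // Java-style: the case-insensitive flag is written inline.
  val javaStyle = """(?i)abc""".r

  val upperWithPhpStyle  = phpStyle.findFirstIn("ABC")     // None
  val literalSlashes     = phpStyle.findFirstIn("/abc/i")  // Some(/abc/i)
  val upperWithJavaStyle = javaStyle.findFirstIn("ABC")    // Some(ABC)
}
```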

Second, for `case regex(result)` to match, the pattern must contain a capturing group. The video ID has to be wrapped in such a group; I just don't know whether that is the case in this overly complex pattern.
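
To illustrate with toy patterns (all names here are hypothetical): a one-variable pattern such as `case r(x)` only succeeds when the regex has exactly one capturing group, and by default the regex must match the entire input. `unanchored` relaxes the second requirement.

```scala
import scala.util.matching.Regex

object GroupsAndAnchoring {
  val noGroup: Regex  = """\d+""".r       // no capturing group
  val oneGroup: Regex = """id=(\d+)""".r  // one capturing group

  // Try to bind exactly one group from a full match of r against s.
  def extract(r: Regex, s: String): Option[String] = s match {
    case r(x) => Some(x)
    case _    => None
  }

  val a = extract(noGroup, "42")                    // None: nothing to bind x to
  val b = extract(oneGroup, "id=42")                // Some(42)
  val c = extract(oneGroup, "x id=42")              // None: whole input must match
  val d = extract(oneGroup.unanchored, "x id=42")   // Some(42)
}
```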

Arne