4

Possible Duplicates:
How can I alter this regex to get the Youtube video id from a Youtube URL that doesn't specify the v parameter?
What regex can I use to get the domain name from a url in Ruby?
Improving regex for parsing YouTube / Vimeo URLs

What regex validates that a string is a URL to a YouTube or Vimeo video? I'm not very good with regular expressions. This is for a Rails application.

Justin Meltzer
  • See http://stackoverflow.com/questions/3903854/how-can-i-alter-this-regex-to-get-the-youtube-video-id-from-a-youtube-url-that-do and http://stackoverflow.com/questions/3214661/vimeo-id-php-regex (PHP, but should be easy enough to translate) – Andrew Grimm Apr 27 '11 at 02:48

3 Answers

4

For YouTube:

yt_regexp = /^http:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]*)/

The capture group also gives you the id of the video:

>> yt_regexp.match("http://www.youtube.com/watch?v=foo")[1]
=> "foo"

For Vimeo:

vimeo_regexp = /^http:\/\/www\.vimeo\.com\/(\d+)/

You can extract the id the same way as before, from the first capture group.

If you want to make "http://www." optional, you can use:

yt_regexp = /^(?:http:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([a-zA-Z0-9_-]*)/
vimeo_regexp = /^(?:http:\/\/)?(?:www\.)?vimeo\.com\/(\d+)/
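A hedged variation on the patterns above, as a minimal sketch rather than a definitive implementation. It assumes you also want to accept `https`, require a non-empty id (`+` instead of `*`), and anchor with `\A`, since in Ruby `^` matches at the start of any line, not only at the start of the string. The names `YT_REGEXP`, `VIMEO_REGEXP`, and `video_id` are hypothetical and not part of the original answer.

```ruby
# Hypothetical constant and method names, chosen for illustration only.
# \A anchors at the start of the whole string; ^ would also match after a newline.
YT_REGEXP    = /\A(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)/
VIMEO_REGEXP = /\A(?:https?:\/\/)?(?:www\.)?vimeo\.com\/(\d+)/

# Returns the captured id, or nil when the URL does not match the pattern.
def video_id(url, regexp)
  match = regexp.match(url)
  match && match[1]
end

video_id("https://www.youtube.com/watch?v=_NaiiBkqOxE", YT_REGEXP)   # => "_NaiiBkqOxE"
video_id("vimeo.com/12345", VIMEO_REGEXP)                            # => "12345"
video_id("junk\nhttp://www.youtube.com/watch?v=foo", YT_REGEXP)      # => nil
```

Note that the YouTube pattern still assumes `v` is the first query parameter.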
aromero
4

A regex is one way to get there, but it's not what I'd use. I prefer a URL parser, like the built-in URI or the Addressable::URI gem. URLs can get messy: there are multiple ways to write a URL that resolves and connects to a particular host but still fails the usual "check for the host name" test.

require 'uri'
url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'

uri = URI.parse(url)
uri.host # => "www.youtube.com"

A couple of ways to check the host:

uri.host['youtube.com']         # => "youtube.com"
uri.host =~ /youtube\.com/      # => 4
!!uri.host['youtube.com']       # => true
!!(uri.host =~ /youtube\.com/)  # => true
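The checks above are substring matches, so a host such as `youtube.com.example.org` would also pass. One possible stricter variant, sketched under the assumption that only the bare domain and its `www.` form should be accepted, compares the whole host against an allowlist. `VIDEO_HOSTS` and `video_host?` are hypothetical names introduced for this example.

```ruby
require 'uri'

# Hypothetical allowlist; adjust it to the hosts you actually accept.
VIDEO_HOSTS = %w[youtube.com vimeo.com].freeze

# True when the URL's host is one of VIDEO_HOSTS, with or without "www.".
def video_host?(url)
  host = URI.parse(url).host
  return false if host.nil?               # e.g. a URL without a scheme
  VIDEO_HOSTS.include?(host.downcase.sub(/\Awww\./, ''))
rescue URI::InvalidURIError
  false                                   # unparseable input is not a video URL
end

video_host?('http://www.youtube.com/watch?v=_NaiiBkqOxE')  # => true
video_host?('http://youtube.com.example.org/watch?v=x')    # => false
video_host?('http://notyoutube.com/watch?v=x')             # => false
```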

Usually our needs are more sophisticated, and we want to know what parameters are embedded in the URL, or what the path to the resource is. `URI.split` breaks the URL into its component pieces:

URI.split(url) # => ["http", nil, "www.youtube.com", nil, nil, "/watch", nil, "v=_NaiiBkqOxE&feature=feedu", nil]

Each of the pieces has a defined name, so it's common to map those names to the values in a hash for fast lookup:

parts = Hash[*[:scheme, :userinfo, :host, :port, :registry, :path, :opaque, :query, :fragment].zip(URI.split(url)).flatten]
parts # => {:scheme=>"http", :userinfo=>nil, :host=>"www.youtube.com", :port=>nil, :registry=>nil, :path=>"/watch", :opaque=>nil, :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}
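Once the query string is isolated, the video id can be looked up by name instead of by position. The following is a minimal sketch using the standard library's `URI.decode_www_form`; the helper name `youtube_video_id` is hypothetical, and the sketch assumes the id lives in the `v` parameter, as in the example URL.

```ruby
require 'uri'

# Hypothetical helper: returns the value of the "v" query parameter, or nil.
def youtube_video_id(url)
  query = URI.parse(url).query
  return nil if query.nil?                      # no query string at all
  params = URI.decode_www_form(query).to_h      # array of [key, value] pairs to a hash
  params['v']
rescue URI::InvalidURIError, ArgumentError
  nil                                           # treat malformed input as "no id"
end

youtube_video_id('http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu')  # => "_NaiiBkqOxE"
youtube_video_id('http://www.youtube.com/watch?feature=feedu&v=_NaiiBkqOxE')  # => "_NaiiBkqOxE"
youtube_video_id('http://www.youtube.com/watch')                              # => nil
```

Because the lookup is by key, the result does not depend on where `v` appears in the query string.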

Using Addressable::URI to do the same things:

require 'addressable/uri'
uri = Addressable::URI.parse('http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu')
uri.host # => "www.youtube.com"

parts = uri.to_hash
parts # => {:scheme=>"http", :user=>nil, :password=>nil, :host=>"www.youtube.com", :port=>nil, :path=>"/watch", :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}

Wikipedia's page on URL normalization shows a lot of examples of how URLs can vary, yet still point to the same resource. So, if you only need to match a site's main domain, then yes, you can use a simple regex, or even a substring search. Once you go beyond that, you need to be more sophisticated in how you take the URL apart.

the Tin Man
  • Why would you do this over regex? This seems more complicated... Also, just checking the host might not be sufficient to ensure a valid YouTube link, right? – Justin Meltzer Apr 27 '11 at 06:47
  • URI and/or Addressable::URI can do validations for you to see if the URL is correctly formed, and can even clean the URL if you want it to. – the Tin Man Apr 27 '11 at 06:57
  • The `host` portion of the URL could be correct while the URL is still invalid when requesting something. You have to have the correct parameters, and parameters do not have to be in a particular order. A regex will break down if the order changes. A URL parser will keep on working. – the Tin Man Apr 27 '11 at 06:58
  • OK, so let's say you do something like `require 'uri'` `parts = Hash[*[:scheme, :userinfo, :host, :port, :registry, :path, :opaque, :query, :fragment].zip(URI.split(url)).flatten]` Now you have a hash for fast lookup, but where would this code go? How would you hook it up to validations in your models? I'm unsure how to integrate this. – Justin Meltzer Apr 27 '11 at 22:21
  • There's no way I can answer that because it's highly dependent on your application, code, classes/models, etc. – the Tin Man Apr 27 '11 at 23:52
-3

I'm not familiar with Vimeo, but YouTube would be:

"http://www.youtube.com/watch?v=".+

Note the quote marks: they tell your regex engine to match the text between them exactly. Otherwise you will be surprised by characters like the periods and the question mark, which have special meanings in a regex. The trailing .+ then matches whatever string finishes off the URL.
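In a Ruby regex literal, quote marks are ordinary characters rather than a quoting mechanism, so the pattern above would not behave this way as written. A swapped-in technique that gets the same "match this prefix literally" effect is `Regexp.escape`. The sketch below is an illustration only; the constant name `YT_PREFIX_REGEXP` is hypothetical.

```ruby
# Regexp.escape backslash-escapes characters such as "." and "?" so they match literally.
# \A anchors at the start of the string; .+ requires at least one character after "v=".
YT_PREFIX_REGEXP = /\A#{Regexp.escape('http://www.youtube.com/watch?v=')}.+/

YT_PREFIX_REGEXP.match?('http://www.youtube.com/watch?v=foo')  # => true
YT_PREFIX_REGEXP.match?('http://wwwXyoutube.com/watch?v=foo')  # => false
YT_PREFIX_REGEXP.match?('http://www.youtube.com/watch?v=')     # => false
```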

Spencer Rathbun