A regex is one way to get there, but it's not what I'd use. I prefer a URL parser, such as Ruby's built-in URI or the Addressable::URI gem. URLs can get messy: there are multiple ways to write a URL that resolves and connects to a particular host, yet fails the usual "check for the host name" test.
require 'uri'
url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'
uri = URI.parse(url)
uri.host # => "www.youtube.com"
A couple of ways to check the host:
uri.host['youtube.com'] # => "youtube.com"
uri.host =~ /youtube\.com/ # => 4
!!uri.host['youtube.com'] # => true
!!(uri.host =~ /youtube\.com/) # => true
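Note that the unanchored tests above also match hosts such as "notyoutube.com" or "youtube.com.example.org". If that matters, a stricter check compares the whole host or a dot-delimited suffix. This is only a sketch: `host_matches?` is a hypothetical helper name, and treating every subdomain as a match is an assumption about what you want.

```ruby
require 'uri'

# Hypothetical helper: true only when the URL's host is the given domain
# or one of its subdomains. Unparseable URLs count as non-matches.
def host_matches?(url, domain)
  host = URI.parse(url).host
  return false if host.nil?

  host = host.downcase
  host == domain || host.end_with?(".#{domain}")
rescue URI::InvalidURIError
  false
end

host_matches?('http://www.youtube.com/watch?v=_NaiiBkqOxE', 'youtube.com') # => true
host_matches?('http://notyoutube.com/', 'youtube.com')                     # => false
host_matches?('http://youtube.com.example.org/', 'youtube.com')            # => false
```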
Usually our needs are more sophisticated, and we want to know what parameters are embedded in the URL, or what the path to the resource is. URI.split breaks the URL into its component pieces:
URI.split(url) # => ["http", nil, "www.youtube.com", nil, nil, "/watch", nil, "v=_NaiiBkqOxE&feature=feedu", nil]
Each of the pieces has a defined name, so it's common to break the URL down into a hash of its parts for fast lookup:
parts = [:scheme, :userinfo, :host, :port, :registry, :path, :opaque, :query, :fragment].zip(URI.split(url)).to_h
parts # => {:scheme=>"http", :userinfo=>nil, :host=>"www.youtube.com", :port=>nil, :registry=>nil, :path=>"/watch", :opaque=>nil, :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}
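If what you're after is the embedded parameters rather than the raw query string, the standard library can decode them too. A minimal sketch, assuming each parameter appears only once (repeated keys would overwrite each other in the hash):

```ruby
require 'uri'

url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'
uri = URI.parse(url)

uri.path # => "/watch"

# decode_www_form returns [key, value] pairs; to_s guards against a nil query.
params = URI.decode_www_form(uri.query.to_s).to_h
params['v']       # => "_NaiiBkqOxE"
params['feature'] # => "feedu"
```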
Using Addressable::URI to do the same things:
require 'addressable/uri'
uri = Addressable::URI.parse('http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu')
uri.host # => "www.youtube.com"
parts = uri.to_hash
parts # => {:scheme=>"http", :user=>nil, :password=>nil, :host=>"www.youtube.com", :port=>nil, :path=>"/watch", :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}
Wikipedia's page on URL normalization shows many examples of how URLs can vary yet still point to the same resource. So, if you only need to match a site's main domain, then yes, a simple regex or even a substring search will do. Beyond that, you need a more sophisticated way of taking the URL apart.
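As a small illustration of that variation, here is a sketch using the built-in URI's `normalize`, which lowercases the scheme and host and fills in an empty path. It covers only a few of the cases on the Wikipedia page; the mixed-case URL is made up for the example.

```ruby
require 'uri'

a = URI.parse('HTTP://WWW.YouTube.COM')
b = URI.parse('http://www.youtube.com/')

# Written differently, but the same after normalization.
a.normalize.host # => "www.youtube.com"
a.normalize.to_s # => "http://www.youtube.com/"
same = a.normalize.to_s == b.normalize.to_s # => true
```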