I'm trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The given URL can be in any of these formats:
- https://www.facebook.com/someuser (www. is optional, but should be ignored)
- www.facebook.com/someuser (http:// is not required)
- facebook.com/someuser
- http://someuser.tumblr.com -> this has to return tumblr.com only
I wrote this regex:
/(?:\.|\/{2})(?:www\.)?([^\/]*)/i
But it does not work as I expect: for example, it returns com for facebook.com/username and sub.tumblr.com for http://sub.tumblr.com/.
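A single-regex sketch, assuming the "domain" is always the last two dot-separated labels of the host. That assumption holds for every test string in this question, but it is wrong for multi-part suffixes such as co.uk, and the sketch does not handle ports, user info, or query strings without a slash. The names DOMAIN_RE and extract_domain are hypothetical.

```ruby
# Assumption: the domain is the last two labels of the host
# ("sub.tumblr.com" -> "tumblr.com"). Suffixes like "co.uk" are NOT handled.
DOMAIN_RE = %r{
  \A
  (?:https?://)?      # optional scheme
  (?:[^/]*\.)?        # optional leading labels such as www. or sub.
  ([^./]+\.[^./]+)    # capture the last two labels of the host
  (?:/|\z)            # the host ends at the first slash or end of string
}xi

def extract_domain(url)
  m = DOMAIN_RE.match(url)
  m && m[1]
end

puts extract_domain("http://sub.tumblr.com/") # prints tumblr.com
```

The optional middle group backtracks until exactly two labels remain before the slash (or end of string), which is what makes subdomains collapse to their parent domain.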
I can do this in parts:
- Remove http:// and https://, if present in the string, with string.sub(/https?:\/\//i, "")
- Remove www. with string.sub(/www\./i, "")
- Get the domain with match and /(\w+\.\w+)+/i
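The three steps translate to Ruby roughly as follows. Note that String#delete removes individual characters rather than a pattern, so this sketch uses String#sub with a Regexp for the first two steps. Running it on a subdomain shows why the approach breaks: step 3 stops at the first "word.word" run. The method name domain_in_parts is hypothetical.

```ruby
# Sketch of the three-step approach; String#sub is used because
# String#delete treats its argument as a set of characters, not a regex.
def domain_in_parts(url)
  s = url.sub(%r{https?://}i, "") # step 1: drop "http://" or "https://"
  s = s.sub(/www\./i, "")         # step 2: drop the first "www."
  m = s.match(/(\w+\.\w+)+/i)     # step 3: first "word.word" run
  m && m[0]
end

puts domain_in_parts("https://www.facebook.com/username") # prints facebook.com
puts domain_in_parts("http://sub.tumblr.com/")            # prints sub.tumblr
```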
But this won't work with subdomains. Strings for testing:
https://www.facebook.com/username
http://last.fm/user/username
www.google.com
facebook.com/username
http://sub.tumblr.com/
sub.tumblr.com
I need this to work with as little memory and processing cost as possible.
Any ideas?
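Given the memory and processing constraint, here is a regex-free sketch that only uses String#index and String#rindex to locate the host and its last two labels. It assumes the input is a bare host or an http(s) URL with no user info or port, and, like a two-label regex, it does not handle suffixes such as co.uk (that would need a Public Suffix List lookup). Whether it is actually cheaper than a single regex match depends on the Ruby version, so it is worth benchmarking before relying on it. The name extract_domain_fast is hypothetical.

```ruby
# Regex-free sketch. Assumptions: bare host or http(s) URL without
# user info or port; the domain is the last two labels of the host.
def extract_domain_fast(url)
  scheme_end = url.index("://")
  host_start = scheme_end ? scheme_end + 3 : 0
  host_end = url.index("/", host_start) || url.length
  host = url[host_start...host_end]

  last_dot = host.rindex(".")
  return host if last_dot.nil? || last_dot.zero?

  # Search backwards from just before the last dot for the previous one.
  prev_dot = host.rindex(".", last_dot - 1)
  prev_dot ? host[(prev_dot + 1)..-1] : host
end

%w[
  https://www.facebook.com/username
  http://last.fm/user/username
  www.google.com
  facebook.com/username
  http://sub.tumblr.com/
  sub.tumblr.com
].each { |u| puts "#{u} -> #{extract_domain_fast(u)}" }
```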