
I followed the question "Remove "www", "http://" from string" to strip the http:// and https:// prefixes from my URL. Now I also want to remove the path at the end of the URL, but I can't work out the pattern. This is the code I am trying to use:

str = str.sub(/^https?\:\/\//, '').sub(/^www./,'').sub(/^\/*/,'')

The first two .sub calls work as expected and remove http://, https:// and www. from the URL, but the path (for example, /path/to/remove/ in http://URL/path/to/remove/) stays attached to the URL. As shown above, I tried the pattern /^\/*/ in the third sub, but it doesn't remove the path at the end of the URL. What pattern will remove all characters after the base URL?

ScottOBot

1 Answer


You could parse the URL with Ruby's standard uri library and read the host:

require 'uri'
URI('http://stackoverflow.com/questions/24252071/ruby-editing-urls').host
# => "stackoverflow.com"
ichigolas
  • +1 It's almost always better to use a proper parsing library like this than to muck around with regular expressions. You might want to amend it to `host.sub(/^www\./, '')` as well. – tadman Jun 16 '14 at 21:11
  • URI isn't working when the user inputs a host name that is already in the proper form, e.g. google.com throws an exception when I try to get the host with the URI method. What regular expression would get rid of any characters after the forward slash? @tadman – ScottOBot Jun 17 '14 at 14:01
  • It needs to be a proper URL, not just a hostname. Are you trying something like `URI('google.com').host`? – tadman Jun 17 '14 at 15:19
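
The two approaches discussed in this thread could be sketched roughly as follows. This is only an illustration, not code from the answer: the helper names `base_host` and `base_host_regex` are hypothetical, and the first one assumes that any input without a scheme should be treated as an http URL so that `URI` can fill in the host.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): returns the host without a
# leading "www." for either a full URL or a scheme-less string such as
# "google.com/path". Returns nil if the input cannot be parsed.
def base_host(input)
  str = input.strip
  # URI only populates #host when a scheme is present, so this sketch
  # assumes http:// for inputs that do not start with one.
  str = "http://#{str}" unless str =~ %r{\A[a-z][a-z0-9+.\-]*://}i
  host = URI(str).host
  host && host.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

# Regex-only alternative in the style of the question: strip the scheme
# and "www.", then drop everything from the first slash to the end.
def base_host_regex(input)
  input.sub(%r{\Ahttps?://}, '').sub(/\Awww\./, '').sub(%r{/.*\z}m, '')
end

puts base_host('http://stackoverflow.com/questions/24252071/ruby-editing-urls')
puts base_host('google.com')
puts base_host_regex('https://www.example.com/path/to/remove/')
```

The third pattern in the question, /^\/*/, is anchored to the start of the string, so it can only match slashes at the very beginning; the sketch instead matches from the first slash through to the end. Note that the regex version leaves ports and query strings that are not preceded by a slash in place, which is one reason the parsing approach tends to be more robust.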