
With curl, I do the following to check if a webpage is online:

if curl --output /dev/null --silent --head --fail "${url}"; then
  echo "URL exists: ${url}"
else
  echo "URL does not exist: ${url}"
fi

However, if the server refuses HEAD requests (or I don't know whether it accepts them), an alternative is to request only the first byte of the file:

if curl --output /dev/null --silent --fail --range 0-0 "${url}"; then
  echo "URL exists: ${url}"
else
  echo "URL does not exist: ${url}"
fi

The first case is easy to replicate in Ruby:

require 'net/http'
require 'uri'

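uri = URI.parse(url)
# Send a HEAD request, as curl --head does (Net::HTTP.get_response would send a GET)
response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  http.head(uri.request_uri)
end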

if response.kind_of? Net::HTTPOK
  puts "URL exists: #{url}"
else
  puts "URL does not exist: #{url}"
end

How do I replicate the second curl case?

user137369
  • You can use something like 'curb' (requires libcurl), but in a nutshell, see https://stackoverflow.com/q/82349/438992 – Dave Newton Dec 09 '19 at 20:19
  • `kind_of?` compares the class of the object, not the object itself. Consider: `case (response)` and then `when Net::HTTPOK`, which covers more cases. – tadman Dec 09 '19 at 20:25
  • @tadman Can you expand on that? I took the `response.kind_of?` code from https://stackoverflow.com/a/12023273/1661012 – user137369 Dec 09 '19 at 21:25
  • It's an uglier way of doing what's in the answer below that by tantrix. – tadman Dec 09 '19 at 23:31
  • @tadman “Ugly” is relative. `kind_of?` allows checking in one line: `puts "URL exists: #{url}" if response.kind_of? Net::HTTPOK`. – user137369 Dec 10 '19 at 02:21
  • How exactly do you define that a _website is online_? Must it respond with a 200 or a 2xx status code? What about a blank `204 No Content`? Or must it include a body? What about a website that responds with a redirect? I would argue that _online_ must be defined first. – spickermann Dec 10 '19 at 07:45
  • That's also uglier in a way that [Rubocop](https://rubocop.readthedocs.io/en/stable/) will flag. Honestly `Net::HTTP` is pretty trash and you should avoid it unless you're aiming for absolutely minimal dependencies, such as inside a published gem. In every other case tools like [Faraday](https://github.com/lostisland/faraday) work much better. – tadman Dec 10 '19 at 19:11

1 Answer


curl's --range option essentially just sets the Range request header, so to replicate it you set the same header in Ruby:

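require 'net/http'
require 'uri'

url = URI.parse(...)

# Build a GET request that asks only for the first byte
req = Net::HTTP::Get.new(url.request_uri)
req['Range'] = 'bytes=0-0'

# Enable TLS when the URL scheme is https
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = (url.scheme == "https")

response = http.request(req)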
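
To turn that response into the exists / does not exist check from the question, it may help to know that a server honoring the Range header normally answers 206 Partial Content, while one that ignores it answers 200 OK with the whole body. Both classes inherit from Net::HTTPSuccess. Below is a minimal sketch of that classification; `describe_range_response` and the symbols it returns are hypothetical names chosen for illustration, not part of Net::HTTP.

```ruby
require 'net/http'

# Hypothetical helper: classify how the server reacted to a Range request.
def describe_range_response(response)
  case response
  when Net::HTTPPartialContent then :partial # 206: the Range header was honored
  when Net::HTTPSuccess        then :full    # other 2xx: Range was probably ignored
  else                              :other   # redirects, 4xx, 5xx: caller decides
  end
end
```

If you only care whether the URL exists, `response.is_a?(Net::HTTPSuccess)` covers both the 206 and the 200 case in one check.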
Casper
  • Could this be made shorter, I wonder? In the bash example, doing this is just an extra flag. But in Ruby that’s four extra lines, including creating two extra variables. What’s the shortest this could be, while still correct? – user137369 Dec 09 '19 at 21:51
  • With Net::HTTP there is no shorter way. I looked at the documentation and this is the most compact form I could find. But if you use another gem, then you can likely make it shorter. HTTParty or Faraday are a couple of gems that can probably do this with 1-2 lines. – Casper Dec 09 '19 at 22:00
  • But nothing of course prevents you from writing your own method wrapper around this code. Then you can do it in one line. – Casper Dec 09 '19 at 22:02
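
As a rough illustration of the wrapper idea from the last comment, here is one way it could look. The method name `url_exists?` is hypothetical. Treating any 2xx as "exists" and treating connection errors as "does not exist" are assumptions of this sketch, not something the answer specifies. Note that redirects count as "does not exist" here, whereas curl's --fail only fails on status codes of 400 and above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper: true if a one-byte ranged GET of url_string returns 2xx.
def url_exists?(url_string)
  uri = URI.parse(url_string)

  # Ask only for the first byte, like curl --range 0-0
  req = Net::HTTP::Get.new(uri.request_uri)
  req['Range'] = 'bytes=0-0'

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end

  # 206 (Range honored) and 200 (Range ignored) are both Net::HTTPSuccess
  response.is_a?(Net::HTTPSuccess)
rescue SocketError, SystemCallError, Timeout::Error
  # Assumption: DNS failures, refused connections and timeouts count as "does not exist"
  false
end
```

With that in place, the check at the call site is one line: `puts "URL exists: #{url}" if url_exists?(url)`.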