
I'm creating an API service which allows people to provide a URL of an image to the API call, and the service downloads the image to process it.

How do I ensure somebody does NOT give me the URL of, like, a 5MB image? Is there a way to limit the request?

This is what I have so far, which basically grabs everything.

  req = Net::HTTP::Get.new(url.path)
  res = Net::HTTP.start(url.host, url.port) { |http|
    http.request(req)
  }

Thanks, Conrad

4 Answers


cwninja unfortunately gave you an answer that will only work for accidental attacks. An intelligent attacker will have no trouble at all defeating that check. There are two main reasons his method should not be used.

First, nothing guarantees that the information in a HEAD response will match the corresponding GET response. A properly behaving server certainly will do this, but a malicious actor does not have to follow the spec. The attacker could simply send a HEAD response claiming a Content-Length below your threshold, then hand you a huge file in the GET response.

Second, that doesn't even cover the potential for a server to send back a response with the Transfer-Encoding: chunked header set. A chunked response could quite possibly never end. A few people pointing your server at never-ending responses could carry out a trivial resource-exhaustion attack, even if your HTTP client enforces a timeout.

To do this correctly, you need to use an HTTP library that allows you to count the bytes as they're received, and abort if it crosses the threshold. I would probably recommend Curb for this rather than Net::HTTP. (Can you even do this at all with Net::HTTP?) If you use the on_body and/or on_progress callbacks, you can count the incoming bytes and abort mid-response if you receive a file that's too large. Obviously, as cwninja already pointed out, if you receive a Content-Length header larger than your threshold, you want to abort for that too. Curb is also notably faster than Net::HTTP.
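
For illustration, a minimal sketch of that byte-counting approach with Curb might look like the following. The names (fetch_limited, MAX_BYTES) and the 5 MB cap are made up, and the abort relies on libcurl's write-callback convention of treating a return value other than the chunk size as an error:

require 'curb'

MAX_BYTES = 5 * 1024 * 1024   # illustrative 5 MB cap

def fetch_limited(url, max_bytes = MAX_BYTES)
  too_big  = false
  received = 0
  chunks   = []

  easy = Curl::Easy.new(url)
  easy.on_body do |chunk|
    received += chunk.bytesize
    if received > max_bytes
      too_big = true
      0                    # returning anything other than the chunk size aborts the transfer
    else
      chunks << chunk
      chunk.bytesize       # tell libcurl the whole chunk was handled
    end
  end

  begin
    easy.perform
  rescue Curl::Err::CurlError
    raise "Response larger than #{max_bytes} bytes" if too_big
    raise
  end

  chunks.join
end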

– Bob Aman
  • The simplest way of DoSing this would be to simply enter the URL of an extremely slow-responding page (or of the site itself, causing a DoS feedback loop). Timeouts should take care of this (as best they can), and the same logic can be used for both this problem and the DoS element of the large-response-size issue. HEAD requests give validation, not protection. – cwninja Oct 11 '09 at 17:42
  • Requests back to the site itself will usually fail unless it's a fully asynchronous environment with no chance of blocking. But that's easy enough to check for: Just don't allow requests to your own hostname (a rough version of that check is sketched after this comment thread). – Bob Aman Oct 11 '09 at 17:52
  • Blacklists will almost always lead to fail. What about http://someproxyservice/?q=http://yoursite.com? – cwninja Oct 11 '09 at 17:58
  • Oh, and multiple requests to the host queue up (hanging) until a mongrel or worker process is free to deal with it. Once you max out the workers, the host will behave as a very slow server. – cwninja Oct 11 '09 at 18:02
  • Yeah, that's true; there's a heck of a lot of ways to express localhost. Finite, but still many: localhost, 127.0.0.1, ::1, fe80::1%lo0, example.com, www.example.com. Probably more. The point, however, isn't to make a bullet-proof blacklist; the point is to make timeouts be your last line of defense, since they're your least desirable means of preventing malicious behavior. – Bob Aman Oct 11 '09 at 18:35
  • By way of explanation on why you don't want to rely on timeouts when trying to defeat a malicious actor: Your typical Mongrel installation will have between 5 and 30 mongrel instances per machine, depending on how beefy the machine is. The default timeout for Net::HTTP is 60 seconds. Let's assume you're smart, and you set that to more like 15 seconds. I now have to send somewhere around 2 requests per second per machine to cause all requests to block. You really don't want to design a service that can be brought down with a single attacking machine. – Bob Aman Oct 11 '09 at 18:59
  • Indeed, but my counter-point would be that you can't make this bulletproof (without some serious engineering). Let's say it takes 2 seconds to do the full cycle in normal circumstances. It still only takes around 15 req/second (with 30 mongrels per box) to DoS the box. This still sucks. But we use Rails, and mongrels in fixed-size pools, so we are used to this. My point being that when tasked with polishing a turd (of a situation), don't bother with your best polish and duster. Just hit it with the power hose and call it a day. – cwninja Oct 11 '09 at 19:49
  • Fair enough. I think we essentially agree. – Bob Aman Oct 11 '09 at 20:21
  • Indeed. Fun debate though, we should do this more often. – cwninja Oct 11 '09 at 21:16
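
For what it's worth, a minimal sketch of the "don't allow requests to your own hostname" check mentioned in the comments above might look like the following. Everything here is illustrative: points_back_at_us? is a made-up name, IPAddr#loopback? and IPAddr#private? need a reasonably recent Ruby, and, as the comments point out, this is one partial layer of defense rather than a complete blacklist.

require 'uri'
require 'resolv'
require 'ipaddr'

# Hypothetical helper: reject URLs whose host resolves to a loopback or
# private address. Proxies and redirects can still route back to you,
# so treat this as one layer of defense, not the whole answer.
def points_back_at_us?(url)
  host = URI.parse(url).host
  return true if host.nil?            # unparseable: err on the side of rejecting
  addresses = Resolv.getaddresses(host)
  return true if addresses.empty?     # unresolvable: likewise reject
  addresses.any? do |addr|
    ip = IPAddr.new(addr)
    ip.loopback? || ip.private?
  end
rescue URI::InvalidURIError, IPAddr::InvalidAddressError
  true
end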

Try running this first:

Net::HTTP.start(url.host, url.port) { |http|
  response = http.request_head(url.path)
  raise "File too big." if response['content-length'].to_i > 5*1024*1024
}

You still have a race condition (someone could swap out the file after you do the HEAD request), but in the simple case this asks the server for the headers it would send back from a GET request.

– cwninja
  • This won't work with HTTP/1.1 chunked encoding or HTTP/1.0 Connection: close and no Content-Length. – panzi Sep 11 '13 at 15:43

Another way to limit the download size (a full implementation should also check the response status, handle exceptions, etc.; this is just an example):

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri.request_uri
  http.request request do |response|
    # check response codes here
    body = ''
    response.read_body do |chunk|
      body += chunk
      break if body.size > MY_SAFE_SIZE_LIMIT
    end
    break
  end
end

Combining the other two answers, I'd like to 1) check the size header, 2) watch the size of the chunks as they arrive, 3) support https, and 4) aggressively enforce a timeout. Here's a helper I came up with:

require 'net/http'
require 'uri'

module FetchUtil
  # Fetch a URL, with a given max bytes, and a given timeout
  def self.fetch_url url, timeout_sec=5, max_bytes=5*1024*1024
    uri = URI.parse(url)

    t0 = Time.now.to_f
    body = ''
    Net::HTTP.start(uri.host, uri.port,
                    :use_ssl => (uri.scheme == 'https'),
                    :open_timeout => timeout_sec,
                    :read_timeout => timeout_sec) { |http|

      # First make a HEAD request and check the content-length
      check_res = http.request_head(uri.path)
      raise "File too big" if check_res['content-length'].to_i > max_bytes

      # Then fetch in chunks and bail on either timeout or max_bytes
      # (Note: timeout won't work unless bytes are streaming in...)
      http.request_get(uri.path) do |res|
        res.read_body do |chunk|
          raise "Timeout error" if (Time.now.to_f - t0 > timeout_sec)
          raise "Filesize exceeded" if (body.length + chunk.length > max_bytes)
          body += chunk
        end
      end
    }
    return body
  end
end
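
For reference, a hypothetical call site might look like this (the URL, timeout, and size cap are made up):

# Hypothetical usage of the helper above: 10-second timeout, 2 MB cap
image_data = FetchUtil.fetch_url("https://example.com/image.jpg", 10, 2 * 1024 * 1024)
File.binwrite("image.jpg", image_data)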
– Jeff Ward
  • You should set `open_timeout` in the `Net::HTTP.start` options if you want to apply it to the TCP open timeout as well. E.g.: `Net::HTTP.start(uri.host, uri.port, use_ssl: (uri.scheme == 'https'), open_timeout: timeout_sec)` – SergA Jun 06 '19 at 20:24
  • Thanks @SergA - I've moved the open and read timeout specifications into the `start` call, and verified with a debugger that they are now being applied before the `TCPSocket.open` in `http.rb` – Jeff Ward Jun 07 '19 at 15:35