3

I am using this code in Delphi to get the size of a web page (that is, the size of the page source):

uses
  IdHTTP;

function URLsize(const URL : string) : integer;
var
  Http: TIdHTTP;
begin
  Http := TIdHTTP.Create(nil);
  try
    Http.Head(URL);    
    result := round(Http.Response.ContentLength / 1048576);   //MB   
  finally
    Http.Free;
  end;
end;

I can get file size easily for some URLs like http://sample.com/test.exe. It returns the size in MB.

But I cannot get URL size using this code for a URL like http://stackoverflow.com/; it returns 0 or -1.

How can I get the size in that case?

Rob Kennedy
Sky
  • Related: http://stackoverflow.com/questions/9165926/using-wininet-to-identify-total-file-size-before-downloading-it – Jerry Dodge Aug 22 '13 at 16:56
  • @JerryDodge I tried that. But didn't help. Still can't get the size. Thanks anyway. – Sky Aug 22 '13 at 18:10
  • Of course. The accepted answer to that Q does exactly the same as here. HEAD request followed by read of content-length. – David Heffernan Aug 22 '13 at 18:22
  • 1
    That's why I said related and not duplicate. – Jerry Dodge Aug 22 '13 at 18:35
  • I'm getting a content-length with `http://stackoverflow.com/`. – Marcus Adams Aug 22 '13 at 18:57
  • 7
    When I test `http://stackoverflow.com/` using your exact function, I get 0 - but without dividing it I get 194569, which is smaller than 1048576. Could this be your problem? – Jerry Dodge Aug 22 '13 at 19:13
  • 2
    Content-length is typically only universally supported when it comes to downloading files (program installers, general data files like DOC and PDF). It is usually not supported for text data (I thought never, but I just checked and SO does return a content-length for text), so the functions involved will usually return -1. A good rule is to not expect a content-length from the web server and to write your code so that it is not absolutely necessary. – Glenn1234 Aug 23 '13 at 02:22
  • 2
    For the record, yesterday I downloaded some software, and although the file size was about 10 MB, Google Chrome didn't show an expected time, or percent complete, because it didn't know how big the file was. The server didn't include this in the header. Still, Chrome respectfully downloaded it regardless of this field. – Jerry Dodge Aug 23 '13 at 13:01

2 Answers

8

Not all HTTP HEAD responses contain content-length. So, what you are trying to do is impossible in general. If you encounter a response that does not contain the content length you need to download the contents in order to find the length.
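
As an illustration of that fallback, here is a minimal sketch, not a definitive implementation. The function name `URLSizeInBytes` is hypothetical; it relies on the `TIdHTTP.Response.HasContentLength` property mentioned in the comments, and it assumes a plain `http://` URL with no redirects (HTTPS would additionally need an SSL IOHandler, which is not shown). The sample URL is the placeholder from the question.

```pascal
program PageSizeSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP;

// Hypothetical helper: returns the size of the response body in bytes.
// It tries a HEAD request first and only downloads the body when the
// server does not report a usable Content-Length.
function URLSizeInBytes(const URL: string): Int64;
var
  Http: TIdHTTP;
  Body: TMemoryStream;
begin
  Http := TIdHTTP.Create(nil);
  try
    Http.Head(URL);
    // Some servers omit Content-Length or report 0 (for example with
    // chunked transfer encoding), so both cases are treated as "unknown".
    if Http.Response.HasContentLength and (Http.Response.ContentLength > 0) then
      Result := Http.Response.ContentLength
    else
    begin
      // Fallback: download the content and measure it.
      Body := TMemoryStream.Create;
      try
        Http.Get(URL, Body);
        Result := Body.Size;
      finally
        Body.Free;
      end;
    end;
  finally
    Http.Free;
  end;
end;

begin
  try
    Writeln(URLSizeInBytes('http://sample.com/test.exe'), ' bytes');
  except
    // Indy raises an exception on HTTP errors, e.g. if HEAD is rejected.
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

Note that the fallback costs a full download, so for large files it may be preferable to report the size as unknown instead.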

David Heffernan
  • 1
    But some software, like **Internet Download Manager** or **Internet Download Accelerator**, can get the size of a webpage without downloading the whole page. How do they do this? – Sky Aug 22 '13 at 14:45
  • What is your evidence that they do actually do that? How would they manage to do that for a file that was generated dynamically? – David Heffernan Aug 22 '13 at 14:46
  • I added a URL to Internet Download Accelerator **without downloading it** (of course I clicked on _Download Later_). Then right-clicked on the URL and clicked on _Get File Size_ and it showed me the size without downloading it – Sky Aug 22 '13 at 14:54
  • 7
    @Sky: David is correct. Not all `HEAD` requests (or `GET` requests, for that matter) can provide a `Content-Length` header (the `TIdHTTP.Response.HasContentLength` property will tell you if the header was present). That is the official way to get the file size without downloading the actual file. If a download manager is able to get the size and you are not, then the manager is relying on other info. You would have to look at the actual `HEAD` response to see what other data is being reported. `http://stackoverflow.com/` does have a `Content-Length` header, though. – Remy Lebeau Aug 22 '13 at 18:32
  • @DavidHeffernan & @RemyLebeau Yeah, got my answer, thanks to David. I checked, and `http://stackoverflow.com/` does have a `Content-Length`. But now I wonder how I can get a `Content-Length` from `stackoverflow.com`. It's a dynamic page, right? And David said that it's impossible to get the length from a dynamic web site, because it generates the HTML code while the whole page is loading/downloading. – Sky Aug 22 '13 at 19:33
  • 2
    If stackoverflow.com does supply content length for HEAD, then surely Indy will pass it on. – David Heffernan Aug 22 '13 at 19:48
  • 2
    If a `Content-Length` header is present, Indy will indeed provide it. StackOverflow (and other servers) can provide a `Content-Length` header if they know up front what the total file size actually is, even for dynamically created content. It is when content is being sent while it is being created, such as with a `Transfer-Encoding: chunked` header, where `Content-Length` is either not present because the size is unknown until the end of the request, or it is present but 0 (even though it is not supposed to be present, but some servers do that). – Remy Lebeau Aug 22 '13 at 19:52
  • 2
    So again, if a DM is getting the file size and you are not, then the DM has to be looking at something else, so you need to look at the full response to see what else you can look at. – Remy Lebeau Aug 22 '13 at 19:55
  • @Sky, check [this](http://www.php.net/manual/en/function.ob-get-length.php#13715) out. – OnTheFly Aug 23 '13 at 01:45
3

Even if a web server does return the proper content length, you're dividing it by 1048576 to get the megabyte value. Since http://stackoverflow.com/ is much less than a single megabyte, the rounded result is 0. I'm still stumped, however, as to where your -1 came from, because http://stackoverflow.com/ returns 194569 for me, without dividing. Did you get a -1 from another website? And are your results the divided value or the raw value from Http.Response.ContentLength?
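
To make the rounding issue concrete, here is a small sketch. `FormatByteSize` is a hypothetical helper, not part of Indy or the RTL; it keeps the raw byte count and only picks a unit for display, so small pages do not collapse to 0. The decimal separator in the formatted output depends on the system locale.

```pascal
program SizeFormatSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats a byte count with a suitable unit.
// Negative values (such as -1 for a missing Content-Length) are
// reported as unknown.
function FormatByteSize(Bytes: Int64): string;
begin
  if Bytes < 0 then
    Result := 'unknown'
  else if Bytes < 1024 then
    Result := IntToStr(Bytes) + ' bytes'
  else if Bytes < 1048576 then
    Result := Format('%.1f KB', [Bytes / 1024])
  else
    Result := Format('%.2f MB', [Bytes / 1048576]);
end;

begin
  // The calculation from the question: about 0.19 MB rounds down to 0.
  Writeln(Round(194569 / 1048576));
  // Keeping the raw value and choosing the unit afterwards.
  Writeln(FormatByteSize(194569));
  Writeln(FormatByteSize(-1));
end.
```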

Jerry Dodge
  • @JerryDodge Yes, because the size is less than a megabyte it returns 0. But my problem was something else: I get `-1` for `http://www.google.com/`. Also, with the code above, when I get the `content-length` of StackOverflow it returns `196444` (without dividing it). But when I load the site with a browser and save the page source to a text file, the text size is `228154`. I don't know why. It's kinda confusing. – Sky Aug 23 '13 at 22:12
  • @Sky That's because elements get added to the page dynamically through script. Nothing confusing about that. If you download the raw file *before* the script starts adding to it, it returns 194569, but *after* the script has added new contents, it becomes roughly 228154, which varies depending on the data that was loaded dynamically. – Jerry Dodge Aug 23 '13 at 22:23
  • Also, StackOverflow constantly makes changes, so the size may have changed since your test and my test, etc. – Jerry Dodge Aug 23 '13 at 22:25