
How can I check whether a JPG URL exists before downloading it, to avoid an exception?

procedure TForm1.Button1Click(Sender: TObject);
var
  FS: TFileStream;
  Url, FileName: String;
  I, C: Integer;
begin
  for I := 1 to 1000 do
  begin
    Url := 'http://www.mysite.com/images/' + IntToSTr(I) + '/Image.jpg';
    FileName := 'C:\Images\' + IntToStr(I) + '.jpg';
    FS := TFileStream.Create(FileName, fmCreate);
    try
      try
        IdHTTP1.Get(Url);
        c := IdHTTP1.ResponseCode;
        if C = 200  then
          IdHTTP1.Get(Url, FS);
      except
      end;
      Application.ProcessMessages;
    finally
      Fs.Free;
    end;
  end;
end;
TLama
JakeSays
  • You must handle exceptions. Note that besides error 404, other errors may occur, and since Indy is driven by exceptions, handling them is a must. Something similar has been asked in [this question](http://stackoverflow.com/q/13950676/960757). – TLama Mar 01 '13 at 23:29
  • If you aren't the webmaster, I'm afraid you have no other way than to make a request (a HEAD one if you don't want to try the GET) and check the status reported by the web server. If it is 404, the _jpg url does not exist_. – jachguate Mar 01 '13 at 23:31
  • On the other hand, if you want to avoid the exception, include the 404 status in the `AIgnoreReplies` parameter. – jachguate Mar 01 '13 at 23:32
  • @jachguate, ignoring status 404 will keep the code unsafe. Think about the other exceptions that may occur. – TLama Mar 01 '13 at 23:37
  • @TLama what do you mean by _unsafe_? IMHO the _eat any exception_ handler that is present now is even worse! If an exception occurs, you must handle only what you know how to handle and let all the other exceptions fly. – jachguate Mar 01 '13 at 23:39
  • Can you explain how to use `AIgnoreReplies` in my code? – JakeSays Mar 01 '13 at 23:39
  • Johnny, `IdHTTP1.Get(Url, FS, [404]);`, but good luck if you meet a different error. @jachguate, that's why I wrote the post about exception handling for `TIdHTTP`. – TLama Mar 01 '13 at 23:40
  • `IdHTTP1.Get(Url, FS, [404]);` will not cause an exception in case the status reported by the web server is 404. – jachguate Mar 01 '13 at 23:41
  • Eww, `Application.ProcessMessages` :( – Jerry Dodge Mar 02 '13 at 00:48
  • I downvoted because the question was poorly asked; a little more explanation would have avoided that. The code in the question is poorly written too, but I don't hold that against it in my vote. I don't look down on poorly written code, because we were all there at one time, trying to tell heads from tails. Don't look down on anyone who is still learning. However, you do have to ask more detailed questions than just a sentence and some code. – Jerry Dodge Mar 02 '13 at 01:02

2 Answers

To answer your main question: the only way to check whether a particular URL is valid is to ask the web server and see what it tells you.

With Indy you can use the AIgnoreReplies parameter of Get and other methods to tell the TIdHTTP instance not to raise an exception when one of the listed status codes is returned by the web server, like this:

IdHTTP1.Get(Url, FS, [404]);

An exception will still be raised for any status other than 200 and 404. Some other status codes may not raise an exception either, depending on how the component is configured, for example status code 401 when authentication parameters are set.
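As a minimal sketch of how this could look for a single image (the helper name DownloadIfExists and its AHTTP parameter are made up for illustration and are not part of Indy), the response can be buffered in memory and written to disk only when the server reports 200, so no empty file is left behind for a missing image:

```pascal
// Assumed uses clause: Classes, IdHTTP
// DownloadIfExists is a hypothetical helper, not an Indy routine.
procedure DownloadIfExists(AHTTP: TIdHTTP; const Url, FileName: string);
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    // 404 is listed in AIgnoreReplies, so that status does not raise
    // an exception; other errors are still raised as usual.
    AHTTP.Get(Url, MS, [404]);
    // Save the data only when the server actually returned the image.
    if AHTTP.ResponseCode = 200 then
      MS.SaveToFile(FileName);
  finally
    MS.Free;
  end;
end;
```

Inside the loop from the question this would be called as `DownloadIfExists(IdHTTP1, Url, FileName);`, with a single request per image.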

That said, I find several problems in your code:

  • The try/except block you have kills every exception, regardless of its nature. It treats an EOutOfMemory the same as an EIdSocketError, an EIdHTTPProtocolException or even an EMayanWorldEnd exception!
  • You download the image twice: the first download is thrown away and used only to determine whether the resource exists. If you feel you must check whether the resource exists, don't perform a GET on it, perform a HEAD!
  • Don't use Application.ProcessMessages, move your code to a thread!
  • Learn to handle properly the different status codes you may get and the other errors you may run into. It is hard at the beginning, but it is the way to go if you want your code to be robust. Different errors may be:
    • HTTP status codes, like:
      • Request TimeOut (slow down and retry)
      • HTTP Version Not Supported (well, try with another version)
      • Etc.
    • Network Failures
      • Is the internet down
      • Is the WebServer down
      • Etc.
    • As a general rule, let any exception you don't know how to handle fly. If you have no choice, eat it, but log what is happening and read the logs; that way you will improve your knowledge and skills.
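The HEAD check and the "handle only what you know" rule from the list above could be sketched roughly as follows. The function name UrlExists and the AHTTP parameter are hypothetical; the sketch treats a 404 as "does not exist" and re-raises every other protocol error, while socket errors and anything else propagate to the caller untouched:

```pascal
// Assumed uses clause: SysUtils, IdHTTP
// UrlExists is a hypothetical helper, not an Indy routine.
function UrlExists(AHTTP: TIdHTTP; const Url: string): Boolean;
begin
  Result := False;
  try
    // HEAD asks only for the headers, so no image data is transferred.
    AHTTP.Head(Url);
    Result := AHTTP.ResponseCode = 200;
  except
    on E: EIdHTTPProtocolException do
    begin
      // The server answered with an error status. A 404 is the one
      // case we know how to handle; everything else is re-raised.
      if E.ErrorCode <> 404 then
        raise;
    end;
    // Any other exception class (for example EIdSocketError) is not
    // caught here and reaches the caller.
  end;
end;
```

A caller would then run `if UrlExists(IdHTTP1, Url) then IdHTTP1.Get(Url, FS);`, keeping in mind that this costs two requests per existing image and that the GET can still fail between the two calls.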
jachguate
  • And a final note: don't depend on people at StackOverflow, or any website, to teach you everything you need to know. Take the time to figure out how things work by yourself. No offense, but this world has become lazy with the internet, and people always expect others to give them fully coded and complete answers. The problem is, people aren't paid to answer questions on free websites, so they don't usually want to do your work for you. I'm sure if I were getting paid to answer questions, I would do someone's whole project for them. – Jerry Dodge Mar 02 '13 at 00:56

If you first download all of the internet, then you can check in your exabyte-sized data collection whether the image exists.

Otherwise, you will have to deal with the case that the file does not exist.

You will also have to deal with various other errors, such as timeouts, or your web scraper hitting the download limit and being blocked.

Has QUIT--Anony-Mousse