
I have a free domain name with URL Forwarding (Cloaking) to another site.

If I type http://my.com/1.zip into the browser's address bar, it goes to http://his.com/1.zip and downloads the file.

How can I do the same with Indy's TIdHTTP (Delphi XE2)? Both browsers and my code get a 404 error at first, but browsers then somehow download the file, while my code does not.

I need to request the first link but actually download from the other site. E.g. for a file xxx.zip, I want to request http://my.com/xxx.zip but actually download from http://his.com/xxx.zip (where the file is stored).

Thanks!

Edited:

I set HandleRedirects to True and assigned a CookieManager (I have already seen the question "Indy - IdHttp how to handle page redirects?").

Try downloading http://liga-updates.ua.tc/GDI+.zip in Delphi.

maxfax
  • Similar question: [Indy - IdHttp how to handle page redirects?](http://stackoverflow.com/questions/4549809/indy-idhttp-how-to-handle-page-redirects) – mjn Jan 15 '12 at 07:03
  • Have you tried a tool (Fiddler) to see the actual HTTP headers for the initial request? – mjn Jan 15 '12 at 07:05
  • no headers (in HTTP Analyzer). Please see my edited question – maxfax Jan 15 '12 at 18:37
  • Did you ever look at the page source? – OnTheFly Jan 15 '12 at 22:57
  • Page source is the key. URL cloaking is usually accomplished by using HTML frames. The browser address bar displays the domain of the top-level frameset and then an inner frame displays content from another domain. So you will have to download the content of the cloaking URL and parse it to determine the true URL of the content that is being cloaked. – Remy Lebeau Jan 16 '12 at 06:12
  • Does it have to be with `TIdHttp` only? – kobik Jan 16 '12 at 21:36
  • @kobik: I do not understand what you are asking. – Remy Lebeau Jan 18 '12 at 00:51
  • @Remy, in my example I used `THttpCli`. – kobik Jan 18 '12 at 08:24
  • It does not matter what you use. It could be `TIdHTTP`, `THttpCli`, `WinInet`, `WinHTTP`, `libcurl`, it does not matter. The fact remains that the cloaking webserver is returning an HTML page that internally loads the true URL inside of an HTML frame. So you have to use whatever client you want to download that HTML, then parse out the real URL that is being loaded in a frame. – Remy Lebeau Jan 18 '12 at 21:53

3 Answers


The website in question is returning an HTTP 404 response with an HTML page containing an <iframe> that loads the real URL. A 404 reply will cause TIdHTTP to raise an EIdHTTPProtocolException exception by default. The content of the reply (the HTML) can be accessed via the EIdHTTPProtocolException.ErrorMessage property.

For example:

procedure TForm1.Button1Click(Sender: TObject);
var
  Http: TIdHttp;
  URL, Filename: string;
  FS: TFileStream;
  ...
begin
  Filename := 'C:\path\GDI+.zip';
  URL := 'http://liga-updates.ua.tc/GDI+.zip';

  FS := TFileStream.Create(Filename, fmCreate);
  try
    try
      Http := TIdHttp.Create(nil);
      try
        try
          Http.Get(URL, FS);
        except
          on E: EIdHTTPProtocolException do begin
            // Only the cloaking server's 404 page is handled here
            if E.ErrorCode <> 404 then raise;
            // ErrorMessage holds the HTML body of the 404 reply.
            // ParseIFrameURLFromHTML is not an Indy function; it is a
            // placeholder for your own parsing code
            URL := ParseIFrameURLFromHTML(E.ErrorMessage);
            if URL = '' then raise;
            // Retry against the real URL found in the <iframe>
            Http.Get(URL, FS);
          end;
        end;
      finally
        Http.Free;
      end;
    finally
      FS.Free;
    end;
  except
    // Remove the empty or partial file on any failure
    DeleteFile(Filename);
    ShowMessage('Unable to download file.');
    Exit;
  end;
  ShowMessage('Downloaded OK');
end;
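
The example calls ParseIFrameURLFromHTML without showing it. Below is a minimal sketch of what such a helper might look like, not a definitive implementation. It assumes the 404 page contains a single <iframe> whose src attribute is written as src="..." or src='...' with no spaces around the equals sign, and that the tag contains no other attribute ending in "src" before it. The actual HTML returned by the cloaking server is not shown in this thread, so these are assumptions, and real-world HTML may need a proper parser. The test URL is the placeholder from the question.

```pascal
program ParseIFrameDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the src value of the first <iframe> tag in
// HTML, or '' if no quoted src attribute is found.
function ParseIFrameURLFromHTML(const HTML: string): string;
var
  LowerHTML, OpenTag: string;
  TagStart, TagEnd, SrcPos, ValueStart, ValueEnd: Integer;
  Q: Char;
begin
  Result := '';
  // LowerCase keeps the string length, so indexes map back onto HTML
  LowerHTML := LowerCase(HTML);

  TagStart := Pos('<iframe', LowerHTML);
  if TagStart = 0 then Exit;

  // Restrict the search to the opening tag: '<iframe ... >'
  OpenTag := Copy(LowerHTML, TagStart, MaxInt);
  TagEnd := Pos('>', OpenTag);
  if TagEnd = 0 then Exit;
  OpenTag := Copy(OpenTag, 1, TagEnd);

  SrcPos := Pos('src=', OpenTag);
  if SrcPos = 0 then Exit;

  // The character right after 'src=' must be an opening quote
  ValueStart := SrcPos + Length('src=');
  if ValueStart > Length(OpenTag) then Exit;
  Q := OpenTag[ValueStart];
  if (Q <> '"') and (Q <> '''') then Exit;
  Inc(ValueStart);

  // Scan forward to the matching closing quote
  ValueEnd := ValueStart;
  while (ValueEnd <= Length(OpenTag)) and (OpenTag[ValueEnd] <> Q) do
    Inc(ValueEnd);
  if ValueEnd > Length(OpenTag) then Exit;

  // Copy from the original HTML so the URL keeps its original case
  Result := Copy(HTML, TagStart + ValueStart - 1, ValueEnd - ValueStart);
end;

begin
  // Prints http://his.com/1.zip
  Writeln(ParseIFrameURLFromHTML(
    '<html><body><iframe src="http://his.com/1.zip"></iframe></body></html>'));
end.
```

The function returns an empty string on any parse failure, which fits the `if URL = '' then raise;` check in the example above.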
Remy Lebeau

It seems that http://liga-updates.ua.tc relies on 404 error redirects to custom pages (handled internally by the web server).

Try doing an HTTP HEAD on any resource there: it will return a 404 with an HTML response. That response holds an iframe element whose src points to the real download file. Based on this, I wrote a small piece of code.

I used THttpCli because it seems TIdHttp will not return a "valid" Response with status 404 (not in my D5 version anyway).

uses HttpProt;

procedure TForm1.Button1Click(Sender: TObject);
const
  IFRAME_SRC = '<iframe src="';
var
  HttpCli: THttpCli;
  S, URL, FileName: string;
  I: Integer;
  FS: TFileStream;
begin
  URL := 'http://liga-updates.ua.tc/GDI+.zip';

  HttpCli := THttpCli.Create(nil);
  try
    HttpCli.URL := URL;
    HttpCli.MultiThreaded := True;
    try
      HttpCli.Get;
    except
      // this will always be 404 for this domain (test from outside the IDE)
    end;
    S := HttpCli.LastResponse; // THttpCli returns valid response when status 404
    // extract IFRAME src
    I := Pos(IFRAME_SRC, S);
    if I <> 0 then
    begin
      Delete(S, 1, I + Length(IFRAME_SRC) - 1);
      URL := Copy(S, 1, Pos('"', S) - 1);
      HttpCli.URL := URL;
      FileName := ExtractFileName(StringReplace(URL, '/', '\', [rfReplaceAll]));
      FS := TFileStream.Create(FileName, fmCreate);
      try
        HttpCli.RcvdStream := FS;
        try
          HttpCli.Get;
          ShowMessage('Downloaded OK');
        except
          ShowMessage('Unable to download file.');
        end;
      finally
        FS.Free;
      end;
    end
    else
      ShowMessage('Unable to extract download information.');
  finally
    HttpCli.Free;
  end;
end;
kobik
  • An HTTP `404` response does not perform a redirect. Redirects are performed by HTTP `3xx` replies (other than `304`) with the HTTP `Location` header. – Remy Lebeau Jan 18 '12 at 00:48
  • A 404 error redirect is not a response redirect. It is used internally by the web server. Here is an [example](http://www.htaccessbasics.com/404-custom-error-page/). – kobik Jan 18 '12 at 08:16
  • That is still not a redirect. In this situation, the webserver is returning a standard `404` reply with an HTML page as its content, and that HTML contains a frame that loads another URL internally. Most webservers do not handle custom `404` pages that way. They load the account owner's custom `404` page, parse it as needed (to process server-side scripts), and serve the result as the content of the `404` reply directly. – Remy Lebeau Jan 18 '12 at 21:50
  • The term is "404 redirect" (google it). Again, it is NOT a header 3xx redirect that is returned to the client. But this is just terminology. "the webserver is returning a standard 404 reply with an HTML page as its content" - this is true and exactly what my code does. Try to read this content with `TIdHttp`... I couldn't. Hence THttpCli. – kobik Jan 18 '12 at 22:01
  • A `404` reply will cause `TIdHTTP` to raise an `EIdHTTPProtocolException` exception by default. The content of the reply can be accessed via the `EIdHTTPProtocolException.ErrorMessage` property. – Remy Lebeau Jan 18 '12 at 22:47
  • You are 100% correct. I was testing E.Message which returns 'HTTP/1.1 404 Not Found'. ErrorMessage returns HTML content. thanks. post it as an answer. – kobik Jan 18 '12 at 23:09

Try the HandleRedirects property of the TIdHTTP component.
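
For reference, a minimal sketch of enabling it, assuming Indy 10's TIdHTTP. The procedure name DownloadFollowingRedirects, the output file name, and the limit of 5 are hypothetical choices for illustration, and the URL is the placeholder from the question. As the comments on the other answer point out, this setting only follows HTTP 3xx replies that carry a Location header, so on its own it would not follow the 404 page with an iframe described in this thread (and the question's edit says it was already set).

```pascal
program RedirectDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP;

// Hypothetical helper: downloads URL into FileName, letting TIdHTTP follow
// HTTP 3xx redirects automatically.
procedure DownloadFollowingRedirects(const URL, FileName: string);
var
  Http: TIdHTTP;
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmCreate);
  try
    Http := TIdHTTP.Create(nil);
    try
      Http.HandleRedirects := True;  // follow 3xx + Location replies
      Http.RedirectMaximum := 5;     // arbitrary cap chosen for this sketch
      Http.Get(URL, FS);
    finally
      Http.Free;
    end;
  finally
    FS.Free;
  end;
end;

begin
  try
    DownloadFollowingRedirects('http://my.com/1.zip', '1.zip');
    Writeln('Downloaded OK');
  except
    on E: Exception do
      Writeln('Download failed: ', E.Message);
  end;
end.
```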

Pateman