
I am trying to read a specific webpage that loads in Firefox without any problems, but in Delphi I get a 404 error. How can I fix it? Maybe they don't want bots to scrape their page...

```delphi
IdHttp1 := TIdHTTP.Create(nil);
IdSSLIOHandlerSocketOpenSSL1 := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
IdSSLIOHandlerSocketOpenSSL1.SSLOptions.SSLVersions := [sslvTLSv1, sslvTLSv1_1, sslvTLSv1_2];
IdHTTP1.IOHandler := IdSSLIOHandlerSocketOpenSSL1;

IdHttp1.ReadTimeout := 20000;
IdHttp1.ConnectTimeout := 3000;
IdHttp1.HandleRedirects := TRUE;
IdHttp1.Request.UserAgent := 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36';
IdHTTP1.Request.Accept := 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9';
IdHTTP1.Request.AcceptEncoding := 'gzip, deflate, br';
IdHTTP1.Request.AcceptLanguage := 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7';
IdHTTP1.AllowCookies := true;

try
  HTMLText := IdHttp1.Get('https://ecagbda.r.bh.d.sendibt3.com/tr/cl/DVtXcTF4cdz4-WFm1tHD49xCh694dE4r6-n2sTyV9cXzQS_1WHzxCA3EqrKluw_X1fzriADR2oOmRFuWxqGWCfz6S_Jbbh1viVcKrbrhR6yZAkDPLl-GPE7jXp9UymHh5J2qhya1XcfAXh0l4cTIb7UXI5dFp6boutjlrL38JyiTxMGsEHQK8uVRkFtMstmMYhPrkUI8cBkiHxj3mdjVu6SXFEw6644iLwjCFZoGSuu6M95bc0fAnbLy0mDAHk2qt2ASx2u4QuKRoDIZvlTGSjPhJnUzP5n4VjPyxgu3MimDuoj2ezWmxRKIYft1PK4oP2fEx2SSJyX1-PKgAZCvfCs41ZsjgXY_Ng');
except
  on E: Exception DO
  begin
    //   ShowMessage('Exception class name = ' + E.ClassName);
    //   ShowMessage('Exception message = ' + E.Message);
    //Halt;
    Fehler := True;
  end;
end;
```
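The `except` block in the snippet throws away everything except the fact that something failed, which leaves no debug details to work with. A minimal sketch of a trimmed-down version, assuming Indy 10 with the `IdCompressorZLib` unit available: the hand-written `AcceptEncoding` (which also advertised `br`, a Brotli encoding `TIdHTTP` does not decode) is replaced by a `Compressor`, and HTTP error replies are caught as `EIdHTTPProtocolException` so the status code and the length of the URL actually requested get reported. `FetchPage` is a hypothetical helper name and `https://example.com/` is a placeholder URL.

```delphi
program FetchWithDiagnostics;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP, IdSSLOpenSSL, IdCompressorZLib;

// Hypothetical helper: returns True and the page text on success,
// otherwise prints why the request failed and returns False
function FetchPage(const Url: string; out Html: string): Boolean;
var
  Http: TIdHTTP;
  SSL: TIdSSLIOHandlerSocketOpenSSL;
begin
  Result := False;
  Html := '';
  Http := TIdHTTP.Create(nil);
  try
    // Created with Http as owner, so it is freed together with Http
    SSL := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    SSL.SSLOptions.SSLVersions := [sslvTLSv1, sslvTLSv1_1, sslvTLSv1_2];
    Http.IOHandler := SSL;
    // With a Compressor assigned, Indy handles gzip/deflate itself,
    // so Request.AcceptEncoding is not set by hand
    Http.Compressor := TIdCompressorZLib.Create(Http);
    Http.ReadTimeout := 20000;
    Http.ConnectTimeout := 3000;
    Http.HandleRedirects := True;
    Http.AllowCookies := True;
    try
      Html := Http.Get(Url);
      Result := True;
    except
      // The server replied, but with an error status (404, 414, ...)
      on E: EIdHTTPProtocolException do
        Writeln(Format('HTTP %d for a %d-char URL: %s',
          [E.ErrorCode, Length(Url), E.Message]));
      // No usable reply at all: DNS, socket, TLS or OpenSSL loading failure
      on E: Exception do
        Writeln(E.ClassName + ': ' + E.Message);
    end;
  finally
    Http.Free;
  end;
end;

var
  Html: string;
begin
  if FetchPage('https://example.com/', Html) then
    Writeln('Received ', Length(Html), ' chars');
end.
```

Printing `Length(Url)` next to the status code is the cheap part here: a URL that is suspiciously short compared to the one you expected to send points at the data rather than at TLS or the server.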
  • Other than the assignment of the `AcceptEncoding` property (which doesn't belong and should be removed since you don't have a `Compressor` assigned), the rest of the code looks fine. Have you tried comparing `TIdHTTP`'s HTTP traffic against Firefox's HTTP traffic to look for any differences? Since you have `HandleRedirects=True`, have you checked if the server is sending an HTTP redirect to a non-existing URL? Have you tried using Firefox's `User-Agent` instead of Safari's? There are several possible reasons for this error, but no usable debug details for us to go on. – Remy Lebeau Dec 20 '21 at 19:37
  • Yes, I tried different User-Agents, and no, there is no redirect to a non-existing page, otherwise the browser won't work either. In the browser, the page is loaded via a JavaScript refresh, so it would be easy to parse the URL, but in Delphi I always get a 404 error. I think it is a protocol problem - Delphi can't handle TLS 1.3. – Tobias Honscha Dec 21 '21 at 16:30
  • "*there is no redirect to a non-existing page, otherwise the browser won't work either*" - not necessarily. Lots of web servers customize responses for specific types of clients. It is entirely possible that this server could send a 404 to `TIdHTTP`, but not send a 404 to a real browser. That is why I suggested playing around with the `UserAgent`. – Remy Lebeau Dec 21 '21 at 16:47
  • "*I think it is a protocol problem - Delphi can't handle TLS 1.3*" - does this web server actually *require* TLS 1.3? Does a real browser actually *use* TLS 1.3 when a 404 is not returned? If you turn off TLS 1.3 in your browser, does the site still work? And FYI, TLS support has nothing to do with **Delphi** itself. **Indy** doesn't officially support OpenSSL 1.1.x yet, which is needed for TLS 1.3, but that is a [work-in-progress](https://github.com/IndySockets/Indy/pull/299) that you can download today and try for yourself. – Remy Lebeau Dec 21 '21 at 16:50
  • OK, thanks for the help. We can close this case because there was another error in my code: the url is longer than 255 chars ;) That was the reason for the 404. – Tobias Honscha Dec 21 '21 at 17:32
  • "*the url is longer than 255 chars*" - Neither Indy nor the HTTP protocol puts a length limit on URLs. Web browsers do, but the limit is [much higher than 255](https://stackoverflow.com/questions/417142/). Does the web server impose a limit on URL length? Per the HTTP standard, if a server receives a URL longer than it can handle, it is supposed to return a 414, not a 404. – Remy Lebeau Dec 21 '21 at 18:46
  • I stored the URL in a MySQL database; it was truncated because of the field definitions. – Tobias Honscha Dec 23 '21 at 08:30
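Because the value came back from MySQL already shortened, the request went out with a valid-looking but wrong URL, and the server answered 404 rather than 414. Depending on the SQL mode, MySQL either rejects an over-long string or stores it truncated with only a warning, which is how this kind of bug can stay silent. A minimal sketch of a guard before storing and a round-trip check after reading back; `MaxUrlColumnChars`, `CheckUrlFitsColumn` and `UrlSurvivedRoundTrip` are hypothetical names, and the limit of 255 is an assumption that has to match the real field definition (for example a `VARCHAR(255)` column).

```delphi
program UrlRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Assumed column size; keep in sync with the table definition
  MaxUrlColumnChars = 255;

// Hypothetical guard to call before storing a URL.
// Encoded URLs are ASCII, so Length matches MySQL's character count.
procedure CheckUrlFitsColumn(const Url: string);
begin
  if Length(Url) > MaxUrlColumnChars then
    raise Exception.CreateFmt('URL has %d chars, column holds only %d',
      [Length(Url), MaxUrlColumnChars]);
end;

// Hypothetical check after reading the URL back from the database
function UrlSurvivedRoundTrip(const Original, Stored: string): Boolean;
begin
  Result := Stored = Original;
end;

var
  Url: string;
begin
  Url := 'https://example.com/' + StringOfChar('x', 300);
  // Copy(..., 1, 255) simulates what a 255-char column does to the value
  if not UrlSurvivedRoundTrip(Url, Copy(Url, 1, MaxUrlColumnChars)) then
    Writeln('Stored URL was truncated');
  try
    CheckUrlFitsColumn(Url);
  except
    on E: Exception do
      Writeln(E.Message);
  end;
end.
```

The actual fix is on the database side (a wider column or a `TEXT` type); the checks only make sure a too-short column fails loudly instead of producing a 404.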
