I am trying to scrape data using Java from a site that uses Cloudflare Enterprise Package protection. I haven't been able to find much information about this DDoS protection system on the web, but from inspecting the HTTP responses and the JavaScript, here's what I believe is happening:
- Client sends GET request to the server.
- Server determines that a specific cookie is absent from the GET request and returns an HTTP 503 response along with some HTML.
- The client's browser automatically runs the JavaScript in that response, solves a math problem, and sends a new GET request with the solution appended as a query string.
- Server responds with an HTTP 302 redirect response and the necessary cookie.
- Browser sends a GET request with the correct cookie, the server returns an HTTP 200 response, and all is well.
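The flow above can be mocked end to end against a local server, which makes each step observable without touching the real site. This is only a sketch under heavy assumptions: the cookie `clearance=ok`, the `/page` and `/solve` paths, the `answer` query parameter and the `2+3` puzzle are all hypothetical stand-ins (the real challenge is JavaScript, which a plain Java client does not execute). It uses `HttpURLConnection` plus the JDK's built-in `com.sun.net.httpserver.HttpServer`, and disables automatic redirects so the 302 and its `Set-Cookie` header can be handled by hand.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MockChallengeFlow {
    // Hypothetical cookie; the real name and value are whatever the site sets.
    static final String COOKIE = "clearance=ok";

    static final class Resp {
        int code;
        String body;
        String location;
        String setCookie;
    }

    static Resp lastPage;

    static void send(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = ex.getResponseBody()) { os.write(bytes); }
    }

    // Mock server covering steps 2, 4 and 5 of the list above.
    static HttpServer startServer() throws IOException {
        HttpServer s = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        s.createContext("/page", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            if (cookie != null && cookie.contains(COOKIE)) send(ex, 200, "<html>real content</html>");
            else send(ex, 503, "<html>challenge: 2+3</html>"); // hypothetical puzzle
        });
        s.createContext("/solve", ex -> {
            if ("answer=5".equals(ex.getRequestURI().getQuery())) {
                ex.getResponseHeaders().add("Set-Cookie", COOKIE + "; Path=/");
                ex.getResponseHeaders().add("Location", "/page");
                send(ex, 302, "redirect");
            } else {
                send(ex, 403, "wrong answer");
            }
        });
        s.start();
        return s;
    }

    // One GET; redirects are not followed so every step stays visible.
    static Resp get(String url, String cookie) throws IOException {
        HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
        c.setInstanceFollowRedirects(false);
        c.setRequestProperty("User-Agent", "Mozilla/5.0");
        if (cookie != null) c.setRequestProperty("Cookie", cookie);
        Resp r = new Resp();
        r.code = c.getResponseCode();
        // For status >= 400 the body is only reachable through getErrorStream().
        try (InputStream in = r.code >= 400 ? c.getErrorStream() : c.getInputStream()) {
            r.body = in == null ? "" : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        r.location = c.getHeaderField("Location");
        r.setCookie = c.getHeaderField("Set-Cookie");
        return r;
    }

    public static List<Integer> run() throws IOException {
        HttpServer server = startServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        List<Integer> codes = new ArrayList<>();
        try {
            Resp challenge = get(base + "/page", null);            // steps 1-2: 503 + HTML
            codes.add(challenge.code);
            Matcher m = Pattern.compile("(\\d+)\\+(\\d+)").matcher(challenge.body);
            if (!m.find()) throw new IllegalStateException("no puzzle in " + challenge.body);
            int answer = Integer.parseInt(m.group(1)) + Integer.parseInt(m.group(2));
            Resp redirect = get(base + "/solve?answer=" + answer, null); // steps 3-4: 302 + cookie
            codes.add(redirect.code);
            String cookie = redirect.setCookie.split(";", 2)[0];   // keep only "name=value"
            lastPage = get(base + redirect.location, cookie);      // step 5: Location is a path here
            codes.add(lastPage.code);
        } finally {
            server.stop(0);
        }
        return codes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run() + " " + lastPage.body);
    }
}
```

Handling the redirect manually is a design choice for visibility; a `java.net.CookieManager` installed via `CookieHandler.setDefault` could store and resend the cookie instead.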
My question has to do with getting the initial response stream in Java. I create the connection, set the user agent, and try to open the stream. As expected, I receive the 503 response. However, Java treats this as an exception and will not let me read the HTML that I believe is included in the body of this response. Does anyone know how to get at that HTML? Or is it impossible for a 503 response to carry HTML at all, and I'm misunderstanding what is going on?
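A minimal sketch of the request described above, assuming the connection is a plain `HttpURLConnection` (the post doesn't say which client is used). With `HttpURLConnection`, `getInputStream()` throws an `IOException` for any status of 400 or above, yet the body the server sent is still available from `getErrorStream()`. The local `HttpServer` that always answers 503, and the helper names `readBody` and `fetch503`, are hypothetical and only there to make the example runnable offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ErrorBodyDemo {
    // Reads the response body whatever the status code.
    static String readBody(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        // getInputStream() would throw for code >= 400; getErrorStream() returns
        // the body instead (or null if the server sent none).
        InputStream in = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) return "";
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static String fetch503() throws IOException {
        // Hypothetical local stand-in for the protected site: always 503 + HTML.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", ex -> {
            byte[] html = "<html>challenge page</html>".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(503, html.length);
            try (OutputStream os = ex.getResponseBody()) { os.write(html); }
        });
        server.start();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            return readBody(conn);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch503());
    }
}
```

Checking `getResponseCode()` first, rather than catching the exception from `getInputStream()`, keeps genuine network failures (which also surface as `IOException`) distinguishable from an HTTP error status that still carries a body.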
Thanks!