I'm trying to download the content of a webpage with this code, but it does not retrieve the same thing Firefox shows.

URL url = new URL("https://jumpseller.cl/support/webpayplus/");
InputStream is = url.openStream();
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);

When I check /tmp/asdfasdf, it is not the HTML source code of the page, just raw bytes (not readable text). Yet in Firefox I can see the webpage and view its source.

How can I get the real webpage?

user4052054
  • I work at Jumpseller.cl. Feel free to email us and we can provide you the full content of the file (considering you will provide adequate credit to us). – tiagomatos Jan 13 '16 at 11:26

2 Answers

You need to examine the response headers. The page is served compressed: the Content-Encoding header has the value gzip, so the raw bytes you saved are the gzipped HTML.

Try this:

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

URL url = new URL("https://jumpseller.cl/support/webpayplus/");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();

// Decompress the response before saving if the server sent it gzipped.
if ("gzip".equals(conn.getContentEncoding())) {
    is = new GZIPInputStream(is);
}

Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
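
If you want to confirm this for yourself, a quick way to inspect the response headers (a minimal sketch reusing the same URL) is to dump conn.getHeaderFields():

import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

URLConnection conn = new URL("https://jumpseller.cl/support/webpayplus/").openConnection();
// Print every response header; look for Content-Encoding and Content-Type.
for (Map.Entry<String, List<String>> header : conn.getHeaderFields().entrySet()) {
    System.out.println(header.getKey() + ": " + header.getValue());
}

The entry with a null key is the status line; the Content-Encoding entry is what the gzip check above relies on.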
VGR

Use the HtmlUnit library and this code:

import java.util.logging.Level;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    // Silence HtmlUnit's verbose logging.
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setUseInsecureSSL(true);
    HtmlPage page = webClient.getPage("https://jumpseller.cl/support/webpayplus/");
    // Give background JavaScript up to 5 seconds to finish after the page has loaded.
    webClient.waitForBackgroundJavaScript(5 * 1000);
    String stringToSave = page.asXml(); // Full HTML as a string; save it to a file if needed.
    // No explicit close() needed: try-with-resources closes the WebClient.
}
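
If you do want to save that string to a file, a small sketch (placed inside the try block right after page.asXml(); the output path is just an example) could look like this:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Write the rendered page to disk; /tmp/webpayplus.html is only an example path.
Files.write(Paths.get("/tmp/webpayplus.html"), stringToSave.getBytes(StandardCharsets.UTF_8));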
Vika Marquez