I'm trying to retrieve the rendered (WYSIWYG) HTML content of a web page. The page is generated with Apache Wicket, but I don't think that matters. I tried the solutions described here, but I always get an HTML body like the following:

<body>
    <div
    style="width: 830px; height: 300px; margin: auto; margin-top: 50px;">
        <div wicket:id="rangeBar"
        style="float: left; width: 400px; height: 300px; margin-right: 30px;"
        id="rangeBar1"></div>
    </div>
</body>

I was expecting to retrieve markup similar to what I see in the browser's web console, like this:

<body>
    <div style="width: 830px; height: 300px; margin: auto; margin-top: 50px;">
        <div wicket:id="rangeBar" style="float: left; width: 400px; height: 300px; margin-right: 30px;" id="rangeBar1" class="shield-chart">
            <div id="shielddw" class="shield-container" style="position: relative; overflow: hidden; width: 400px; height: 300px; line-height: normal; z-index: 0; font-family: 'Segoe UI', Tahoma, Verdana, sans-serif; font-size: 12px;">
                <svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="400" height="300">
                    <defs>
                    <clippath id="shielddx">
                    <rect rx="0" ry="0" fill="none" x="0" y="0" width="9999" height="300" stroke-width="0.000001"></rect></clippath>
                    <clippath id="shielddy">
                    <rect fill="none" x="0" y="0" width="331" height="210"></rect></clippath>
                    <filter id="a5a87bf2-0ea3-4664-8ceb-bd50b883a117" height="120%">
                    <fegaussianblur in="SourceAlpha" stdDeviation="3"></fegaussianblur>
                    <fecomponenttransfer>
                    <fefunca type="linear" slope="0.2"></fefunca></fecomponenttransfer>
                    <femerge>
                    <femergenode></femergenode>
                    <femergenode in="SourceGraphic"></femergenode></femerge></filter></defs>
                    <rect rx="0" ry="0" fill="#2D2D2D" x="0" y="0" width="400"
                    height="300" stroke-width="0.000001"></rect>  
                      ..... 
                 </svg>
            </div>
            <div class="shield-tooltip" style="pointer-events: none"></div>
        </div>
    </div>
</body>

Is there any way to get such content in Java?

Thanks, Laura

UPDATE: Here is my Java code:

// Downloads the page exactly as the server sends it and saves it to a file.
// No script on the page is executed by this code.
HttpClientBuilder builder = HttpClientBuilder.create();
File htmlFile = new File(ROOT + "etc/html/demo_apache_" + new Date() + ".html");
// try-with-resources closes the client, response and both streams, even on error
try (CloseableHttpClient httpclient = builder.build();
     CloseableHttpResponse response = httpclient.execute(new HttpGet(TEST_WEB_PAGE));
     InputStream content = response.getEntity().getContent();
     OutputStream htmlStream = new FileOutputStream(htmlFile)) {
    byte[] buffer = new byte[8 * 1024];
    int bytesRead;
    // copy the response body to the file in 8 KiB chunks
    while ((bytesRead = content.read(buffer)) != -1) {
        htmlStream.write(buffer, 0, bytesRead);
    }
}
Laura

1 Answer

Is there any JavaScript included in the head tag that might be populating the div after the page has loaded?

If you fetch the page programmatically from Java, as your HttpClient code does, you only get the HTML the server sends. That JavaScript is never executed, so the div stays empty.
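
One way to confirm this on the saved file is a rough check like the sketch below. It only looks at the raw HTML: does it contain a script tag, and is the target div empty as served? The class name `RawHtmlCheck`, its two methods, and the `chart.js` file name are made up for illustration, and the regular expressions are a heuristic, not a real HTML parser.

```java
import java.util.regex.Pattern;

/** Heuristic checks on raw (unrendered) HTML. Hypothetical helper, not a real parser. */
class RawHtmlCheck {

    /** True if the raw HTML contains at least one opening script tag. */
    static boolean hasScript(String html) {
        return Pattern.compile("<script\\b", Pattern.CASE_INSENSITIVE)
                      .matcher(html)
                      .find();
    }

    /** True if a div with exactly this id attribute exists and contains only whitespace. */
    static boolean isEmptyDiv(String html, String id) {
        // \sid requires whitespace before "id", so wicket:id="..." is not mistaken for id="..."
        Pattern p = Pattern.compile(
                "<div\\b[^>]*\\sid\\s*=\\s*\"" + Pattern.quote(id) + "\"[^>]*>\\s*</div>",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(html).find();
    }

    public static void main(String[] args) {
        // Assumed sample input: a page whose chart div is empty as served.
        // "chart.js" is a placeholder name, not taken from the question.
        String raw = "<html><head><script src=\"chart.js\"></script></head><body>"
                + "<div wicket:id=\"rangeBar\" id=\"rangeBar1\"></div>"
                + "</body></html>";

        if (hasScript(raw) && isEmptyDiv(raw, "rangeBar1")) {
            System.out.println("rangeBar1 is empty as served; a script probably fills it in the browser");
        }
    }
}
```

If both checks are true for your saved file, the missing markup is being produced client-side, and no plain HTTP download will contain it.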

Aidy J
  • I'm not familiar with your project's setup, but perhaps you could render the page programmatically inside a web control and then pull the source back out of it – Aidy J May 20 '16 at 13:28
  • How can I do that? – Laura May 20 '16 at 13:28
  • It depends on whether you're using Android, Swing, or something else. See if you can find a relevant control and take a look at its documentation – Aidy J May 20 '16 at 13:30
  • I'm writing a Java back-end application. I have to save the content of a web page as a PDF. – Laura May 20 '16 at 13:31
  • Can you not include a front-end library in the project? You wouldn't need to actually display it. – Aidy J May 20 '16 at 13:32
  • I think your best option is something like http://stackoverflow.com/a/33693350/497381 – martin-g May 20 '16 at 13:48