My project parses an HTML page and then uses the DOM tree for various operations, such as comparing the templates of two URLs.

For that, I am using Jsoup.

But it is not able to load dynamic content into the DOM tree.

Can you tell me how I can load dynamic content using Jsoup in Java, or suggest any other method for doing the same?

Edit No. 1

As the given link shows, this works using PhantomJS and Zombie.js in Java. Can you tell me how I can do this?

Edit No. 2

I first tried to get the dynamic page using Selenium; the code is as follows:

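import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class TemplateMatching {

    public static void main(String[] args) {

        // Selenium: load each page in a real browser so that its scripts run
        WebDriver driver = new FirefoxDriver();
        driver.get("ANY URL HERE");
        String html_content = driver.getPageSource();
        driver.get("ANOTHER URL HERE");
        String html_content1 = driver.getPageSource();
        driver.quit(); // end the whole browser session, not just the current window

        // Jsoup builds the DOM here by parsing the HTML content
        Document doc1 = Jsoup.parse(html_content);
        Document doc2 = Jsoup.parse(html_content1);

        // OPERATIONS USING DOM TREE
    }
}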

But this takes a lot of time, even after optimizing. Now, as per your instructions, I have moved to HtmlUnit. But I am not able to write code that gets the dynamic page source into a String, which I can then use for further parsing with Jsoup. Please help me write that code using HtmlUnit.

Code using HtmlUnit:

package XXX.YYY.ZZZ.Template_Matching;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.junit.Assert;
import org.junit.Test;

/**
 *
 * @author jhamb
 */
public class HtmlUnit {

    @Test
    public void homePage() throws Exception {
        final WebClient webClient = new WebClient();
        final HtmlPage page = webClient.getPage("http://www.jabong.com/Yepme-3-4Th-Sleeve-Printed-Blue-Top-Mksp-191481.html");

        Document ht = page.getOwnerDocument();
        System.out.println(ht);

        webClient.closeAllWindows();
    }

    public static void main(String[] args) throws Exception {
        HtmlUnit htmlUnit = new  HtmlUnit();
        htmlUnit.homePage();
    }
}
devsda

1 Answer

I'm afraid Jsoup won't work in this case.

Try using HtmlUnit.
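
As a rough starting point, here is a minimal sketch of how HtmlUnit and Jsoup could be combined: HtmlUnit loads the page and runs its JavaScript, and the serialized result is handed to Jsoup. It assumes HtmlUnit 2.16 or later under the `com.gargoylesoftware.htmlunit` package (where `WebClient` can be used in try-with-resources) and Jsoup on the classpath. The class name `RenderedPageLoader`, the helper `loadRenderedDom`, and the 5-second wait are illustrative choices, not anything prescribed by either library.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class RenderedPageLoader {

    // Hypothetical helper: loads a URL in HtmlUnit, lets its scripts run,
    // then passes the resulting markup to Jsoup.
    static Document loadRenderedDom(String url) throws Exception {
        // Assumption: HtmlUnit 2.16+, where WebClient implements AutoCloseable.
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true);
            // Real pages often contain scripts HtmlUnit cannot run; do not abort on them.
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // CSS is not needed to build the DOM, so skip it.
            webClient.getOptions().setCssEnabled(false);

            HtmlPage page = webClient.getPage(url);
            // Give AJAX and timer-driven scripts up to 5 seconds (arbitrary value).
            webClient.waitForBackgroundJavaScript(5_000);

            // asXml() serializes the current DOM, including nodes added by scripts.
            String renderedHtml = page.asXml();
            return Jsoup.parse(renderedHtml, url);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: RenderedPageLoader <url>");
            return;
        }
        Document doc = loadRenderedDom(args[0]);
        System.out.println(doc.title());
    }
}
```

Calling `loadRenderedDom` once per URL gives two Jsoup `Document` objects to compare. Whether this turns out faster than Selenium depends on the pages involved, and HtmlUnit's JavaScript support does not cover everything a real browser does, so it is worth checking the output on a few of your target pages.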

devnull
  • Is HtmlUnit as efficient as Jsoup for parsing and building the DOM tree? Does HtmlUnit offer all the methods that Jsoup does? Can we do the same work with Jsoup + Selenium? – devsda Apr 04 '13 at 08:43
  • See Edit No. 1 above for my task. – devsda Apr 04 '13 at 08:51
  • Can you point me to an implementation guide, please? I don't know how to start with HtmlUnit or how it helps in getting dynamic content. – devsda Apr 04 '13 at 11:33
  • Link for the getting started guide: http://htmlunit.sourceforge.net/gettingStarted.html – devnull Apr 04 '13 at 11:37
  • I tried this using Selenium, but it takes 95% of the total processing time. Is HtmlUnit faster than Selenium? See my question http://stackoverflow.com/questions/15830334/selenium-tooks-lots-of-the-time-to-get-dynamic-page-of-given-url – devsda Apr 05 '13 at 09:32
  • I'd guess that HTMLUnit would be faster. – devnull Apr 05 '13 at 09:41
  • Please see Edit No. 2. Please help me get the dynamic page source into a String for further parsing with Jsoup. – devsda Apr 06 '13 at 09:39