
I have an HTML page like this:

<html>
<head>
<!-- necessary JavaScript files -->
</head>
<body>
<div id="content"></div>
</body>
</html>

When the page renders, the scripts insert the appropriate HTML content into the div element with id "content", so after rendering the div contains a large amount of HTML.

Now I need to extract this dynamically rendered content from the div element using Java. Can anyone suggest a way to do it?

Sripaul

3 Answers


The problem is that you need to evaluate the scripts on the page from Java, which requires a web engine. You can look here: Embedding Gecko/Webkit in Java. Try using WebKit or Gecko to load the page, then use a Java library to parse the resulting HTML.

Mikita Belahlazau

You can parse HTML with javax.swing.text.html.HTMLEditorKit.Parser. Have a look at this link:

http://java.sun.com/products/jfc/tsc/articles/bookmarks/
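A minimal sketch of the HTMLEditorKit approach, assuming only the JDK: `ParserDelegator` drives a `HTMLEditorKit.ParserCallback` that collects the text found inside the `<div>` whose `id` is `content`. The class `DivTextExtractor` and method `extractText` are hypothetical names for illustration. Two caveats: this parser does not execute JavaScript, so on the raw page from the question the div would come back empty (the input has to be the already-rendered markup, e.g. obtained from a browser engine as the first answer suggests), and it is built on an old HTML 3.2 DTD, so it may mishandle modern markup. It also collects text only, not the inner tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text inside the <div> whose id equals targetId.
class DivTextExtractor {

    static String extractText(Reader html, String targetId) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // 0 = outside the target div; >0 = nesting depth inside it
            private int depth = 0;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.DIV) {
                    return;
                }
                if (depth > 0) {
                    depth++; // a div nested inside the target
                } else if (targetId.equals(attrs.getAttribute(HTML.Attribute.ID))) {
                    depth = 1; // entering the target div
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && depth > 0) {
                    depth--; // reaching 0 means we left the target div
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    out.append(data);
                }
            }
        };
        try {
            // true = ignore any charset declared in the markup
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed input: markup as it might look AFTER the page's scripts have run.
        String rendered = "<html><body><div id=\"content\"><p>Hello</p>"
                + "<div>World</div></div><p>Outside</p></body></html>";
        System.out.println(extractText(new StringReader(rendered), "content"));
    }
}
```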

UVM

Have a look through these:

http://java-source.net/open-source/html-parsers

Jack