I currently use Jsoup to parse the DOM tree of web pages. A separate application renders the page and uses JavaScript to extract the rendering position of every DOM node. It uses the JavaFX Stage, WebEngine and WebView classes, and WebEngine.executeScript, to run the following JavaScript:
var all = document.getElementsByTagName("*");
var serialization = "";
var width = window.innerWidth;
var height = window.innerHeight;
for (var i = 0, max = all.length; i < max; i++) {
    var el = all.item(i);
    // offsetLeft/offsetTop are relative to the element's offsetParent, not the page
    serialization += el.tagName + ": " + el.offsetLeft + " " + el.offsetTop + " "
        + el.offsetWidth / width + " " + el.offsetHeight / height + "\n";
}
// The value of the last expression is what executeScript returns
serialization
The problem I now face is associating the output of this JavaScript with the information I collect through Jsoup, i.e. I want to add the rendering position of every node to the Jsoup data structure. Is there a unique ID for each DOM node that I don't know about, or should I try a completely different approach?
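One possible approach, sketched here under stated assumptions rather than as a known library feature: the standard DOM API has no built-in per-node ID, but the same traversal can write each measurement back onto its element as data-* attributes and then return document.documentElement.outerHTML. Parsing that string with Jsoup would put the positions directly on each Jsoup Element. The attribute names (data-node-index, data-left, data-top, data-width, data-height) and the function name annotatePositions are hypothetical. This also assumes it is acceptable to parse the DOM as rendered by WebKit, which can differ from the original page source if scripts modified it. The function takes document and window as parameters so it can also be exercised outside a browser.

```javascript
// Minimal sketch. Assumption: the data-* attribute names are hypothetical.
// Measures every element, writes the measurements back onto the elements
// as attributes, and returns the serialized (annotated) DOM as a string.
function annotatePositions(doc, win) {
  var all = doc.getElementsByTagName("*");
  var width = win.innerWidth;
  var height = win.innerHeight;
  var boxes = [];

  // Pass 1: read every layout value before touching the DOM, so the
  // attribute writes below cannot influence later measurements.
  for (var i = 0, max = all.length; i < max; i++) {
    var el = all.item(i);
    boxes.push({
      left: el.offsetLeft,        // relative to offsetParent, as in the original script
      top: el.offsetTop,
      width: el.offsetWidth / width,
      height: el.offsetHeight / height
    });
  }

  // Pass 2: stamp each element with its document-order index and its box.
  // Setting attributes does not add or remove elements, so the live
  // collection still has the same length and order here.
  for (var j = 0; j < boxes.length; j++) {
    var node = all.item(j);
    node.setAttribute("data-node-index", String(j));
    node.setAttribute("data-left", String(boxes[j].left));
    node.setAttribute("data-top", String(boxes[j].top));
    node.setAttribute("data-width", String(boxes[j].width));
    node.setAttribute("data-height", String(boxes[j].height));
  }

  // Serialize the annotated tree so it can be handed to Jsoup.
  return doc.documentElement.outerHTML;
}
```

To run it through executeScript, the call annotatePositions(document, window); would be appended as the script's last expression, so the returned String is what comes back to Java. That string can then be passed to Jsoup.parse, and element.attr("data-left") and the other attributes read the values back on each Element. Reading all positions before writing any attributes keeps the writes (for example through CSS attribute selectors) from changing the layout that is still being measured.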