
So I found that if you get the page source with WebDriver, you actually get the generated source of the entire DOM (not just the HTML source code the page originally loaded). You can then use this String to build a Jsoup Document. This is cool because Jsoup is much faster than WebDriver at searching for elements, and it also has a much better API for doing so.
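
A minimal sketch of the "parse the snapshot once, query it offline" idea, assuming the page-source String has already been retrieved (with WebDriver that would be `driver.getPageSource()`). To stay dependency-free it uses the JDK's `javax.xml` parser and XPath engine in place of Jsoup, so it only works when the snapshot happens to be well-formed XHTML; real-world page source usually isn't, which is why a lenient HTML parser like Jsoup is the practical choice. `SnapshotQuery`, `parseSnapshot` and `texts` are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: parse a page-source snapshot once, then query it in memory.
final class SnapshotQuery {

    // Parses the snapshot String into a DOM tree. Assumes well-formed XHTML;
    // real page source generally needs a lenient HTML parser such as Jsoup.
    static Document parseSnapshot(String pageSource) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pageSource.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("snapshot is not well-formed XML", e);
        }
    }

    // Evaluates an XPath against the in-memory snapshot and returns each match's text.
    // No browser round-trip happens here; the data is a frozen copy of the DOM.
    static List<String> texts(Document snapshot, String xpathExpr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, snapshot, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpathExpr, e);
        }
    }
}
```

For example, calling `texts(doc, "//li")` on a snapshot of `<html><body><ul><li>a</li><li>b</li></ul></body></html>` returns `[a, b]`.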

So, is there any way to turn a Jsoup Element into a WebDriver WebElement? I saw another post on Stack Overflow about generating an XPath from the Jsoup document, but that's not what I'm looking for, since WebDriver would still have to parse the page and use the XPath to look up the element, defeating the purpose (unless your purpose is purely to use Jsoup for its superior selector methods).

The reason I want to use Jsoup to find WebElements for WebDriver is that on some websites WebDriver is very, very slow (I work for a company that automates hundreds of third-party websites, and we have no control over these sites).

Andrio

2 Answers


There seems to be a confusion between interactive and non-interactive tools here.

WebDriver tests are very often slow (in my experience) because of unnecessary and defensive waits and delays, poorly understood frameworks, and code often written by junior or outsourced developers. Fundamentally, though, they are also slow because WebDriver mimics a real user's actions in 'real time' on a real browser, communicating with the browser app through an API (based on a specification) and a protocol. It's interactive.
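
To illustrate the point about unnecessary and defensive waits: a fixed `Thread.sleep` always costs its full duration, while a polling wait returns as soon as the condition holds and only pays the full timeout when the condition never becomes true. Selenium's `WebDriverWait` is built around the same polling idea; the stand-alone sketch below uses only the JDK, and `PollingWait.waitUntil` is a hypothetical name.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition instead of sleeping for a fixed "safe" delay.
final class PollingWait {

    // Checks the condition immediately, then after each poll interval, until it is true
    // (returns true) or the timeout has elapsed (returns false).
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;            // done as soon as the condition holds
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;           // gave up: the timeout elapsed
            }
            // Sleep for the poll interval, capped at the time remaining (minimum 1 ms).
            long sleepMillis = Math.min(pollInterval.toMillis(), Math.max(1, remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A `waitUntil` whose condition is already true returns almost instantly, whereas a defensive `Thread.sleep(5000)` would cost five seconds on every call.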

(Less so with HtmlUnit, PhantomJS etc.)

By contrast, Jsoup is just a glorified HTTP client with extra parsing capabilities. It's non-interactive and ultimately works off a snapshot String of data, so we'd expect it to be much faster for its particular use cases.

Clearly both are HTTP clients of a sort and can share static web content, which is why WebDriver can pass data off to Jsoup for processing (though I've never heard of this use case before).

However, Jsoup can never turn one of its Elements (a Java snapshot object containing some properties) into a WebDriver WebElement, which is more a kind of 'live' proxy to a real and interactive object within a program like Firefox or Chrome. (Again, less so with HtmlUnit, PhantomJS etc.)


So you need to decide whether interactivity is important to you. If it's crucial to mimic a real user, WebDriver has to 'drive' the process using a real browser.

If it's not, then you can consider headless browsers like HtmlUnit and (especially) PhantomJS, as they can execute JavaScript and update the DOM in a way that plain HTTP libraries and Jsoup can't. You can then pass the output to Jsoup etc.

Potentially, if you went down the PhantomJS route, you could do all your parsing there using the JavaScript API. See: Use PhantomJS to extract html and text etc.

For a lot of people, interactivity isn't important at all, and it's quicker to drop WebDriver completely and rely on the libraries.

Andrew Regan
  • Thank you for the explanation, but I should clarify what I mean by WebDriver being very slow. I understand that it attempts to mimic a real-world user, but on some websites (and again, I've had to work on hundreds of sites I don't control) it is extremely slow: the page will be completely loaded, the DOM is ready, and yet a simple click action can take WebDriver literally a couple of minutes to perform. There must be something about the site that conflicts with WebDriver. That said, it looks like Jsoup is not the answer. – Andrio Mar 22 '16 at 13:17
  • Oh, I totally agree that UI tests are slower, and I'd be the first to recommend a pure HTTP alternative. That said, 'minutes' is pretty extreme. I'd be interested in helping diagnose it if you could post something specific. – Andrew Regan Mar 22 '16 at 13:42

I know this question is incredibly old, but I'm posting this answer for anyone who comes across it later. The method below returns an XPath for a Jsoup Element. I translated it to Java myself; the original code I copied it from is at https://stackoverflow.com/a/48376038/13274510.

You can then use the XPath with WebDriver.

Edit: The code works now.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Builds an absolute, index-based XPath (e.g. /html/body/div[2]/p[1]) for a Jsoup Element.
public static String jsoupToXpath(Element element) {
    List<String> components = new ArrayList<>();

    Element child = element;
    // Walk up the tree; the loop stops at the Document, whose parent() is null
    while (child.parent() != null) {
        Element parent = child.parent();
        Elements siblings = parent.children();
        String componentToAdd;

        if (siblings.size() == 1) {
            componentToAdd = child.tagName();
        } else {
            // 1-based position of child among siblings with the same tag name
            int x = 1;
            for (Element sibling : siblings) {
                if (child.tagName().equals(sibling.tagName())) {
                    if (child == sibling) {
                        break;
                    } else {
                        x++;
                    }
                }
            }
            componentToAdd = String.format("%s[%d]", child.tagName(), x);
        }
        components.add(componentToAdd);
        child = parent;
    }

    // Components were collected from the element upwards; reverse them so the path starts at html
    Collections.reverse(components);
    return "/" + String.join("/", components);
}
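
Because the generated XPath is purely positional, it only points at the same element while the live DOM still has the structure the snapshot had, and WebDriver still has to evaluate it, as the question points out. The dependency-free sketch below applies the same index-based approach so it can be run and checked without a browser: it swaps Jsoup for the JDK's `org.w3c.dom` and `javax.xml.xpath` (so it assumes well-formed XHTML input), builds the path, and then evaluates it to confirm it resolves back to the original node. `PositionalXPath` and its methods are hypothetical names. Unlike `jsoupToXpath`, it only adds an index when there is more than one sibling with the same tag name; both forms select the same element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper mirroring jsoupToXpath, but on the JDK's built-in DOM.
final class PositionalXPath {

    // Parses well-formed XHTML into a DOM tree (assumption: real page source may not be).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Walks from the element up to the document root, prepending one step per ancestor.
    // A step gets a 1-based [n] index only when the parent has several same-named children.
    // Assumes the element is attached to a Document.
    static String xpathOf(Element element) {
        Deque<String> steps = new ArrayDeque<>();
        Node current = element;
        while (current != null && current.getNodeType() == Node.ELEMENT_NODE) {
            String name = current.getNodeName();
            Node parent = current.getParentNode();
            int position = 0;   // position of current among same-named siblings
            int sameNamed = 0;  // total same-named element siblings, including current
            for (Node sib = parent.getFirstChild(); sib != null; sib = sib.getNextSibling()) {
                if (sib.getNodeType() == Node.ELEMENT_NODE && sib.getNodeName().equals(name)) {
                    sameNamed++;
                    if (sib == current) {
                        position = sameNamed;
                    }
                }
            }
            steps.addFirst(sameNamed > 1 ? name + "[" + position + "]" : name);
            current = parent;
        }
        return "/" + String.join("/", steps);
    }

    // Evaluates the path against the same document and returns the matched node (or null).
    static Node resolve(Document doc, String xpath) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
    }
}
```

For `<html><body><div><p>a</p></div><div><p>b</p><span/><p>c</p></div></body></html>`, the `p` containing `c` gets `/html/body/div[2]/p[2]`. With Selenium on the classpath, a string like that is what you would pass to `driver.findElement(By.xpath(path))`.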
Greeley