How to get the text inside a specific span with HtmlUnit

Question

I'm new to HtmlUnit and I'm not even sure it is the right tool for my project. I'm trying to parse a website and extract the values I need from it. I need to get the value "07:05" from this:

<span class="tim tim-dep">07:05</span>

I know that I can use getTextContent() to extract the value, but I don't know how to select a specific span. I used getElementById to find the <div> tag that this expression belongs to, but when I get the text content of that div, I get a whole line of text with a lot of unnecessary data. Can someone tell me how I can select this expression, possibly using the class name?

score 10 · Answer 1 · answered May 04 '13 at 21:44

You need to load the page with a WebClient first, like this:

// WebClient is HtmlUnit's simulated browser.
final WebClient web = new WebClient();
final HtmlPage page = web.getPage("http://www.whateveryouwant.com.br");

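Get the elements by tag name and iterate over them:

final List<DomElement> spans = page.getElementsByTagName("span");
for (DomElement element : spans) {
    // Exact match on the whole class attribute, so token order and spacing matter.
    if (element.getAttribute("class").equals("tim tim-dep")) {
        // getTextContent() returns the text inside the span; getNodeValue() is null for elements.
        return element.getTextContent();
    }
}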
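The loop above compares the whole class attribute, which stops matching if the page reorders the classes or adds another one. A minimal sketch of token-based matching follows. To stay runnable without dependencies it uses the JDK's built-in DOM parser rather than HtmlUnit, on the assumption that HtmlUnit's DomElement exposes the same org.w3c.dom.Element methods. The class name SpanByClass, the helper textOfFirstSpanWithClass, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SpanByClass {

    // Hypothetical sample markup standing in for the real page.
    static final String SAMPLE =
        "<div id='row'><span class='tim tim-arr'>09:40</span>"
        + "<span class='tim tim-dep'>07:05</span></div>";

    /** Returns the text of the first span carrying the given class token, or null if none does. */
    static String textOfFirstSpanWithClass(String xml, String cssClass) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList spans = doc.getElementsByTagName("span");
            for (int i = 0; i < spans.getLength(); i++) {
                Element span = (Element) spans.item(i);
                // Split on whitespace so "tim tim-dep" matches the single token "tim-dep".
                String[] tokens = span.getAttribute("class").trim().split("\\s+");
                if (Arrays.asList(tokens).contains(cssClass)) {
                    return span.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOfFirstSpanWithClass(SAMPLE, "tim-dep"));
    }
}
```

With HtmlUnit the same token check should apply to the value returned by element.getAttribute("class") inside the loop above.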

Or just use XPath:

// getFirstByXPath returns the first matching node, or null if nothing matches.
DomElement element = page.getFirstByXPath("//span[@class='tim tim-dep']");
final String text = element.getTextContent();

You might want to get the node value of the child text node instead, since the node value of an element is null: element.getChildNodes().get(0).getNodeValue() or element.getTextContent(). – Bae Cheol Shin, Aug 10 '15 at 21:36
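The difference between getNodeValue() and getTextContent() comes from the W3C DOM, and can be checked with a small sketch. It uses the JDK's own DOM and XPath classes instead of HtmlUnit so that it runs without dependencies, on the assumption that HtmlUnit nodes follow the same org.w3c.dom.Node contract. The class name NodeValueVsTextContent, the helper firstDepartureSpan, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NodeValueVsTextContent {

    // Hypothetical sample markup standing in for the real page.
    static final String SAMPLE = "<div><span class='tim tim-dep'>07:05</span></div>";

    /** Returns the first span whose class attribute is exactly "tim tim-dep", or null. */
    static Element firstDepartureSpan(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Same XPath expression as in the answer, evaluated with the JDK engine.
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate("//span[@class='tim tim-dep']", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath on sample markup", e);
        }
    }

    public static void main(String[] args) {
        Element span = firstDepartureSpan(SAMPLE);
        // Element nodes carry no node value of their own.
        System.out.println(span.getNodeValue());
        // The text lives in the child text node.
        System.out.println(span.getFirstChild().getNodeValue());
        // getTextContent() concatenates the text of all descendants.
        System.out.println(span.getTextContent());
    }
}
```

For a span that contains only text, the last two calls give the same string; getTextContent() is the safer choice when the span may contain nested markup.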

score 2 · Answer 2 · answered Jul 27 '14 at 18:59

Here you go:

DomElement element = page.getFirstByXPath("//span[@class='tim tim-dep']");
String text = element.getTextContent();

Mike.

A year after @brnfd's answer and you're posting only part of it. – Trynkiewicz Mariusz Dec 04 '15 at 12:48
