
I am using jsoup to scrape a page for data, but the data is not inside a tag of its own.

<strong>LABEL IS HERE</strong> DATA IS HERE

With XPath I can get a path, //*[@id="center-text"]/text()[1], but unfortunately Chrome does not let me copy a CSS path for that text.

I can get a CSS path for the <strong>LABEL IS HERE</strong> element, but not for the text that follows it. Is there a way to get this data using CSS selectors?


Sample data

<div id="center-text"> 
      <strong>ifno</strong> data&nbsp;&nbsp;&nbsp;
      <strong>ifno</strong> data&nbsp;&nbsp;&nbsp;
      <strong>Tifno</strong> data
      <br> 
      <strong>ifno</strong> data&nbsp;&nbsp;&nbsp;
      <strong>ifno</strong> data&nbsp;&nbsp;&nbsp;
      <strong>ifno</strong> data 
</div>
washcloth

1 Answer


In jsoup you can use the nextSibling() method of Node:

public Node nextSibling()

Get this node's next sibling.

Returns: next sibling, or null if this is the last sibling

You can get the text that follows each <strong> element like this:

Elements elements = doc.select("div[id=\"center-text\"] strong");

for(Element element : elements) {
    System.out.println("nextSibling: " + element.nextSibling());
}

The result will be:

nextSibling:  data&nbsp;&nbsp;&nbsp;
nextSibling:  data&nbsp;&nbsp;&nbsp;
nextSibling:  data
nextSibling:  data&nbsp;&nbsp;&nbsp;
nextSibling:  data&nbsp;&nbsp;&nbsp;
nextSibling:  data 
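
As a possible extension of the loop above, here is a minimal sketch that pairs each label with its cleaned-up data. It assumes jsoup is on the classpath; the class name LabelDataExample, the method extractPairs and the shortened sample HTML are made up for illustration. It checks that the sibling is a TextNode before reading it, since the next sibling can also be an element such as <br> or null.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LabelDataExample {

    // Hypothetical helper: returns one "label: data" string for each <strong>
    // inside #center-text that is directly followed by a text node.
    static List<String> extractPairs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> pairs = new ArrayList<>();
        for (Element label : doc.select("#center-text strong")) {
            Node next = label.nextSibling();
            // Skip labels whose next sibling is missing or is an element (e.g. <br>).
            if (next instanceof TextNode) {
                // getWholeText() gives the decoded text, so &nbsp; arrives as U+00A0.
                // trim() does not strip U+00A0, so replace it with a plain space first.
                String data = ((TextNode) next).getWholeText()
                        .replace('\u00A0', ' ')
                        .trim();
                pairs.add(label.text() + ": " + data);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Shortened version of the sample data from the question.
        String html = "<div id=\"center-text\">"
                + "<strong>ifno</strong> data&nbsp;&nbsp;&nbsp;"
                + "<strong>Tifno</strong> data<br>"
                + "<strong>ifno</strong> data </div>";
        for (String pair : extractPairs(html)) {
            System.out.println(pair);
        }
    }
}
```

Printing the node itself, as in the original loop, shows its HTML form with the &nbsp; entities intact; reading the text through TextNode is one way to get the plain value instead.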
antoniodvr
  • This answer is correct, but I did have to change your XPath to CSS. It should be `#center-text`. Otherwise it works. Thank you very much. – washcloth Sep 19 '15 at 21:40
  • The general problem underlying this is that you can't select a Text-Node with CSS selectors: http://stackoverflow.com/questions/5688712/is-there-a-css-selector-for-text-nodes – luksch Sep 20 '15 at 09:07