
I have a String that contains HTML text:

<html ...
      ...

    <tr class="test1" onmouseover= .....................>
       <td ..........> <strong>Test Text</strong>  </td>
       <td ............">Test Text 2</td>

       <span class="x1" title="Test Title 1">X1</span>
       <span class="x2" title="Test Title 2">X2</span>
       <span class="x3" title="Test Title 3">X3</span>
    </tr>  
..
.....

I need to create a String trString that contains only the text and the title attributes found within <tr class="test1">,

so that trString = "Test Text Test Text 2 Test Title 1 Test Title 2 Test Title 3"

How can I do this?

I tried using an HTML parser, but it appears to remove the titles.

Buras


Use jsoup to parse the HTML into a DOM, then use the CSS selector *[title] to get a list of all elements that have a title attribute.
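A minimal sketch of this approach, assuming jsoup 1.11 or later is on the classpath. The HTML is a hypothetical, trimmed-down version of the question's markup: the row is wrapped in a <table> and the titled spans are placed inside a cell. This matters because an HTML5 parser (jsoup, like a browser) moves a <span> that sits directly inside a <tr> out of the table, where a row-scoped selector would no longer find it. The class name TitleExtractor and the method titles are illustrative only.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TitleExtractor {

    // Hypothetical sample: the question's row inside a <table>, with the
    // titled spans in a cell so the HTML5 parser keeps them inside the row.
    static final String HTML =
        "<html><body><table>"
        + "<tr class=\"test1\">"
        + "<td><strong>Test Text</strong></td>"
        + "<td>Test Text 2</td>"
        + "<td>"
        + "<span class=\"x1\" title=\"Test Title 1\">X1</span>"
        + "<span class=\"x2\" title=\"Test Title 2\">X2</span>"
        + "<span class=\"x3\" title=\"Test Title 3\">X3</span>"
        + "</td>"
        + "</tr>"
        + "</table></body></html>";

    // Returns the title attribute of every element inside <tr class="test1">
    // that has one, in document order.
    static List<String> titles(String html) {
        Document doc = Jsoup.parse(html);
        // "tr.test1 *[title]" = any descendant of the row with a title attribute
        return doc.select("tr.test1 *[title]").eachAttr("title");
    }

    public static void main(String[] args) {
        System.out.println(titles(HTML));
    }
}
```

Scoping the selector to tr.test1 keeps titles from elsewhere on the page out of the result; a bare *[title] would match every titled element in the document.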

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

  • scrape and parse HTML from a URL, file, or string
  • find and extract data, using DOM traversal or CSS selectors
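To build the whole trString from the question in document order, one possible sketch uses jsoup's DOM traversal (NodeTraversor with a NodeVisitor) instead of a selector. It assumes jsoup 1.11 or later, a hypothetical sample in which the row is inside a <table> and the titled spans sit in a cell, and that the spans' own text (X1, X2, X3) should be replaced by their titles, as the question's expected output suggests. The class TrStringBuilder and method trString are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class TrStringBuilder {

    // Hypothetical sample markup (titled spans placed inside a cell).
    static final String HTML =
        "<html><body><table>"
        + "<tr class=\"test1\">"
        + "<td><strong>Test Text</strong></td>"
        + "<td>Test Text 2</td>"
        + "<td>"
        + "<span class=\"x1\" title=\"Test Title 1\">X1</span>"
        + "<span class=\"x2\" title=\"Test Title 2\">X2</span>"
        + "<span class=\"x3\" title=\"Test Title 3\">X3</span>"
        + "</td>"
        + "</tr>"
        + "</table></body></html>";

    // Walks <tr class="test1"> in document order, collecting visible text and
    // title values, but skipping text directly inside an element that has a
    // title (so "X1" is replaced by "Test Title 1", and so on).
    static String trString(String html) {
        Element row = Jsoup.parse(html).selectFirst("tr.test1");
        if (row == null) {
            return ""; // no matching row in the input
        }
        List<String> parts = new ArrayList<>();
        NodeTraversor.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof Element && node.hasAttr("title")) {
                    parts.add(node.attr("title"));
                } else if (node instanceof TextNode) {
                    Node parent = node.parent();
                    boolean inTitled = parent != null && parent.hasAttr("title");
                    String text = ((TextNode) node).text().trim();
                    if (!inTitled && !text.isEmpty()) {
                        parts.add(text);
                    }
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do when leaving a node
            }
        }, row);
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(trString(HTML));
    }
}
```

Traversal keeps the text and titles interleaved in the order they appear in the markup; if the titles should always come last regardless of position, collecting the text and the *[title] attributes in two separate passes would be simpler.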
Mike Samuel