
I need to parse the HTML page with Java to retrieve some data.

For example, given incoming.html:

<html>
 <head> 
  <title>TITLE</title> 
  <meta name="some name" content="some content" /> 
  <link type=".." title=".." rel=".." href="link" /> 
  <script type="text/javascript">..</script> 
 </head>
  <body>
      <!--googleoff:all-->
  <img src="image.jpg"/>
  <div class="div1"></div>
  <div class="Logo"><a href="/"><img src="logo.png"/></a></div>
  <div class="div2"></div>
    <ul>
      <li class=".."><a href="/">a</a></li>
      <li class=".."><a href="/">b</a></li>
    </ul>

  <div class="div1"></div>
  <div class="Logo"><a href="/"><img src="other.png"/></a></div>
  <div class="div2"></div>

    <ul>
      <li class=".."><a href="/">a</a></li>
      <li class=".."><a href="/">b</a></li>
   </ul>
      <!--googleon:all-->
  </body>
 </html>

I need to produce outcoming.html:

<html>
 <head> 
  <title>TITLE</title> 
  <meta name="some name" content="some content" /> 
  <link type=".." title=".." rel=".." href="link" /> 
  <script type="text/javascript">..</script> 
 </head>
 <body>
   <div class="Logo"><a href="/"><img src="other.png"/></a></div>
   <div class="div2"></div>
 </body>
</html>

The core of the question:

How do I choose between two identical tags that differ only in their contents?

In my case I have two tags:

<div class="Logo"><a href="/"><img src="logo.png"/></a></div>

and

<div class="Logo"><a href="/"><img src="other.png"/></a></div>

but I need only the tag where src="other.png"

What do you think is the best way to do it?

Dan

1 Answer


You can use the jsoup library.

Here is the link: http://jsoup.org/

It is very simple to use. Here is a simple example:

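import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
String html = "<div><p>Lorem ipsum.</p>";      // fragment with an unclosed <div>
Document doc = Jsoup.parseBodyFragment(html);  // jsoup wraps it in a full document
Element body = doc.body();                     // the <body> that now holds the fragment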
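
The snippet above only parses a fragment; it does not yet pick the logo block whose image is other.png. In jsoup that condition can be written as a CSS query, for example doc.select("div.Logo:has(img[src=other.png])"). The sketch below shows the same "container that has a matching child" idea with no extra dependency, using the JDK's built-in XPath API instead of jsoup. It assumes the input is well-formed XML, which holds for the sample page once its quotes are straightened but not for HTML in general. The class name LogoPicker and the method pickLogo are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class LogoPicker {

    /**
     * Hypothetical helper: returns the first <div class="Logo"> that contains
     * an <img> with the given src, or null if there is none.
     * Assumes well-formed markup and a src value without single quotes.
     */
    public static Element pickLogo(String xhtml, String src) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // Select the div by class, then filter on a descendant img's src.
        String query = "//div[@class='Logo'][.//img[@src='" + src + "']]";

        return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body>"
                + "<div class=\"Logo\"><a href=\"/\"><img src=\"logo.png\"/></a></div>"
                + "<div class=\"Logo\"><a href=\"/\"><img src=\"other.png\"/></a></div>"
                + "<div class=\"div2\"></div>"
                + "</body></html>";

        Element logo = pickLogo(xhtml, "other.png");
        Element img = (Element) logo.getElementsByTagName("img").item(0);
        System.out.println(img.getAttribute("src")); // prints "other.png"
    }
}
```

The key point in both approaches is to match on the container and filter by its child, rather than selecting the img and walking back up. For pages that are not well-formed, a tolerant parser such as jsoup is the safer choice.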
Davide Lorenzo MARINO