I am writing a console command that prints the contents of any HTML tag the user chooses. The command looks like this:

ALL <tag>

Here <tag> is, for example, <b>, and the program's job is to scan the HTML and print ONLY what is inside that tag.

Example HTML:

<html>
   <head>
      <title>Sample "Hello, World" Application</title>
   </head>
   <body bgcolor=white>
      <table border="0" cellpadding="10">
         <tr>
            <td>
               <img src="images/springsource.png">
            </td>
            <td>
               <h1>Sample "Hello, World" kraljivan488@gmail.com</h1>
            </td>
         </tr>
      </table>
      <p>This is the home page for the HelloWorld blaz13@gmail.com </p>
      <p>To prove that they work, you can execute either of the 192.168.1.1</p>
      <ul>
         <li>To a <a href="hello.jsp">JSP page</a>.</li>
         <li>To a <a href="hello">servlet</a>.</li>
      </ul>
   </body>
</html>

It has to be a regex, because there is also a console command that prints which regex this command uses.

I currently use "<tag.*<\\/tag>", but it prints the tags as well as their contents.
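
A minimal sketch of the behavior described above, assuming the tag name is substituted into the pattern and the whole match is printed. The class name WholeMatchDemo and the helper wholeMatch are hypothetical, chosen only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatchDemo {
    // Returns the whole match (group 0) for the pattern shape used in the
    // question, or null when nothing matches. Hypothetical helper.
    static String wholeMatch(String tag, String html) {
        // Same shape as "<tag.*<\\/tag>" with the tag name substituted in.
        Matcher m = Pattern.compile("<" + tag + ".*<\\/" + tag + ">").matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<li>To a <a href=\"hello\">servlet</a>.</li>";
        // The whole match includes the opening and closing tags.
        System.out.println(wholeMatch("a", html)); // <a href="hello">servlet</a>
    }
}
```

Because the pattern has no capturing group, the only thing available to print is the whole match, which starts at the opening tag and ends at the closing tag.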

What regex do I need to use?

HDJEMAI
Ivan Kralj

1 Answer

Try this:

String regex = "<" + tag + "\\b([^>]*)>";

It matches the start of the tag, stops at the first >, and captures everything in between in group 1. You then just need to read that group from the match to get the result.

Matcher matcher = Pattern.compile(regex).matcher(yourString);
if (matcher.find()) { // group() is only valid after a successful find()
    String result = matcher.group(1); // the capturing group in the regex
}
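
The group above captures what sits between the tag name and the first >. If the goal is the text between the opening and closing tags, the same capturing-group idea can be moved there. The following is only a sketch under assumptions: the class TagContents and its methods are hypothetical names, tags of the same name are assumed not to nest, and a regex is not a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagContents {
    // Builds a pattern whose group 1 is the text between <tag ...> and the
    // nearest </tag>. Hypothetical helper, not part of the original answer.
    static Pattern patternFor(String tag) {
        String name = Pattern.quote(tag);
        // \b keeps "b" from matching "<body>"; [^>]* skips any attributes;
        // (.*?) is lazy, so it stops at the first closing tag.
        return Pattern.compile("<" + name + "\\b[^>]*>(.*?)</" + name + ">",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    }

    // Collects group 1 of every match, i.e. the contents without the tags.
    static List<String> contentsOf(String tag, String html) {
        List<String> results = new ArrayList<>();
        Matcher m = patternFor(tag).matcher(html);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<ul><li>To a <a href=\"hello.jsp\">JSP page</a>.</li>"
                + "<li>To a <a href=\"hello\">servlet</a>.</li></ul>";
        System.out.println(contentsOf("a", html)); // [JSP page, servlet]
    }
}
```

Pattern.DOTALL lets . cross line breaks, so contents that span several lines are still captured. Nested tags of the same name would be cut off at the first closing tag.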
Trash Can