Hi, I want to extract the string between HTML tags from some source code, but I am getting an error with the code given below. Could someone help me understand the reason for the error?

Pattern pattern = Pattern.compile("/\<body[^>]*\>([^]*)\<\/body/");
Matcher matcher = pattern.matcher(s1);
while (matcher.find()) {
  System.out.println( "Found value: " + matcher.group(1).trim() );
}

The error I am getting is: "Invalid escape sequence"

Thanks

Arnav
    Don't parse HTML using regex. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Gereon Apr 19 '15 at 08:54

1 Answer

Don't parse HTML with regex. I suggest using the jsoup parser instead.

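import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Parse the markup and read the visible text of the <body> element
String html = "<html><body><h1> Hello, World! </h1></body></html>";
Document doc = Jsoup.parse(html);
String text = doc.body().text();
System.out.println(text);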

Output:

Hello, World!
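
As for the "Invalid escape sequence" error itself: in a Java string literal, `\<`, `\>` and `\/` are not valid escapes, so the compiler rejects the line before the regex is ever compiled. The surrounding `/.../` delimiters are JavaScript syntax and would be matched literally in Java. Here is a minimal sketch of a corrected pattern, offered only to explain the error. The class name `BodyRegexSketch` and the method `extractBody` are made up for illustration, and the flags and the lazy `(.*?)` group are my assumptions about what the original pattern intended. It will still break on real-world HTML, so jsoup remains the better choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyRegexSketch {
    // '<', '>' and '/' need no escaping in a Java regex, and Java has no /.../ delimiters.
    // Any backslash that is really needed must be doubled inside the string literal ("\\s").
    // DOTALL lets '.' match newlines, standing in for the JavaScript-style "[^]".
    static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns the trimmed body content, or null when no <body> is found.
    static String extractBody(String html) {
        Matcher matcher = BODY.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String s1 = "<html><body class=\"x\">\n<h1> Hello, World! </h1>\n</body></html>";
        System.out.println("Found value: " + extractBody(s1));
    }
}
```

Note that, unlike `doc.body().text()`, this returns the raw inner markup of the body (tags included), not just the visible text.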
Avinash Raj