With regular expressions, you can use the following:
// Requires java.util.regex.Pattern and java.util.regex.Matcher
String s = "<div id=\"tt\" class=\"info\">\n Text Here \n</div>";
System.out.println(s);
// Capture the text between the tags, ignoring surrounding whitespace
Pattern p = Pattern.compile("<div id=\"tt\" class=\"info\">\\s*([^<]+?)\\s*</div>", Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.find()) {
    System.out.println(m.group(1)); // Text Here
}
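The pattern above matches only that exact markup. As a rough illustration (the class name RegexDivDemo, the extract helper and the two variant strings are made up for this sketch), the same expression stops matching as soon as the attributes are reordered or the div contains a child element:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDivDemo {
    // Same expression as above: it hard-codes attribute order, quoting and spacing.
    static final Pattern DIV = Pattern.compile(
            "<div id=\"tt\" class=\"info\">\\s*([^<]+?)\\s*</div>", Pattern.DOTALL);

    // Returns the captured text, or null when the markup does not match exactly.
    static String extract(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String original = "<div id=\"tt\" class=\"info\">\n Text Here \n</div>";
        String swapped = "<div class=\"info\" id=\"tt\">\n Text Here \n</div>";
        String nested = "<div id=\"tt\" class=\"info\">Text <b>Here</b></div>";

        System.out.println(extract(original)); // Text Here
        System.out.println(extract(swapped));  // null (attributes reordered)
        System.out.println(extract(nested));   // null (child element inside the div)
    }
}
```

Each variant is still valid HTML with the same div, so the regular expression would need to grow with every new case.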
However, a more robust solution is to convert the HTML to XHTML (using JTidy, for example) and then extract the required text with XPath (//div[@id = 'tt']/text()). Something along these lines:
// Requires JTidy (org.w3c.tidy.Tidy) plus java.net.URL, org.w3c.dom.Document and javax.xml.xpath.*
public static void main(String[] args) throws Exception {
    // Create a new JTidy instance and set options
    Tidy tidy = new Tidy();
    tidy.setXHTML(true);

    // Parse an HTML page into a DOM document; the tidied markup is also
    // written to System.out (pass null instead to suppress it)
    URL url = new URL("http://something.com/something.html");
    Document doc = tidy.parseDOM(url.openStream(), System.out);

    // Use XPath to obtain whatever you want from the (X)HTML
    XPath xpath = XPathFactory.newInstance().newXPath();
    XPathExpression expr = xpath.compile("//div[@id = 'tt']/text()");
    String text = (String) expr.evaluate(doc, XPathConstants.STRING);
    System.out.println(text.trim()); // Text Here
}
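If the markup is already well-formed, the XPath step can be tried without JTidy. The following is only a sketch under that assumption: it uses the JDK's own DocumentBuilder in place of JTidy, and the class XhtmlDivText, the helper ttText and the sample string are hypothetical names for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XhtmlDivText {
    // Hypothetical helper: parses well-formed markup and returns the trimmed
    // first text node of the div with id "tt" ("" if there is no such div).
    static String ttText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Same XPath expression as in the JTidy example above
            return xpath.evaluate("//div[@id = 'tt']/text()", doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        // Attribute order differs from the regex example; XPath does not care.
        String xhtml = "<html><body>"
                + "<div class=\"info\" id=\"tt\">\n Text Here \n</div>"
                + "</body></html>";
        System.out.println(ttText(xhtml)); // Text Here
    }
}
```

Real-world HTML is rarely well-formed, which is why the JTidy step is still needed for pages fetched from the web.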