I am trying to extract the page title from HTML and XML pages. This is the regular expression I use:
Pattern p = Pattern.compile(".*<head>.*<title>(.*)</title>.*</head>.*");
The problem is that it only extracts the title from HTML files and gives me null for XML files. Can anyone help me change the regex so that it also gets the titles of XML pages?
Code:
content = stringBuilder.toString(); // put the content of the file into a string
Pattern p = Pattern.compile(".*<head>.*<title>(.*)</title>.*</head>.*");
Matcher m = p.matcher(content);
while (m.find()) {
    title = m.group(1);
}
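A likely cause is that the pattern requires a `<head>` element, which HTML has but many XML documents (RSS feeds, for example) do not. Another possible cause is that `.` does not match line breaks unless `Pattern.DOTALL` is set. The sketch below is one possible fix under those assumptions. It drops the `<head>` requirement, enables `DOTALL` and `CASE_INSENSITIVE`, and uses a reluctant `(.*?)` so the match stops at the first `</title>`. The class name `TitleExtractor` and the method `extractTitle` are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // Hypothetical helper. DOTALL lets '.' match line breaks, CASE_INSENSITIVE accepts <TITLE>,
    // and no <head> is required, so an XML <title> nested elsewhere still matches.
    // (?:\s[^>]*)? allows attributes such as <title lang="en"> but rejects tags like <titles>.
    private static final Pattern TITLE = Pattern.compile(
            "<title(?:\\s[^>]*)?>(.*?)</title>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed text of the first <title> element, or null if there is none. */
    public static String extractTitle(String content) {
        Matcher m = TITLE.matcher(content);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html>\n<head>\n<title>My HTML page</title>\n</head>\n<body></body>\n</html>";
        String xml = "<?xml version=\"1.0\"?>\n<rss><channel>\n<title>My feed</title>\n</channel></rss>";
        System.out.println(extractTitle(html));
        System.out.println(extractTitle(xml));
    }
}
```

Because the pattern no longer has to match the whole document, `find()` is called once instead of in a loop, and the first title wins. A regex is still not a real parser. For well-formed XML, the `javax.xml.parsers.DocumentBuilder` API with `getElementsByTagName("title")` is likely the more robust choice.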