I would like to extract some text from an HTML file using regex. I am still learning regex and have trouble understanding all of it. I have code that extracts all the text between <body> and </body>.
Here it is:
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Harn2 {
    public static void main(String[] args) throws IOException {
        String toMatch = readFile();
        // Pattern pattern = Pattern.compile(".*?<body.*?>(.*?)</body>.*?"); // this one works fine
        Pattern pattern = Pattern.compile(".*?<table class=\"claroTable\".*?>(.*?)</table>.*?"); // I want this one to work
        Matcher matcher = pattern.matcher(toMatch);
        if (matcher.matches()) {
            System.out.println(matcher.group(1));
        }
    }

    private static String readFile() {
        try {
            // Open the HTML file
            FileInputStream fstream = new FileInputStream("user.html");
            // Wrap the stream so it can be read line by line
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine = null;
            // Read the file line by line
            while (br.readLine() != null) {
                // System.out.println(strLine);
                strLine += br.readLine();
            }
            // Close the input stream
            in.close();
            return strLine;
        } catch (Exception e) { // Catch exception, if any
            System.err.println("Error: " + e.getMessage());
            return "";
        }
    }
}
It works fine like this, but now I would like to extract the text between the tags
<table class="claroTable">
and </table>
So I replaced my regex string with ".*?<table class=\"claroTable\".*?>(.*?)</table>.*?"
I have also tried
".*?<table class=\"claroTable\">(.*?)</table>.*?"
but neither works, and I don't understand why. There is only one table in the HTML file, but the word "table" also occurs in some JavaScript code: "...dataTables.js...". Could that be the reason for the failure?
Thank you in advance for your help.
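To check the dataTables.js suspicion in isolation, here is a minimal sketch. The class name TableRegexCheck, the helper extract, and both input strings are made up for illustration and are not taken from the real page. It uses the same pattern and the same matches() call as above, on single-line input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the inputs below are invented, not the real page.
class TableRegexCheck {
    // Same pattern string as in the question.
    static final Pattern TABLE = Pattern.compile(
            ".*?<table class=\"claroTable\".*?>(.*?)</table>.*?");

    // Returns group 1 when the whole input matches, otherwise null.
    static String extract(String html) {
        Matcher m = TABLE.matcher(html);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String script = "<script src=\"dataTables.js\"></script>";
        String table = "<table class=\"claroTable\"><tr><td>x</td></tr></table>";

        // The script reference alone: no "<table class=" literal, so no match.
        System.out.println(extract(script));
        // Script reference followed by the table: the table body is captured.
        System.out.println(extract(script + table));
    }
}
```

In this sketch the script reference does not get in the way, because the pattern needs the literal text <table class="claroTable", which "dataTables.js" does not contain. That suggests the cause may lie elsewhere, for example in the string that readFile() returns.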
EDIT: the HTML text to extract looks something like this:
<body>
.....
<table class="claroTable">
<td><th>some data and manya many tags </td>
.....
</table>
What I would like to extract is everything between <table class="claroTable">
and </table>
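For comparison, here is a hedged sketch of another way to read the file and search it. It rests on two observations about the code above: the loop in readFile() calls readLine() once in the while condition and once in the body, so only every second line is appended, and strLine starts as null, so the result begins with the text "null". The names ClaroTableExtractor, readAll, extractTable and SAMPLE are hypothetical, UTF-8 is an assumed encoding, and the sample markup is invented to mirror the outline above. Pattern.DOTALL is used because reading the whole file keeps its line breaks, and find() is used so that the pattern does not have to cover the whole document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original program.
class ClaroTableExtractor {
    // Invented sample that mirrors the outline of the real page.
    static final String SAMPLE =
            "<body>\n<table class=\"claroTable\">\n<tr><td>some data</td></tr>\n</table>\n</body>\n";

    // DOTALL lets "." match line terminators, which are kept by readAll().
    private static final Pattern TABLE = Pattern.compile(
            "<table class=\"claroTable\"[^>]*>(.*?)</table>", Pattern.DOTALL);

    // Reads the whole file in one call; UTF-8 is an assumption about the file.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Returns the text between the opening and closing table tags, or null if absent.
    static String extractTable(String html) {
        Matcher m = TABLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // Pass a file path (e.g. user.html) as the first argument, or fall back to SAMPLE.
        String html = args.length > 0 ? readAll(Paths.get(args[0])) : SAMPLE;
        System.out.println(extractTable(html));
    }
}
```

The lazy (.*?) stops at the first </table>, which fits a page with a single table; nested tables would need an HTML parser rather than a regex.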