I am trying to parse data out of an HTML page using a Java regex but have not had much luck. The data is dynamic and often includes zero or more spaces, tabs, and newlines between tokens. Also, depending on the number of hits, the structure of the string I'm parsing may change. Here is a sample in the cleanest format:
<div class="center">Showing 25 of 2,343,098 (search took 1.245 seconds)</div>
However it can also look like this:
<div class="center">Showing 2343098 (search took 1.245 seconds)</div>
or
<div class="center">
Showing 125
of 2,343,098
(search took 1.245 seconds)</div>
What I'm trying to extract is the 2,343,098, but since the page is HTML I have to use either "Showing" or "(search took" as anchors to search between. The spaces, tabs, and newlines are tripping me up. I've been trying to use lookahead and lookbehind, but so far no luck. Here are a few patterns I've tried:
String pattern1 = "Showing [0-9]*\\S"; // not useful
String pattern2 = "[[\\d,+\\.?\\d+]*[\\s*\\n]\\(search took"; //fails
String pattern3 = "(/i)(Showing)(.+?)(\\(search took)"; //fails
String pattern4 = "([\\s\\S]*)\\(search took"; //fails
String pattern5 = "(?s)[\\d].*?(?=\\(search took)"; //close...but fails
Pattern pattern = Pattern.compile(pattern5);
Matcher matcher = pattern.matcher(text); // text = the string I'm parsing
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
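For reference, here is a minimal sketch of one pattern that appears to handle the three samples above. It is only an illustration under some assumptions: the total is always the last run of digits and commas before "(search took", the "N of" part is optional, and any mix of whitespace can separate the tokens. It uses a capturing group rather than lookahead or lookbehind. The class name HitCountParser and the method extractTotal are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class HitCountParser {
    // "Showing", then an optional "<shown> of" part, then the total (captured),
    // then optional whitespace and the literal "(search took".
    // \s matches spaces, tabs, and newlines, so no DOTALL flag is needed.
    private static final Pattern TOTAL = Pattern.compile(
            "Showing\\s+(?:[\\d,]+\\s+of\\s+)?([\\d,]+)\\s*\\(search took");

    // Returns the total as it appears in the text (commas included),
    // or null when the pattern does not match.
    static String extractTotal(String text) {
        Matcher m = TOTAL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s1 = "<div class=\"center\">Showing 25 of 2,343,098 (search took 1.245 seconds)</div>";
        String s2 = "<div class=\"center\">Showing 2343098 (search took 1.245 seconds)</div>";
        String s3 = "<div class=\"center\">\nShowing 125\nof 2,343,098\n(search took 1.245 seconds)</div>";

        System.out.println(extractTotal(s1)); // 2,343,098
        System.out.println(extractTotal(s2)); // 2343098
        System.out.println(extractTotal(s3)); // 2,343,098
    }
}
```

If a numeric value is needed, the commas can be stripped from the captured string before calling Long.parseLong.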