I'm trying to find the pieces of text on a web page I fetch that lie between the substrings 'align="left">\n' and '</form>\n</td>'.
I wrote a regex:
(align=\"left\">\\n)(?<part>.*?)(<\/form>\\n<\/td>)
and tested it at https://www.freeformatter.com/java-regex-tester.html where it works just as I need.
But in the Java code it can't find anything.
Here is the test code I'm trying to get working:
    String frontPage = "<html>\n<head>\n<title>Hello</title>\n</head>\n" +
            "<body>\n<table>\n<tr align=\"left\">\n" +
            "<td>Hello \n<form>\n<input type=\"submit\" value=\"ok\">\n" +
            "</form>\n</td>\n" +
            "<td>World \n<form>\n<input type=\"submit\" value=\"ok\">\n" +
            "</form>\n</td>\n" +
            "</tr>\n</table>\n</body>\n</html>";
    java.util.regex.Pattern p =
            java.util.regex.Pattern.compile(
                    "(align=\"left\">\\n)(?<part>.*?)(<\\/form>\\n<\\/td>)");
    java.util.regex.Matcher m = p.matcher(frontPage);
    List<String> parts = new ArrayList<>();
    while (m.find()) {
        parts.add(m.group("part"));
    }
    if (parts.size() == 0) {
        System.out.println("No page parts found");
    } else {
        System.out.println("Something matches at least");
    }
It finds matches if only the first two groups are specified, but as soon as I add even a simple (form)
sequence as the last group, it stops matching anything, and I can't figure out why.
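
One thing that may explain the difference: in Java, '.' does not match line terminators such as '\n' unless the pattern is compiled with Pattern.DOTALL (or starts with the inline flag (?s)), and the text between the two markers here spans several lines. That would also fit the observation above, since with only the first two groups the lazy .*? can match an empty string and never has to cross a newline. Below is a minimal sketch of the same test with that flag added. The class name DotallSketch and the helper extractParts are made up for illustration, and whether the online tester had an equivalent option switched on is only an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallSketch {

    // Hypothetical helper (not part of the original test code): collects the
    // text captured by the named group "part" for every match in the input.
    public static List<String> extractParts(String html) {
        // Pattern.DOTALL makes '.' match line terminators too, so the lazy
        // ".*?" can run across the '\n' characters between the two markers.
        Pattern p = Pattern.compile(
                "align=\"left\">\\n(?<part>.*?)</form>\\n</td>",
                Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group("part"));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Same sample page as in the test code above.
        String frontPage = "<html>\n<head>\n<title>Hello</title>\n</head>\n" +
                "<body>\n<table>\n<tr align=\"left\">\n" +
                "<td>Hello \n<form>\n<input type=\"submit\" value=\"ok\">\n" +
                "</form>\n</td>\n" +
                "<td>World \n<form>\n<input type=\"submit\" value=\"ok\">\n" +
                "</form>\n</td>\n" +
                "</tr>\n</table>\n</body>\n</html>";

        List<String> parts = extractParts(frontPage);
        if (parts.isEmpty()) {
            System.out.println("No page parts found");
        } else {
            System.out.println("Found " + parts.size() + " part(s)");
        }
    }
}
```

Note that the sample page contains 'align="left">' only once, so this sketch should report a single part ending at the first '</form>\n</td>'; the second cell would need its own opening marker to be picked up.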