I have to validate that <tagN> (where N is a number) is inside a <p></p> tag. If it is not inside <p>, I have to add it; otherwise it is fine. These are all the cases I need to handle. I have been trying for a while, but I couldn't find a pattern that covers all of them:
import java.util.regex.*;

public class Main {
    static String case1 = "<p><tag1></p>"; // Output: Group 1: <p>. Group 2: <tag1>. Group 3: </p>
    static String case2 = "<tag1>"; // Output: Group 1: null. Group 2: <tag1>. Group 3: null
    static String case3 = "<p> <tag1> </p>"; // Output: Group 1: <p>. Group 2: <tag1>. Group 3: </p>
    static String case4 = "<><tag1></p>"; // NOT OK. Output: Group 1: null. Group 2: <tag1>. Group 3: </p>
    static String case5 = "<p><tag1><tag2></p>"; // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case6 = "<p> <tag1> <tag2> </p>"; // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case7 = "<p> <tag1>\n\n<tag2> </p>"; // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case8 = "<p>\n\n <tag1>\n\n<tag2> \n</p>"; // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case9 = " <tag1> <tag2> "; // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
    static String case10 = " <tag1>\n\n<tag2> "; // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
    static String case11 = "\n\n <tag1>\n\n<tag2> \n"; // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null

    public static void main(String[] args) {
        // Attempt 1: works only for cases 2, 9, 10 and 11
        //String patternString = "(<p>\\s*)*([<tag\\d+>\\s*]+)(\\s*</p>)*";
        // Attempt 2: works only for cases 1, 2, 3, 4
        String patternString = "(<p>\\s*)*(<tag\\d+>+)(\\s*</p>)*";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(case5); // change the argument to test another case
        while (matcher.find()) {
            System.out.println("Group 0: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
            System.out.println("Group 3: " + matcher.group(3));
            // The idea here is to add tag <p> when group 1 is null and tag </p> when group 3 is null
        }
    }
}
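To make the goal concrete, here is a rough sketch of the "add <p> when group 1 is null and </p> when group 3 is null" step. It is only an illustration under assumptions: the class name `ParagraphWrapper`, the helper `wrap`, and the pattern itself are hypothetical and not one of the attempts above, and any text outside the matched tags is simply dropped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParagraphWrapper {
    // Hypothetical pattern (an assumption, not one of the attempts in the question):
    // group 1 = optional <p>
    // group 2 = one or more <tagN>, separated by optional whitespace
    // group 3 = optional </p>
    static final Pattern TAGS =
            Pattern.compile("(<p>)?\\s*(<tag\\d+>(?:\\s*<tag\\d+>)*)\\s*(</p>)?");

    // Hypothetical helper: returns the tags wrapped in <p>...</p>,
    // adding whichever side is missing.
    static String wrap(String input) {
        Matcher m = TAGS.matcher(input);
        if (!m.find()) {
            return input; // no <tagN> found, leave the input untouched
        }
        // group(n) returns null when an optional group did not take part in the match
        String open = (m.group(1) != null) ? m.group(1) : "<p>";
        String close = (m.group(3) != null) ? m.group(3) : "</p>";
        // Assumption: whitespace and other text outside the tags is discarded
        return open + m.group(2) + close;
    }

    public static void main(String[] args) {
        System.out.println(wrap("<tag1>"));                 // <p><tag1></p>
        System.out.println(wrap("<p> <tag1> <tag2> </p>")); // <p><tag1> <tag2></p>
    }
}
```

Note that group 2 still contains the whitespace between the tags; a regex group cannot remove it, so it would need a separate `replaceAll("\\s+", "")` if the tags must end up adjacent.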
Basically, I tried to split the match into 3 groups:
Group 1: the opening tag, (<p>\\s*)* // \\s covers whitespace/tabs/newlines, zero or more of them
Group 2: the repeated tag1, tag2, etc.; that is why I enclosed them in []+, but it doesn't seem to work
Group 3: the closing tag, (\\s*</p>)* // \\s covers whitespace/tabs/newlines, zero or more of them
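A likely reason the []+ form misbehaves is that square brackets define a character class (a set of single characters) rather than a repeated sequence. The small sketch below shows the difference; the class name `CharClassVsGroup` and the non-capturing group variant are illustrative assumptions, not a verified fix for every case listed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharClassVsGroup {
    // Square brackets make a set of single characters:
    // '<', 't', 'a', 'g', digits, '+', '>', whitespace, '*'
    static final Pattern CHAR_CLASS = Pattern.compile("[<tag\\d+>\\s*]+");

    // A non-capturing group repeats the whole token "<tagN>" plus optional whitespace
    static final Pattern GROUPED = Pattern.compile("(?:<tag\\d+>\\s*)+");

    // Attempt 1 from the question, unchanged
    static final Pattern ATTEMPT_1 =
            Pattern.compile("(<p>\\s*)*([<tag\\d+>\\s*]+)(\\s*</p>)*");

    public static void main(String[] args) {
        // Any mix of the characters in the set matches the character class
        System.out.println(CHAR_CLASS.matcher("gat>>1<").matches()); // true
        System.out.println(GROUPED.matcher("gat>>1<").matches());    // false
        System.out.println(GROUPED.matcher("<tag1>\n\n<tag2> ").matches()); // true

        // In case 1 the character class also swallows the '<' of "</p>",
        // so group 3 matches zero times and comes back as null
        Matcher m = ATTEMPT_1.matcher("<p><tag1></p>");
        if (m.find()) {
            System.out.println(m.group(2)); // <tag1><
            System.out.println(m.group(3)); // null
        }
    }
}
```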
Any idea? Thanks