I have to validate that <tagN> (where N is a number) is inside a <p></p> tag. If it's not inside <p>, I have to add it; otherwise it's fine. These are all the cases I need to handle. I have been trying for a while, but I couldn't find a pattern that covers all of them:

import java.util.regex.*;


public class Main {

    static String case1 = "<p><tag1></p>";                              // Output: Group 1: <p>. Group 2: <tag1>. Group 3: </p>
    static String case2 = "<tag1>";                                     // Output: Group 1: null. Group 2: <tag1>. Group 3: null
    static String case3 = "<p>     <tag1>        </p>";                 // Output: Group 1: <p>. Group 2: <tag1>. Group 3: </p>
    static String case4 = "<><tag1></p>";                               // NOT OK. Output: Group 1: null. Group 2: <tag1>. Group 3: </p>
    static String case5 = "<p><tag1><tag2></p>";                        // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case6 = "<p>   <tag1>  <tag2>   </p>";                // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case7 = "<p>   <tag1>\n\n<tag2>   </p>";              // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case8 = "<p>\n\n   <tag1>\n\n<tag2>   \n</p>";        // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
    static String case9 = "   <tag1>  <tag2>   ";                       // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
    static String case10 = "  <tag1>\n\n<tag2>   ";                     // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
    static String case11 = "\n\n   <tag1>\n\n<tag2>   \n";              // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null

    public static void main(String[] args) {
        // Attempt 1: works only for cases 2, 9, 10 and 11
        //String patternString = "(<p>\\s*)*([<tag\\d+>\\s*]+)(\\s*</p>)*";
        // Attempt 2: works only for cases 1, 2, 3 and 4 (left active so the class compiles)
        String patternString = "(<p>\\s*)*(<tag\\d+>+)(\\s*</p>)*";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(case5);

        while (matcher.find()) {
            System.out.println("Group 0: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
            System.out.println("Group 3: " + matcher.group(3));

            // The idea here is to add <p> when group 1 is null and </p> when group 3 is null
        }

    }
}

Basically, I tried to split the match into 3 groups:

  • Group 1: (<p>\\s*)* matches the opening tag; \\s* allows zero or more whitespace characters (spaces, tabs, newlines) after it.

  • Group 2: should match a repeated run of tag1, tag2, etc. That is why I enclosed it in []+, but it doesn't seem to work.

  • Group 3: (\\s*</p>)* matches the closing tag; \\s* allows zero or more whitespace characters (spaces, tabs, newlines) before it.

Any ideas? Thanks

David
    I know that linking to [this answer](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) is forbidden. But take a look. –  Aug 02 '19 at 19:37

2 Answers

In the end, I replaced <tag1> with ~tag1~ in the input and adapted the first expression accordingly, which then worked:

String patternString = "(<p>\\s*)*([~tag\\d+~\\s*]+)(\\s*</p>)*";

With these 2 changes I got the expected result. Thanks
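
For what it's worth, the original attempt most likely misbehaved because square brackets form a character class: [<tag\\d+>\\s*]+ matches any run of the characters <, t, a, g, >, digits, whitespace, + and *, so it can also swallow the < of </p>. Below is a minimal sketch of an alternative that keeps <tag1> as-is and uses a non-capturing group (?:...) to repeat the whole tag. The class name TagGroups and the helper groups are made up for illustration, and I have only reasoned through a few of the cases from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroups {
    // (?:<tag\d+>\s*)+ repeats the whole "<tagN> plus trailing whitespace" unit,
    // which a character class [...] cannot do.
    static final Pattern PATTERN =
            Pattern.compile("(<p>\\s*)?((?:<tag\\d+>\\s*)+)(</p>)?");

    // Returns {group 1, group 2, group 3} of the first match, or null if nothing matches.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = { "<p><tag1><tag2></p>", "   <tag1>  <tag2>   ", "<><tag1></p>" };
        for (String s : samples) {
            String[] g = groups(s);
            System.out.println(g[0] + " | " + g[1] + " | " + g[2]);
        }
    }
}
```

Note that in this sketch group 2 keeps the whitespace between and after the tags, and groups 1 and 3 are null when <p> or </p> is missing.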

David

> I have to validate that <tagN> (where N is a number) is inside tag <p></p>. In case, it's not inside <p>, I have to add it.

I'm interpreting "I have to add it" as adding <p></p> around the <tagN>.

To do that, you can use a replacement loop.

The regex is <p>.*?</p>|(<tag\d+>). When it finds a <p>, it skips everything up to the first following </p>. When it finds a <tagN> outside such a block, it captures the tag in group 1, so we can surround it with <p></p>.

Code (Java 1.4+)

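Pattern p = Pattern.compile("<p>.*?</p>|(<tag\\d+>)", Pattern.DOTALL);
Matcher m = p.matcher(input);
StringBuffer buf = new StringBuffer();
while (m.find()) {
    // Group 1 is set only when a bare <tagN> was matched. A skipped <p>...</p>
    // block is left alone and copied verbatim by the next append call.
    if (m.start(1) != -1)
        m.appendReplacement(buf, "<p>$1</p>");
}
String fixed = m.appendTail(buf).toString();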

Short Version (Java 9+)

Pattern p = Pattern.compile("<p>.*?</p>|(<tag\\d+>)", Pattern.DOTALL);
String fixed = p.matcher(input).replaceAll(r -> r.start(1) == -1 ? r.group() : "<p>$1</p>");
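
One caveat with the lambda form, as far as I understand the Java 9+ Javadoc: the string returned by the function is itself treated as a replacement string, so a $ or \ inside a skipped <p>...</p> block would be interpreted as a group reference or an escape. Here is a hedged sketch that guards against this with Matcher.quoteReplacement. The class name SafeWrap and the method wrapBareTags are illustrative names only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SafeWrap {
    static final Pattern P = Pattern.compile("<p>.*?</p>|(<tag\\d+>)", Pattern.DOTALL);

    static String wrapBareTags(String input) {
        return P.matcher(input).replaceAll(r ->
                r.start(1) == -1
                        // Skipped <p>...</p> block: copy it literally, escaping any $ or \.
                        ? Matcher.quoteReplacement(r.group())
                        // Bare <tagN>: wrap it; $1 expands to the captured tag.
                        : "<p>$1</p>");
    }

    public static void main(String[] args) {
        System.out.println(wrapBareTags("<p>costs $5</p> <tag1>"));
    }
}
```

The Java 1.4+ loop should not need this guard, since it only ever passes the fixed string "<p>$1</p>" to appendReplacement.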

Test

String[] inputs = {
        "<p><tag1></p>",                              // Output: Group 1: <p>. Group 2: <tag1>. Group 3: </p>
        "<tag1>",                                     // Output: Group 1: null. Group 2: <tag1>. Group 3: null
        "<p>     <tag1>        </p>",                 // Output: Group 1: <p>. Group 2: <tag1>. Group 3: </p>
        "<><tag1></p>",                               // NO OK. Output: Group 1: null. Group 2: <tag1>. Group 3: </p>
        "<p><tag1><tag2></p>",                        // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
        "<p>   <tag1>  <tag2>   </p>",                // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
        "<p>   <tag1>\n\n<tag2>   </p>",              // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
        "<p>\n\n   <tag1>\n\n<tag2>   \n</p>",        // Output: Group 1: <p>. Group 2: <tag1><tag2>. Group 3: </p>
        "   <tag1>  <tag2>   ",                       // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
        "  <tag1>\n\n<tag2>   ",                      // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
        "\n\n   <tag1>\n\n<tag2>   \n" };             // Output: Group 1: null. Group 2: <tag1><tag2>. Group 3: null
Pattern p = Pattern.compile("<p>.*?</p>|(<tag\\d+>)", Pattern.DOTALL);
for (String input : inputs) {
    String fixed = p.matcher(input).replaceAll(r -> r.start(1) == -1 ? r.group() : "<p>$1</p>");
    System.out.println('"' + fixed + '"');
}

Output

"<p><tag1></p>"
"<p><tag1></p>"
"<p>     <tag1>        </p>"
"<><p><tag1></p></p>"
"<p><tag1><tag2></p>"
"<p>   <tag1>  <tag2>   </p>"
"<p>   <tag1>

<tag2>   </p>"
"<p>

   <tag1>

<tag2>   
</p>"
"   <p><tag1></p>  <p><tag2></p>   "
"  <p><tag1></p>

<p><tag2></p>   "
"

   <p><tag1></p>

<p><tag2></p>   
"
Andreas