I asked this question some time ago here ("Regular expression that does not contain quote but can contain escaped quote") and got an answer, but somehow I am not able to make it work in Java.

Basically, I need to write a regular expression that matches a valid string that begins and ends with quotes and can contain quotes in between, provided they are escaped.

In the code below, I want all three strings to match and print true, but they do not.

What should be the correct regex?

Thanks

public static void main(String[] args) {

    String[] arr = new String[] 
            { 
                "\"tuco\"", 
                "\"tuco  \" ABC\"",
                "\"tuco \" ABC \" DEF\"" 
            };

    Pattern pattern = Pattern.compile("\"(?:[^\"\\\\]+|\\\\.)*\"");

    for (String str : arr) {
        Matcher matcher = pattern.matcher(str);
        System.out.println(matcher.matches());
    }

}
Tuco
  • Your code doesn't match your description. Note that the elements of `arr` are the strings containing `"tuco"`, `"tuco " ABC"`, and `"tuco " ABC " DEF"` -- that is, the quotes in-between are *not* escaped. – ruakh Feb 29 '12 at 23:13
  • To add to @ruakh's comment, a quoted escape in code would look like `"tuco \\\" ABC\\\""`. – Jim Garrison Feb 29 '12 at 23:24
  • I meant that the regex needs to match the valid string. If the string has quotes in it, they will be escaped, which will make it a valid string. Essentially I want to get the output true for all three of the expressions. – Tuco Feb 29 '12 at 23:39
  • So you want `for (String str : arr) System.out.println("true");`? – Qtax Feb 29 '12 at 23:46
  • Didn't you forget to declare arr? I essentially want to write the regex to match a valid string. Of course, the string variables I am passing cannot be invalid strings, so I should only get true. The reason is that I have to use this regular expression somewhere else, to generate a custom parser using JavaCC. Right now, that parser looks for a string which starts and ends with quotes and which does not contain a quote, but I have to modify it so that it can contain quotes, but only if they are escaped. I hope that clarifies things. – Tuco Feb 29 '12 at 23:47

1 Answer

The problem is not so much your regex as your test strings. In your second and third example strings, the single backslash before each internal quote is consumed when the Java compiler parses the string literal: in source code, `\"` is just the escape sequence for a bare `"`. The string passed to the regex engine therefore has no backslash before the quote. (Try printing it out.) To get a literal backslash followed by a quote into the string, write `\\\"` in the source. Here is a tested version of your function which works as expected:

import java.util.regex.*;
public class TEST
{
    public static void main(String[] args) {

        String[] arr = new String[] 
                { 
                    "\"tuco\"", 
                    "\"tuco  \\\" ABC\"",
                    "\"tuco \\\" ABC \\\" DEF\"" 
                };

//old:  Pattern pattern = Pattern.compile("\"(?:[^\"\\\\]+|\\\\.)*\"");
        Pattern pattern = Pattern.compile(
            "# Match double quoted substring allowing escaped chars.     \n" +
            "\"              # Match opening quote.                      \n" +
            "(               # $1: Quoted substring contents.            \n" +
            "  [^\"\\\\]*    # {normal} Zero or more non-quote, non-\\.  \n" +
            "  (?:           # Begin {(special normal*)*} construct.     \n" +
            "    \\\\.       # {special} Escaped anything.               \n" +
            "    [^\"\\\\]*  # more {normal} non-quote, non-\\.          \n" +
            "  )*            # End {(special normal*)*} construct.       \n" +
            ")               # End $1: Quoted substring contents.        \n" +
            "\"              # Match closing quote.                        ", 
            Pattern.DOTALL | Pattern.COMMENTS);

        for (String str : arr) {
            Matcher matcher = pattern.matcher(str);
            System.out.println(matcher.matches());
        }
    }
}

I've replaced your regex with an improved version (taken from MRE3, i.e. *Mastering Regular Expressions*, 3rd edition). Note that this question gets asked a lot. Please see this answer where I compare several functionally equivalent expressions.
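
To see the effect of the string-literal escaping for yourself, here is a minimal sketch that prints what each test string really contains at runtime and runs it through both the original regex from the question and a compact, one-line form of the unrolled version. The class name `EscapedQuoteDemo` and the constant names `ORIGINAL` and `UNROLLED` are made up for this illustration; only `java.util.regex` is assumed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class EscapedQuoteDemo {
    // The regex from the question: runs of normal chars, or a backslash plus any char.
    static final Pattern ORIGINAL =
        Pattern.compile("\"(?:[^\"\\\\]+|\\\\.)*\"");

    // One-line form of the unrolled version: normal* (special normal*)*
    static final Pattern UNROLLED =
        Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"", Pattern.DOTALL);

    public static void main(String[] args) {
        String[] tests = {
            "\"tuco \" ABC\"",    // source \" is a bare quote: the inner quote is NOT escaped
            "\"tuco \\\" ABC\""   // source \\\" is backslash + quote: the inner quote IS escaped
        };
        for (String s : tests) {
            // Print the actual runtime characters, then the result of each pattern.
            System.out.println(s
                + "  original=" + ORIGINAL.matcher(s).matches()
                + "  unrolled=" + UNROLLED.matcher(s).matches());
        }
    }
}
```

The first string prints as `"tuco " ABC"` and fails both patterns, because the inner quote ends the quoted string early and `matches()` requires the whole input to match. The second prints as `"tuco \" ABC"` and passes both, which suggests the original regex was fine for this input and only the test data was wrong.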

ridgerunner