How can I match a whole expression of the form ("SOMETHING","SOMETHING","SOMETHING",...) (only quoted upper-case A-Z strings, with no special symbols or whitespace) and capture each quoted string as a group?

("JOY","SAD") - should match
("JOY","sad") - shouldn't match
("JOY",0) - shouldn't match
("JOY","'") - shouldn't match
("JOY",SAD) - shouldn't match
("JOY","") - shouldn't match
("") - shouldn't match
("0") - shouldn't match
(a) - shouldn't match

I tried this regex. It groups correctly, but some of the examples that should be rejected still match:

\((\"([A-Z]+)*\")\)

UPDATE

I used the regex suggested by @anubhava, slightly modifying the capture group: (?:\(|\G(?!^),)\"([A-Z]+)\"(?=(?:,\"[A-Z]+\")*\)$). In Java, with Pattern.compile:

Pattern.compile("(?:^\\(|\\G(?!^),)(\\\"[A-Z]+\\\")(?=(?:,\\\"[A-Z]+\\\")*\\)$)")

However, why does the same regex fail to match when I use Java's Pattern.compile() method?

Viktor M.
  • `\(\"[A-Z]+\"(?:,\"[A-Z]+\")*\)`? See [the regex demo](https://regex101.com/r/TzSWo5/2) – Wiktor Stribiżew Jun 09 '20 at 14:23
  • It matches the first example, but it doesn't group all the quoted strings (it captures only the first one). – Viktor M. Jun 09 '20 at 14:43
  • Also, see [How to capture multiple repeated groups?](https://stackoverflow.com/questions/37003623/) In PCRE, you would use something like `(?:\G(?!^),|^\((?=\"[A-Z]+\"(?:,\"[A-Z]+\")*\)$))\K\"[A-Z]+\"`, see [regex demo](https://regex101.com/r/ndn62J/1). – Wiktor Stribiżew Jun 09 '20 at 14:53
  • I use this platform: https://regex101.com/ . I just cannot work out how to combine input validation with grouping of the strings. Also, the number of quoted strings in the parentheses may vary from 1 to 10. I need to validate the whole input and then extract all the quoted strings. – Viktor M. Jun 09 '20 at 15:01
  • @ViktorV.: See this regex demo: https://regex101.com/r/Fob7Xq/1 – anubhava Jun 09 '20 at 15:09
  • @anubhava Your regex works fine on regex101.com, but why doesn't the same regex match with Java's Pattern.compile method? – Viktor M. Jun 10 '20 at 09:29
  • @ViktorV.: Are you using Java? – anubhava Jun 10 '20 at 10:21
  • @anubhava Yes, correct. I tried your regex with escaped symbols: Pattern.compile("(?:^\\(|\\G(?!^),)(\\\"[A-Z]+\\\")(?=(?:,\\\"[A-Z]+\\\")*\\)$)") – Viktor M. Jun 10 '20 at 10:32
  • I have posted a detailed answer below with Java code. – anubhava Jun 10 '20 at 10:55
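
The requirement stated in the comments (validate the whole input, then extract every quoted string) can also be met in two passes instead of one \G-based pattern. The sketch below is only an illustration, not code from the thread: it reuses the whole-input regex suggested in the first comment for validation and a second, simpler pattern for extraction. The class name QuotedListParser and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the thread): validate first, then extract.
class QuotedListParser {
    // Whole-input shape: ("AAA","BBB",...) with upper-case A-Z items only.
    private static final Pattern WHOLE =
            Pattern.compile("\\(\"[A-Z]+\"(?:,\"[A-Z]+\")*\\)");
    // One quoted item; group 1 is the text between the quotes.
    private static final Pattern ITEM = Pattern.compile("\"([A-Z]+)\"");

    // Returns the unquoted items, or an empty list when the input is invalid.
    static List<String> parse(String input) {
        List<String> items = new ArrayList<>();
        if (!WHOLE.matcher(input).matches()) {
            return items; // not a well-formed list, so nothing is extracted
        }
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        // The nine examples from the question.
        String[] samples = {
            "(\"JOY\",\"SAD\")", "(\"JOY\",\"sad\")", "(\"JOY\",0)",
            "(\"JOY\",\"'\")", "(\"JOY\",SAD)", "(\"JOY\",\"\")",
            "(\"\")", "(\"0\")", "(a)"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Only the first sample yields a non-empty list; the other eight are rejected by the validation step. If the upper bound of 10 items mentioned above should be enforced as well, the `*` quantifier in WHOLE could be replaced with `{0,9}`.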

1 Answer

Based on the discussion in the comments, you may use this regex, which relies on \G to validate the input and retrieve the individual groups:

(?:^\(|\G(?!^),)("[A-Z]+")(?=(?:,"[A-Z]+")*\)$)
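  • \G asserts the position at the end of the previous match, or at the start of the string for the first match.
  • \G(?!^) ensures that \G is not matched at the start of the string, so the comma branch can only continue from a previous match.
  • (?=(?:,"[A-Z]+")*\)$) is a positive lookahead asserting that what follows is zero or more comma-separated quoted strings, then the closing parenthesis at the end of the string.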

RegEx Demo

Java Code:

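import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String str = "(\"JOY\",\"SAD\")";
        final Pattern p = Pattern.compile(
                "(?:^\\(|\\G(?!^),)(\"[A-Z]+\")(?=(?:,\"[A-Z]+\")*\\)$)");
        final Matcher m = p.matcher(str);
        // Each find() consumes one quoted item; group(1) is that item, quotes included.
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}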

Output:

"JOY"
"SAD"
anubhava
  • Thanks for the explanatory answer! Now I see what confused me so much. I used Pattern.compile(...).matcher(...).matches(), and matches() returns false for some reason. With your example that method gives the same result. However, find() finds all the strings I need. – Viktor M. Jun 10 '20 at 11:54
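
On the matches() versus find() point: Matcher.matches() succeeds only when the pattern consumes the entire input, while each match of the \G pattern consumes a single quoted item plus the parenthesis or comma before it. The rest of the input is only checked by the lookahead, which is zero-width, so a match never reaches the end of the string and matches() returns false even for valid input. The sketch below contrasts the two calls; the class name FindVsMatches and its helper methods are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (not from the thread).
class FindVsMatches {
    // The pattern from the answer, unchanged.
    static final Pattern P = Pattern.compile(
            "(?:^\\(|\\G(?!^),)(\"[A-Z]+\")(?=(?:,\"[A-Z]+\")*\\)$)");

    // matches() demands that one match span the whole input.
    static boolean wholeInputMatches(String input) {
        return P.matcher(input).matches();
    }

    // find() is called repeatedly; each call picks up where \G left off.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String valid = "(\"JOY\",\"SAD\")";
        String invalid = "(\"JOY\",\"sad\")";
        System.out.println(wholeInputMatches(valid)); // false
        System.out.println(findAll(valid));           // ["JOY", "SAD"]
        System.out.println(findAll(invalid));         // []
    }
}
```

With this pattern, an empty result from the find() loop is what signals invalid input, because the lookahead rejects the very first item when anything after it is malformed.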