3

I want to validate an input field with regular expression in JavaScript, which should validate the following cases:

Valid:

A and B and C and D
(A or B) and C
(A or B or C) and D
(A or B or C or D) and E
A and (B or C) and D
A and (B or C) or (C and D)
A or (B and C)
(A and B) or (C and D)

Invalid:

A and B and C and 
(A or B and C
(A or B or C) and D or
(A or B or C or D and E
A and or (B or C) and D
A and (B or (C and D)))
A (B and C)
(A and B) or C and D)
(A and B or C and D)

Basically, I need upper-case letters A-Z separated by "and" or "or", with any number of bracketed groups, but the number of opening brackets must match the number of closing ones. After an opening bracket, only an upper-case letter A-Z should be allowed, and after a closing bracket "and", "or" or A-Z upper-case should also be valid. Nested brackets should not be valid either.

I came up with this solution, but it only checks for A-Z upper-case letters, the words "and" and "or", and brackets, so all of the invalid cases above still match my regex.

/^[A-Z(]?[A-Z]| |and|or|[(]|[A-Z]|[)]/gm
psychozub
  • You can't do that with a regular expression, not in javascript at least. – gog Jul 11 '22 at 10:02
  • ...and even if you could, it would be painful. Recommend that you read about parsers instead. A simple parser-combinator library would do what you want and probably be easier to understand. – Roger Lipscombe Jul 11 '22 at 10:04
  • Are single-variable groups, as in `(A)` or `(A) and B`, considered valid? – Bergi Jul 11 '22 at 12:15
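
The parser approach suggested in the comments could look roughly like the sketch below. It is a hand-written validator, not a parser-combinator library, and it rests on assumptions read off the question's examples: tokens are separated by exactly one space, a group holds at least two letters, groups cannot nest, and an input consisting of a single group is rejected. The name `isValidExpression` is made up for illustration.

```javascript
// Minimal sketch of a hand-written validator (hypothetical helper name).
// Assumed grammar, inferred from the question's examples:
//   expression := term (operator term)*
//   term       := LETTER | "(" LETTER (operator LETTER)+ ")"
//   operator   := "and" | "or"
function isValidExpression(input) {
  // Pad parentheses with a space so that a plain split yields tokens.
  // Stray or missing spaces produce empty or fused tokens, which fail below.
  const tokens = input.replace(/\(/g, "( ").replace(/\)/g, " )").split(" ");
  let pos = 0;

  const isLetter = (t) => /^[A-Z]$/.test(t);
  const isOperator = (t) => t === "and" || t === "or";

  // Parses "(" LETTER (operator LETTER)+ ")" with no nesting allowed.
  function parseGroup() {
    pos++; // consume "("
    if (!isLetter(tokens[pos])) return false;
    pos++;
    let operators = 0;
    while (isOperator(tokens[pos])) {
      if (!isLetter(tokens[pos + 1])) return false;
      pos += 2;
      operators++;
    }
    if (operators === 0 || tokens[pos] !== ")") return false;
    pos++; // consume ")"
    return true;
  }

  // Parses a single letter or a parenthesized group.
  function parseTerm() {
    if (isLetter(tokens[pos])) {
      pos++;
      return true;
    }
    if (tokens[pos] === "(") return parseGroup();
    return false;
  }

  const startsWithGroup = tokens[0] === "(";
  if (!parseTerm()) return false;
  let terms = 1;
  while (isOperator(tokens[pos])) {
    pos++;
    if (!parseTerm()) return false;
    terms++;
  }
  if (pos !== tokens.length) return false; // leftover tokens
  // Assumption: a lone group such as "(A and B or C and D)" is invalid.
  return !(terms === 1 && startsWithGroup);
}

console.log(isValidExpression("A and (B or C) or (C and D)")); // true
console.log(isValidExpression("(A and B or C and D)")); // false
```

Because every rule is an ordinary `if`, changing a requirement (for example allowing `(A)`) means touching one line rather than reworking a lookahead.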

3 Answers

2

Without nested brackets, this is easy. One disjunctive clause of the conjunctive normal form is

[A-Z]( or [A-Z])*

With parentheses required around clauses using or:

[A-Z]|\([A-Z]( or [A-Z])*\)

The whole formula would then be

([A-Z]|\([A-Z]( or [A-Z])*\))( and ([A-Z]|\([A-Z]( or [A-Z])*\)))*
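
As a rough sketch, the formula above can be assembled from its clause in JavaScript. The `^`/`$` anchors, the non-capturing groups, and the names `clause` and `cnfPattern` are additions for illustration and are not part of the pattern as written. Since the pattern describes conjunctive normal form only, inputs from the question that put "and" inside brackets or "or" between groups are rejected by it.

```javascript
// One clause: a single letter, or letters joined by " or " inside parentheses.
const clause = String.raw`(?:[A-Z]|\([A-Z](?: or [A-Z])*\))`;

// Whole formula: clauses joined by " and ", anchored to the full input
// (the anchors are an assumption added here for validation).
const cnfPattern = new RegExp(`^${clause}(?: and ${clause})*$`);

const samples = [
  "A and B and C and D",
  "(A or B or C) and D",
  "A and (B or C) and D",
  "A or (B and C)", // not in conjunctive normal form
  "(A or B and C", // unbalanced
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), cnfPattern.test(sample));
}
```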
Bergi
1

A JS regular expression could be:

^(?!\([^()]*\)$|.*([()])[^()]*(?=\1)|[^()]*[()](?:[^()]*[()][^()]*[()])*[^()]*$|.*\([A-Z]\))\(?[A-Z](?: (?:and|or) \(?[A-Z]\)?)*$

See an online demo


  • ^ - Start-line anchor;
  • (?! - Open a negative lookahead with alternations;
    • \([^()]*\)$ - Avoid a match with an opening parenthesis, 0+ characters other than parentheses, and a closing parenthesis. Or;
    • .*([()])[^()]*(?=\1) - 0+ characters up to an opening/closing parenthesis captured in the 1st group, followed by 0+ characters other than parentheses, up to a backreference to the 1st group (that is, the same kind of parenthesis twice in a row). Or;
    • [^()]*[()](?:[^()]*[()][^()]*[()])*[^()]*$ - A check for unbalanced parentheses. This alternative matches an odd number of parentheses, so the negative lookahead enforces an even count whenever any are used. Or;
    • .*\([A-Z]\) - Test for 0+ characters followed by an opening parenthesis, a capital letter and an immediate closing parenthesis, to avoid (A)-like input;
  • \(?[A-Z] - Match an optional opening parenthesis followed by A-Z (to allow a single letter to be a valid match too);
  • (?: (?:and|or) \(?[A-Z]\)?)* - Open a non-capture group to match a space, a nested non-capture group to match and|or followed by another space, an optional opening parenthesis, another capital letter and an optional closing parenthesis. This group is matched 0+ times;
  • $ - End-line anchor.
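
To see which part of the lookahead rejects which input, the four alternatives from the breakdown above can be tried one at a time. This is only an illustrative sketch: the guard names and the `triggeredGuards` helper are invented here, each alternative is given its own `^` anchor so it behaves as it does inside the lookahead, and `"(A) and B"` is an extra probe that does not come from the question. It exercises a handful of inputs and is not a proof that the pattern is correct for every input.

```javascript
// The four lookahead alternatives, each anchored separately (names are made up).
const guards = {
  wholeInputWrapped: /^\([^()]*\)$/,
  sameParenTwiceInARow: /^.*([()])[^()]*(?=\1)/,
  oddParenCount: /^[^()]*[()](?:[^()]*[()][^()]*[()])*[^()]*$/,
  singleLetterGroup: /^.*\([A-Z]\)/,
};

// The complete pattern from the answer, unchanged.
const fullPattern =
  /^(?!\([^()]*\)$|.*([()])[^()]*(?=\1)|[^()]*[()](?:[^()]*[()][^()]*[()])*[^()]*$|.*\([A-Z]\))\(?[A-Z](?: (?:and|or) \(?[A-Z]\)?)*$/;

// Lists the guards that would make the negative lookahead fail for an input.
function triggeredGuards(input) {
  return Object.keys(guards).filter((name) => guards[name].test(input));
}

const probes = [
  "A and (B or C) or (C and D)", // valid in the question
  "(A and B or C and D)", // whole input wrapped in one pair
  "A and (B or (C and D)))", // nested and unbalanced
  "(A or B and C", // a single unmatched parenthesis
  "(A) and B", // extra probe, not from the question
];

for (const probe of probes) {
  console.log(JSON.stringify(probe), fullPattern.test(probe), triggeredGuards(probe));
}
```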
JvdV
1

Really interesting one... Playing a bit with word boundaries, the shortest I have come up with so far:

^(?:(?:(?!^) (?:and|or) )?(?:\b[A-Z]|\B\([A-Z](?: (?:and|or) [A-Z])+\)\B)){2,}$

Here is a demo at regex101

The idea was to keep the pattern short by making the and/or part outside brackets optional and forcing it to occur only at allowed positions. All groups used are non-capturing.

  • (?!^) the negative lookahead prevents the optional and/or part from matching at the start
  • \b the word boundary is used to prevent matching e.g. AB
  • \B the non-word boundaries prevent matching e.g. A(A and B)

Let me know where/if it fails, I have some feeling it's not working properly yet.
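
One way to look for failures is to run the pattern over the question's examples plus a few extra probes. The sketch below does that; the three probe strings are not from the question and were picked here to exercise the boundary checks, and the variable names are arbitrary. With these inputs the question's examples are classified as expected, while two adjacent groups with no operator between them still match, because `\B` holds between `)` and `(`.

```javascript
// The pattern from the answer, unchanged.
const pattern =
  /^(?:(?:(?!^) (?:and|or) )?(?:\b[A-Z]|\B\([A-Z](?: (?:and|or) [A-Z])+\)\B)){2,}$/;

const shouldMatch = [
  "A and B and C and D",
  "(A or B) and C",
  "(A or B or C) and D",
  "(A or B or C or D) and E",
  "A and (B or C) and D",
  "A and (B or C) or (C and D)",
  "A or (B and C)",
  "(A and B) or (C and D)",
];

const shouldNotMatch = [
  "A and B and C and ",
  "(A or B and C",
  "(A or B or C) and D or",
  "(A or B or C or D and E",
  "A and or (B or C) and D",
  "A and (B or (C and D)))",
  "A (B and C)",
  "(A and B) or C and D)",
  "(A and B or C and D)",
];

// Extra probes (assumptions, not taken from the question).
const probes = ["AB", "A(A and B)", "(A and B)(C and D)"];

// Collect every input whose result differs from the question's expectation.
const unexpected = [
  ...shouldMatch.filter((s) => !pattern.test(s)),
  ...shouldNotMatch.filter((s) => pattern.test(s)),
];
console.log("unexpected results:", unexpected);

for (const probe of probes) {
  console.log(JSON.stringify(probe), pattern.test(probe));
}
```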

bobble bubble