
The string I want to match against is as follows:

5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')

I tried the regular expression [A-Z0-9_]+\(.*?\), which matches

__FXN1__('hello', 1, 3, '__HELLO__(hello) and __FXN2__('Good boy')

What I am expecting is:

__FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') and __FXN2__('Good boy')
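The mismatch can be reproduced with a short Python sketch (Python is an assumption here; the question does not name a language or regex flavor). The lazy `.*?` stops at the first `)`, which is the one closing the nested `__HELLO__(hello)`:

```python
import re

# Sample input taken verbatim from the question.
text = "5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')"

# The attempted pattern: a name, "(", then as little as possible up to the first ")".
lazy_matches = re.findall(r"[A-Z0-9_]+\(.*?\)", text)

for match in lazy_matches:
    print(match)
# __FXN1__('hello', 1, 3, '__HELLO__(hello)
# __FXN2__('Good boy')
```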

How can I achieve this? Please help.

wrufesh
    Capturing paired parentheses, allowing embedded parentheses, and coping with unbalanced ones is a very tricky requirement for a regex. You'd do better with a parser instead. – PA. Jun 01 '21 at 10:32

1 Answer


If the parentheses are always balanced, you can use a recursion-based regex like

__[A-Z0-9_]+__(\((?:[^()]++|(?-1))*\))

Note that this requires a regex flavor that supports recursion (such as PCRE), and it may fail if there is an unbalanced number of ( or ) inside strings; see this regex demo. In brief:

  • __[A-Z0-9_]+__ - __, one or more uppercase letters, digits or _ and then __
  • (\((?:[^()]++|(?-1))*\)) - Group 1: (, then any zero or more occurrences of one or more chars other than ( and ) or the whole Group 1 pattern recursed, and then a ) (so the (...) substring with any amount of paired nested parentheses is matched).
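Python's built-in `re` module does not support recursion, so the pattern above cannot be used there as written. A minimal sketch of the same idea, assuming balanced parentheses and using a depth counter instead of recursion, might look like this (`find_calls` and `NAME_RE` are hypothetical names chosen for illustration):

```python
import re

# Matches the function-name prefix up to and including the opening parenthesis.
NAME_RE = re.compile(r"__[A-Z0-9_]+__\(")

def find_calls(text):
    """Return every outermost __NAME__(...) substring with balanced parentheses."""
    calls = []
    pos = 0
    while True:
        m = NAME_RE.search(text, pos)
        if m is None:
            break
        depth = 1          # we are just past the opening "("
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:
            calls.append(text[m.start():i])
            pos = i        # continue after the match, skipping nested calls
        else:
            pos = m.end()  # parentheses never closed: skip this candidate
    return calls

sample = "5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')"
print(find_calls(sample))
```

Like the recursive regex, this sketch counts every ( and ), including those inside quoted strings, so it shares the same limitation with unbalanced parentheses in string arguments.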

If you need to support unbalanced parentheses, it is safer to use a regex that just matches all allowed data formats, e.g.

__[A-Z0-9_]+__\(\s*(?:'[^']*'|\d+)(?:\s*,\s*(?:'[^']*'|\d+))*\s*\)

See the regex demo. Or, if ' can be escaped with a \ char inside the '...' strings, you can use

__[A-Z0-9_]+__\(\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+)(?:\s*,\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+))*\s*\)

See this regex demo.
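As a quick check, both patterns can be tried with Python's `re` module (an assumption; the thread does not state a language). The second input string is made up for illustration and contains an escaped quote plus an unbalanced parenthesis inside a string argument:

```python
import re

# The two patterns from the answer, copied verbatim.
SIMPLE = re.compile(
    r"__[A-Z0-9_]+__\(\s*(?:'[^']*'|\d+)(?:\s*,\s*(?:'[^']*'|\d+))*\s*\)"
)
ESCAPED = re.compile(
    r"__[A-Z0-9_]+__\(\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+)"
    r"(?:\s*,\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+))*\s*\)"
)

# Input from the question.
sample = "5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')"
# Hypothetical input with an escaped quote and a lone "(" inside the string.
escaped_sample = r"1 + __FXN1__('it\'s (ok', 2)"

print(SIMPLE.findall(sample))           # both calls from the question
print(SIMPLE.findall(escaped_sample))   # [] because '...' ends at the escaped quote
print(ESCAPED.findall(escaped_sample))  # the whole call, escaped quote included
```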

Details:

  • __[A-Z0-9_]+__ - __, one or more uppercase letters, digits or _ and then __
  • \( - ( char
  • \s* - zero or more whitespaces
  • (?:'[^']*'|\d+) - ', zero or more non-' and then a ' or one or more digits
  • (?:\s*,\s*(?:'[^']*'|\d+))* - zero or more occurrences of a , enclosed with optional whitespace and then either a '...' substring or one or more digits
  • \s*\) - zero or more whitespaces and then a ).

Note that if you need to support other number formats, such as signed or decimal numbers, you need to replace \d+ with a more sophisticated pattern like [+-]?\d+(?:\.\d+)? or a more elaborate one.
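A sketch of that substitution in Python (an assumed language; `NUM`, `ARG`, and `PATTERN` are hypothetical names, and the input string is made up for illustration):

```python
import re

# Signed integers or decimals instead of plain \d+.
NUM = r"[+-]?\d+(?:\.\d+)?"
# One argument: a '...' string or a number.
ARG = rf"(?:'[^']*'|{NUM})"
# Same structure as the pattern above, with the richer argument sub-pattern.
PATTERN = re.compile(rf"__[A-Z0-9_]+__\(\s*{ARG}(?:\s*,\s*{ARG})*\s*\)")

text = "__FXN1__('a', -1.5, +2) + __FXN2__(3.14)"
print(PATTERN.findall(text))
# ["__FXN1__('a', -1.5, +2)", '__FXN2__(3.14)']
```

With the original \d+ alternative, neither call in this input would match, since the sign and the decimal point are not covered.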

Wiktor Stribiżew
  • Thank you for your prompt answer. I am trying to understand what you have pointed out. Meanwhile, I also want `5 + __FXN1__('hello', 1, 3, __HELLO__(hello) + 5) + 5 + (2/2) + __FXN2__('Good boy')` to be matched. – wrufesh Jun 01 '21 at 10:23
  • @wrufesh You can go on enhancing the pattern, there can be more examples, and now you know how to add their support. – Wiktor Stribiżew Jun 01 '21 at 10:25
  • @wrufesh Note you can still use the recursion-based regex if you are absolutely sure there are no unbalanced parentheses in the input. However, this is the point where you should stop thinking about a single regex to parse these inputs and write a dedicated format parser instead. [This](https://regex101.com/r/qLz4Ke/8) is really unwieldy. – Wiktor Stribiżew Jun 01 '21 at 10:36
  • The last one is what I actually wanted. I am sure there are no unbalanced parentheses, or you can say I do not need to take care of that. I really appreciate your help. Thank you. :) – wrufesh Jun 01 '21 at 10:46
  • @wrufesh I re-vamped the answer then. – Wiktor Stribiżew Jun 01 '21 at 10:51