
I'm trying to create a regex that captures each whole top-level array, including anything nested inside it.

Here is an example input string:

[2020-05-29T10:00:00, 12.5, 'Test text'][][[], ['Some Data']][['String with[ \'escaped quote][ and parenthesis inside it']]

Expected matches are:

Match 1: [2020-05-29T10:00:00, 12.5, 'Test text']
Match 2: []
Match 3: [[], ['Some Data']]
Match 4: [['String with[ \'escaped quote][ and parenthesis inside it']] // If this one is possible it's brilliant

The regex I have so far is `\[[a-zA-Z0-9\-,' :\.\[]*\]`, but it doesn't handle arrays of arrays or brackets inside strings.

I would be really grateful for your help!

Genotypek
  • There can be no `[[], []]` match here. – Wiktor Stribiżew Sep 22 '21 at 08:33
  • If you use PCRE, something that could work is `\[\s*(?>((?:'[^\\']*(?:\\[\s\S][^\\']*)*'|[^]'\s,])+)(?:\s*,\s*\g<1>)*|(?R))*\s*]`, but it might not work in all cases. `\[\s*(?>(\w+(?:\.\w+)*(?:\[\w+])*|(?:'[^\\']*(?:\\[\s\S][^\\']*)*'|[^]\w])+)(?:\s*,\s*\g<1>)*|(?R))*\s*]` might... But this is all too fragile, you need to get the appropriate parser. – Wiktor Stribiżew Sep 22 '21 at 09:19
  • I have something that will match your 4 matches, but I really need to know the engine before I can post it. It would be helpful if you could add a language tag, as the regex tag asks "this tag should also include a tag specifying the applicable programming language or tool". – Scratte Sep 22 '21 at 09:45
  • @Scratte Added a platform, it's .net C# – Genotypek Sep 22 '21 at 09:57
  • You cannot parse these with a regex, for the reasons explained in detail (for the equivalent problem of parsing HTML with regex) in this answer: https://stackoverflow.com/a/1732454 – Jiří Baum Sep 22 '21 at 10:19
  • Thank you for updating with the tags. My expression was relying on recursion, which doesn't seem to be supported by .NET. If you decide to switch engine, [you can try it](https://regex101.com/r/onXoqC/1) :) – Scratte Sep 22 '21 at 10:36

1 Answer


This is similar to the question "Regex nested parentheses"; the accepted answer there gives a great explanation of what's going on.

The regex you want is, I believe:

\[(?>'(?:[^'\\]|\\.)*'|\[(?<DEPTH>)|\](?<-DEPTH>)|'(?:[^'\\]|\\.)*'|[^\[\]]+)*\](?(DEPTH)(?!))
Brett
  • Why did you decide it is a .NET related question? Also, it won't work for the case where `[` and `]` are not paired inside a `'...'` string literal. Just checking for the order and amount of open/close brackets in the DEPTH group stack is not a solution here. – Wiktor Stribiżew Sep 22 '21 at 09:08
  • Fair point on the issue with `[` and `]` inside a string, Wiktor. I tested against the provided example (which does work). As for .NET, it's what I'm familiar with. The question doesn't specify a platform, so I should have cast my net wider! I'll update the "answer" to point out the very fair issues you've raised. :-) – Brett Sep 22 '21 at 09:36
  • @Brett Sorry for missing a platform in tags. It is exactly .net (C#). This regex is almost perfect, it matches it very well, but this mismatches: `[['String with[ ][ parenthesis inside it']]` First parenthesis is ignored in this case. Almost there – Genotypek Sep 22 '21 at 09:52
  • I think then that we need to consume anything within single quotes (a string) before considering the `[` and `]` for the DEPTH. I have tested the following which appears to work, assuming that a single quote within a string is escaped by being repeated (e.g. `'a string''s length'`). Here you go: `\[(?>'.*?'|\[(?<DEPTH>)|\](?<-DEPTH>)|'.*?'|[^\[\]]+)*\](?(DEPTH)(?!))` – Brett Sep 22 '21 at 10:56
  • @Brett Brilliant. I've tried to make it work by escaping `'` inside string by using `\'`, it's more .net natural. Is it possible to include this last one adjustment? – Genotypek Sep 22 '21 at 11:41
  • @Brett If this helps there is a regex which respects quotes escaped with `\'` inside strings: `'(?:[^'\\]|\\.)*'` – Genotypek Sep 22 '21 at 11:50
  • @Brett I think I did it, I've modified your regex to respect escaped quotes inside strings and it looks this way: `\[(?>'(?:[^'\\]|\\.)*'|\[(?<DEPTH>)|\](?<-DEPTH>)|'(?:[^'\\]|\\.)*'|[^\[\]]+)*\](?(DEPTH)(?!))`, just please edit your answer with this regex and I will mark it as an answer. – Genotypek Sep 22 '21 at 11:53
  • I've similarly come up with `\[(?>'(\\'|.)*?'|\[(?<DEPTH>)|\](?<-DEPTH>)|'.*?'|[^\[\]]+)*\](?(DEPTH)(?!))`, but happy to put yours in place. – Brett Sep 22 '21 at 11:56
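
The idea worked out in the comments (consume each quoted string first, and only then count `[` and `]`) can also be written as a small hand-rolled scanner, in line with the earlier suggestion to use a parser instead of a regex. The following is a minimal Python sketch of that logic, not .NET code and not the answer's regex. The function name `split_top_level_arrays` is hypothetical, and the sketch assumes single-quoted strings that escape quotes with a backslash, as in the question's example.

```python
def split_top_level_arrays(text):
    """Return every top-level [...] group found in text.

    Assumption: strings are single-quoted and use backslash escapes (\\').
    Brackets inside such strings do not affect the nesting depth.
    """
    matches = []
    depth = 0      # current bracket nesting level
    start = None   # index of the '[' that opened the current top-level group
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "'" and depth > 0:
            # Skip the whole string literal before looking at any brackets.
            i += 1
            while i < n and text[i] != "'":
                # A backslash escapes the next character, so step over both.
                i += 2 if text[i] == "\\" else 1
            # i now points at the closing quote (or past the end if unterminated).
        elif ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                matches.append(text[start:i + 1])
        # Any other character (including a stray ']' at depth 0) is ignored.
        i += 1
    return matches


# The example input from the question (raw string keeps the backslash as-is).
SAMPLE = r"[2020-05-29T10:00:00, 12.5, 'Test text'][][[], ['Some Data']][['String with[ \'escaped quote][ and parenthesis inside it']]"

if __name__ == "__main__":
    for number, match in enumerate(split_top_level_arrays(SAMPLE), start=1):
        print(f"Match {number}: {match}")
```

On the question's sample input this yields the four expected matches. Because strings are skipped before any bracket is counted, unpaired brackets inside a string, such as `[['String with[ ][ parenthesis inside it']]`, stay within a single match.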