Hi, I'm trying to create a regex that separates a string into a series of object fields, but I'm having issues because some of the field values are themselves lists and therefore contain commas internally.

string = "field1:1234,field2:[[1, 3],[3,4]], field3:[[1, 3],[3,4]]"

I want the regex to identify only the commas before "field2" and "field3", ignoring the ones separating the list values (e.g. 1 and 3, ] and [, 3 and 4).

I've tried using non-capturing groups to ignore the character after the commas (e.g. (,)([?!a-z]) ), but given that I'm running this in Kotlin, I don't think non-capturing groups and group separation are useful.

Is there a way to ignore the text between specified characters? For example, ignoring anything between "[[" and "]]" would work here.

Any help appreciated.

Will

1 Answer

You can tweak the existing recursion-mimicking Java regex to extract all the matches you need:

val rx = """\w+:(?:(?=\[)(?:(?=.*?\[(?!.*?\1)(.*\](?!.*\2).*))(?=.*?\](?!.*?\2)(.*)).)+?.*?(?=\1)[^\[]*(?=\2$)|\w+)""".toRegex()
val matches = rx.findAll(string).map{it.value}.joinToString("\n")

See the regex demo. Quick details:

  • \w+ - one or more letters, digits, underscores
  • : - a colon
  • (?: - start of a non-capturing group matching either
    • (?=\[)(?:(?=.*?\[(?!.*?\1)(.*\](?!.*\2).*))(?=.*?\](?!.*?\2)(.*)).)+?.*?(?=\1)[^\[]*(?=\2$) - a substring between two paired [ and ]
    • | - or
    • \w+ - one or more word chars
  • ) - end of the non-capturing group.
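For comparison, the "paired [ and ]" idea can also be written without a regex by counting bracket depth. This is only a minimal sketch and not the pattern explained above: the function name `splitTopLevel` is hypothetical, and it assumes the brackets in the input are balanced and that values never contain quoted or escaped brackets.

```kotlin
// Hypothetical helper: splits on commas that sit outside any [ ... ] pair.
// Assumption: brackets are balanced and never appear quoted or escaped.
fun splitTopLevel(input: String): List<String> {
    val parts = mutableListOf<String>()
    val current = StringBuilder()
    var depth = 0 // number of '[' currently open
    for (ch in input) {
        when {
            ch == '[' -> { depth++; current.append(ch) }
            ch == ']' -> { depth--; current.append(ch) }
            // A comma at depth 0 separates two fields.
            ch == ',' && depth == 0 -> {
                parts += current.toString().trim()
                current.setLength(0)
            }
            // Everything else, including commas inside lists, is kept.
            else -> current.append(ch)
        }
    }
    // Flush the last field, if any.
    if (current.isNotBlank()) parts += current.toString().trim()
    return parts
}
```

Calling `splitTopLevel(string)` on the string from the question should yield the three field entries, with the leading space before field3 trimmed.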

See the Kotlin demo:

val string = "field1:1234,field2:[[1, 3],[3,4]], field3:[[1, 3],[3,4]]"
val rx = """\w+:(?:(?=\[)(?:(?=.*?\[(?!.*?\1)(.*\](?!.*\2).*))(?=.*?\](?!.*?\2)(.*)).)+?.*?(?=\1)[^\[]*(?=\2$)|\w+)""".toRegex()
print( rx.findAll(string).map{it.value}.joinToString("\n") )

Output:

field1:1234
field2:[[1, 3],[3,4]]
field3:[[1, 3],[3,4]]
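If the goal is specifically to find the commas before "field2" and "field3", a simpler alternative may be enough: split on commas that are followed by a field name and a colon, using a lookahead. This is a sketch under assumptions rather than a replacement for the pattern above: the name `splitFields` is hypothetical, field names are assumed to match `\w+` directly followed by `:`, and list values are assumed never to contain text of the form `,name:`.

```kotlin
// Hypothetical helper: splits on a comma (plus optional whitespace) only when
// it is followed by "name:". The lookahead (?=\w+:) checks for the next field
// name without consuming it, so the name stays in the resulting piece.
// Assumption: no list value contains a comma followed by word characters and a colon.
fun splitFields(input: String): List<String> =
    input.split(Regex(""",\s*(?=\w+:)"""))

// Optional follow-up: turn each "name:value" piece into a name-to-value pair.
fun toFieldMap(input: String): Map<String, String> =
    splitFields(input).associate { it.substringBefore(':') to it.substringAfter(':') }
```

With the string from the question, `splitFields(string).joinToString("\n")` should print the same three lines as the output above. The commas inside the lists are skipped because they are followed by a digit or a bracket rather than by a name and a colon.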
Wiktor Stribiżew