
If I have a string such as the following (which happens to look a lot like JSON, coincidentally):

"name" : "Precalculus",     
"authors" : ["Blitzer","Stewart"],  
"publisher" : { "name" : "McGraw Hill","year" : "2012",
                "city" : ["New York","London","Toronto"]
              }

How can I split this string on only those commas that do not appear inside {}, [], or " ", so that I get the following separate strings:

  1. "name" : "Precalculus"
  2. "authors" : ["Blitzer","Stewart"]
  3. "publisher" : { "name" : "McGraw Hill","year" : "2012", "city" : ["New York","London","Toronto"] }

I know this splitting can easily be done with a loop that checks whether a given comma is between {}, [], or " " and splits selectively, but a regular expression seems like a cleaner option, so any help would be appreciated.
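For reference, the loop-based approach mentioned above could look roughly like the following. This is only a minimal sketch: the class name `TopLevelSplitter` and the method `splitTopLevel` are made up for illustration, and the code assumes the input is well formed (balanced brackets and braces, properly closed double-quoted strings, backslash as the escape character inside strings).

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplitter {

    /** Splits input on commas that lie outside double quotes, {} and []. */
    public static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // current nesting level of {} and []
        boolean inString = false; // true while inside a double-quoted string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                current.append(c);
                if (c == '\\' && i + 1 < input.length()) {
                    // Keep an escaped character (such as \") inside the string.
                    current.append(input.charAt(++i));
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                current.append(c);
            } else if (c == '{' || c == '[') {
                depth++;
                current.append(c);
            } else if (c == '}' || c == ']') {
                depth--;
                current.append(c);
            } else if (c == ',' && depth == 0) {
                // A top-level comma: finish the current piece.
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) {
            parts.add(last);
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "\"name\" : \"Precalculus\",\n"
                + "\"authors\" : [\"Blitzer\",\"Stewart\"],\n"
                + "\"publisher\" : { \"name\" : \"McGraw Hill\",\"year\" : \"2012\",\n"
                + "                \"city\" : [\"New York\",\"London\",\"Toronto\"] }";
        for (String part : splitTopLevel(input)) {
            System.out.println(part);
        }
    }
}
```

The sketch counts nesting depth with a single integer and does not check that each `}` or `]` closes the matching opener, so it is a splitter, not a validator.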

Jenna Maiz
  • Regex is the _wrong_ tool for this. Instead, you should be using a JSON parser. Look into using Google's GSON library: https://www.mkyong.com/java/how-do-convert-java-object-to-from-json-format-gson-api/ – Tim Biegeleisen Nov 27 '16 at 03:19
  • Here's a similar question for CSV. http://stackoverflow.com/questions/18144431/regex-to-split-a-csv Based on the discussion there, I doubt regex will be cleaner. – markspace Nov 27 '16 at 03:21
  • @TimBiegeleisen What if I am trying to build my own JSON parser ?(as a learning experience, of course) – Jenna Maiz Nov 27 '16 at 03:21
  • Can't answer for Tim but I'd say regex is wrong for that too. Look at a basic discussion of a parser on Wikipedia for example to help you out. – markspace Nov 27 '16 at 03:23
  • Then go ahead and build a parser. You will probably have to use a stack. Regex may even be a part of the solution, but asking us for a regex to handle a complex JSON structure is not the way to go here. – Tim Biegeleisen Nov 27 '16 at 03:24
  • Elaboration: 1) Using a complex regex is not the right way to go technically for this problem. 2) Asking us to write regexes for you is not the right way to go if your goal is to learn how to write your own parser. – Stephen C Nov 27 '16 at 03:49
  • If you want to build your own parser, regexes are useful but in my opinion _only_ for extracting lexemes out of the input ("lexeme" meaning something like an identifier, numeric literal, string literal, punctuation character--basically one input element). Don't try to use them to detect more complex structures. – ajb Nov 27 '16 at 05:09
  • Give this a try : `String []ctArr = ct.split(", | \\[.*.\\] | \\{.*.\\}");` – Young Emil Nov 27 '16 at 13:16

1 Answer


Let me know if this would work for your purposes:

"[^,[{]++(?=,)|".*?[]}]++

https://regex101.com/r/vTx9gv/4

This matches a string that starts with a double quote and continues with characters that are not a comma, opening bracket, or opening brace, stopping where the next character is a comma ...

-or-

a string that starts with a double quote, followed by as few characters as possible, until it reaches one or more closing brackets or braces.

UPDATE

I couldn't help playing with this one to account for the line terminators.

(?=.+?{)[\s\S]+?}|(?=.+?\[)[\s\S]+?]|[^,[{]+?(?=,)

https://regex101.com/r/vTx9gv/5

Requires support for lookaheads.

Looks ahead on the current line for a { and, if it finds one, matches up to the first } or...

Looks ahead on the current line for a [ and, if it finds one, matches up to the first ] or...

Matches characters that are not a comma, opening bracket, or opening brace, up to the point where the next character is a comma.
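For anyone trying the updated pattern from Java, here is a minimal sketch of how it might be run with `java.util.regex`. The class name `LookaheadSplitDemo` and the method `pieces` are hypothetical, and the pattern is an adaptation, not a verbatim copy: `[`, `]`, `{` and `}` are escaped, because Java treats an unescaped `[` inside a character class as the start of a nested class. It assumes the input is laid out on separate lines as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSplitDemo {

    // The updated pattern, with brackets and braces escaped for java.util.regex.
    private static final Pattern PIECE = Pattern.compile(
            "(?=.+?\\{)[\\s\\S]+?\\}|(?=.+?\\[)[\\s\\S]+?\\]|[^,\\[{]+?(?=,)");

    /** Collects every match of the pattern, in order of appearance. */
    public static List<String> pieces(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same layout as in the question: one top-level entry per line.
        String input = "\"name\" : \"Precalculus\",\n"
                + "\"authors\" : [\"Blitzer\",\"Stewart\"],\n"
                + "\"publisher\" : { \"name\" : \"McGraw Hill\",\"year\" : \"2012\",\n"
                + "  \"city\" : [\"New York\",\"London\",\"Toronto\"]\n"
                + "}";
        for (String piece : pieces(input)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

Note that the lookaheads use `.`, which does not cross line terminators by default, so the pattern depends on each top-level entry starting on its own line. If the same data is put on a single line, the first alternative sees a `{` ahead immediately and the first match runs all the way to the first `}`. It also stops at the first closing brace or bracket, so nested objects or arrays of the same kind are cut short.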

shrug