
I previously asked "How to split a string with conditions", so I now know how to ignore the delimiter when it is between two given characters.

How can I check several pairs of enclosing characters instead of just one? I found "Regex for splitting a string using space when not surrounded by single or double quotes", but I don't understand where to change '' to []. It also handles only two pairs.

Is there a regex that splits on `,` but ignores the delimiter when it is between "" or [] or {}? For instance:

// Input
"text1":"text2","text3":"text,4","text,5":["text6","text,7"],"text8":"text9","text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
// Output
"text1":"text2"
"text3":"text,4"
"text,5":["text6","text,7"]
"text8":"text9"
"text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
spongebob
  • It is very likely that this isn't a language based on a Chomsky type-3 grammar that can be parsed with regular expressions. Is the comma in `"xx[xx","yy]yy"` fit to split, or is it enclosed in brackets and should be ignored? – laune Aug 17 '14 at 17:02
  • Do you plan to support nested braces of arbitrary depth??? If so: this is not possible to do with regular expressions. You'll need something that is at least as powerful as a [pushdown automaton](http://en.wikipedia.org/wiki/Pushdown_automaton) (regexes are only as powerful as [a finite state machine](http://en.wikipedia.org/wiki/Finite-state_machine)). It's easy to implement this using a LIFO queue. – fabian Aug 17 '14 at 17:19
  • @laune: I think that brackets in this particular case are ignored. The potential problem is nested JSON objects, since the Java regex engine doesn't have a recursion feature. However, do not confuse regular expressions in the theoretical sense with what is commonly (and loosely) called 'regex': a tool with more advanced features and fewer limitations (several regex engines support recursion). Otherwise you will experience an encounter of the third kind and they will take you directly to the planet Chomsky to carry out the sentence. – Casimir et Hippolyte Aug 17 '14 at 17:29
  • @laune I'm looking for a regex that I can easily edit to add/remove groups of characters to check. – spongebob Aug 18 '14 at 15:18
  • @fabian I separately handle nested "items". – spongebob Aug 18 '14 at 15:20
  • @CasimiretHippolyte [See above.](http://stackoverflow.com/questions/25351421/how-can-i-split-a-string-except-when-the-delimiter-is-protected-by-quotes-or-bra#comment39554042_25351421) – spongebob Dec 27 '14 at 12:36
  • There are libraries for parsing JSON; you can even do it online at http://json.parser.online.fr/ (just add `{}` around your input). Go to your favourite search engine, type in "java json", and pick your poison. – A ツ Apr 23 '15 at 22:26
  • @Aツ I'm trying to build my own. – spongebob Apr 24 '15 at 11:10

1 Answer


You can use:

String text = "\"text1\":\"text2\",\"text3\":\"text,4\",\"text,5\":[\"text6\",\"text,7\"],\"text8\":\"text9\",\"text10\":{\"text11\":\"text,12\",\"text13\":\"text14\",\"text,15\":[\"text,16\",\"text17\"],\"text,18\":\"text19\"}";

String[] toks = text.split("(?=(?:(?:[^\"]*\"){2})*[^\"]*$)(?![^{]*})(?![^\\[]*\\]),+");
for (String tok: toks)
    System.out.printf("%s%n", tok);

- RegEx Demo

OUTPUT:

"text1":"text2"
"text3":"text,4"
"text,5":["text6","text,7"]
"text8":"text9"
"text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
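
For readers who want to add or remove bracket pairs, it may help to see the pattern taken apart. The sketch below is not from the original answer: the class name `RegexSplitDemo`, the helper `split`, and the per-part comments are illustrative, and the comments are only one reading of what each lookahead does. The pattern string itself is unchanged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexSplitDemo {
    // Same pattern as above, concatenated from its parts (names are hypothetical).
    static final Pattern SEPARATOR = Pattern.compile(
          "(?=(?:(?:[^\"]*\"){2})*[^\"]*$)" // an even number of quotes follows: not inside "..."
        + "(?![^{]*})"                      // no '}' is reachable without passing a '{': not inside {...}
        + "(?![^\\[]*\\])"                  // no ']' is reachable without passing a '[': not inside [...]
        + ",+");                            // the delimiter itself

    static List<String> split(String text) {
        return Arrays.asList(SEPARATOR.split(text));
    }

    public static void main(String[] args) {
        // Flat input: the commas inside the quotes and inside [...] are kept.
        System.out.println(split("\"a\":\"b,c\",\"d\":[\"e\",\"f\"]"));
        // Nested braces: the lookahead stops at the inner '{', so the comma
        // inside the outer {...} is treated as a separator.
        System.out.println(split("\"a\":{\"b\":1,\"c\":{\"d\":2}}"));
    }
}
```

The first call yields the two pieces `"a":"b,c"` and `"d":["e","f"]`. The second also yields two pieces, `"a":{"b":1` and `"c":{"d":2}}`, which shows the limit raised in the comments on the question: each lookahead only looks as far as the next opening bracket, so the pattern is safe for one level of braces or brackets but not for an object nested inside another object after the comma.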
anubhava
  • Is this regex specific to my example? Because `"ke,y":{"key,2":"val,ue"},"te,st":"h,i"` returns `"ke,y":{"key,2":"val,ue"}` instead of `"ke,y":{"key,2":"val,ue"}` and `"te,st":"h,i"`. I would like a **generic** answer. – spongebob Aug 18 '14 at 15:31
  • Check my updated answer with a demo link that works with your new input as well. – anubhava Aug 18 '14 at 16:05
  • `System.out.printf("%s%n", tok)`: what does `%s%n` mean? **Do I need to format every `tok`?** – spongebob Aug 19 '14 at 06:24
  • No, not really; `System.out.printf("%s%n", tok)` is equivalent to `System.out.println(tok)`. – anubhava Aug 19 '14 at 06:28
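
The comments on the question point out that arbitrarily nested brackets need something stack-like rather than a regex. A minimal sketch of that idea follows, under stated assumptions: it is not the answerer's method, the names `DepthSplitter` and `splitTopLevel` are made up for illustration, a depth counter stands in for a real stack (so it does not check that `[` is closed by `]` rather than `}`), and treating backslash as an escape character inside quotes is an assumption the question never states.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthSplitter {
    // Splits on ',' only when outside quotes and outside any [...] or {...}.
    static List<String> splitTopLevel(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;             // how many [ or { are currently open
        boolean inQuotes = false;  // true between an opening and closing "
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < text.length()) {
                    // Assumed escape rule: keep the backslash and copy the
                    // next character verbatim, even if it is a quote.
                    current.append(c);
                    c = text.charAt(++i);
                } else if (c == '"') {
                    inQuotes = false;
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == '[' || c == '{') {
                depth++;
            } else if (c == ']' || c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // Top-level delimiter: finish the current piece and drop the comma.
                parts.add(current.toString());
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        for (String part : splitTopLevel("\"ke,y\":{\"key,2\":\"val,ue\"},\"te,st\":\"h,i\"")) {
            System.out.println(part);
        }
    }
}
```

To support another pair, such as `(` and `)`, it should be enough to add the characters to the two bracket checks. Unlike `String.split`, this sketch keeps trailing empty pieces and returns a single empty string for empty input.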