4

I'm trying to split a sentence on whitespace, but I must ignore any space located inside parentheses (), curly braces {} or square brackets [].

Example string: `[apples carrots] (car plane train) {food water} foo bar` should result in an array containing:

  • [apples carrots]
  • (car plane train)
  • {food water}
  • foo
  • bar

Any ideas?

Romeo Mihalcea
  • possible duplicate of [Split string by all spaces except those in brackets](http://stackoverflow.com/questions/12884573/split-string-by-all-spaces-except-those-in-brackets) – Bergi Sep 17 '13 at 23:15
  • Do you foresee nested (balanced) punctuation? If so, there can be solutions for that as well. Just an FYI. –  Sep 18 '13 at 01:21

3 Answers

5

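Not splitting, but matching and trimming. The example is in JavaScript; you can try it out in the browser console. Note that this variant strips the brackets and returns the individual words rather than the bracketed groups:

var a = '[apples carrots] (car plane train) {food water} foo bar';
// Match runs of word and bracket characters, then strip every bracket from each match.
a.match(/[a-zA-Z0-9\[\]\(\){}]+/g).map(function (s) { return s.replace(/[\[\]\(\)\{\}]/g, ''); });
// ["apples", "carrots", "car", "plane", "train", "food", "water", "foo", "bar"]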

Alternatively:

a.split(/\s+(?![^\[]*\]|[^(]*\)|[^\{]*})/)

Produces:

["[apples carrots]", "(car plane train)", "{food water}", "foo", "bar"]
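The matching idea can also keep the bracketed groups intact. The following is a minimal sketch that is not part of the answer itself: `matchGroups` is a hypothetical helper name, and it assumes brackets are balanced, never nested, and begin a token of their own.

```javascript
// Hypothetical helper; assumes balanced, non-nested brackets.
function matchGroups(text) {
  // Try a whole bracketed group first, then fall back to a run of non-space characters.
  const pattern = /\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|\S+/g;
  // match() returns null when nothing matches, so default to an empty array.
  return text.match(pattern) || [];
}

const groups = matchGroups('[apples carrots] (car plane train) {food water} foo bar');
console.log(groups);
// ["[apples carrots]", "(car plane train)", "{food water}", "foo", "bar"]
```

Because the alternatives are tried left to right, a bracketed group is consumed as a whole before the `\S+` fallback gets a chance to cut it at a space.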
uKolka
4

Split on whitespace followed by a positive look-ahead that checks that the next bracket character (if any) is an opening one, or that the end of input is reached without meeting a closing one:

\s+(?=[^\])}]*([\[({]|$))
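In JavaScript, `String.prototype.split` adds the text matched by capturing groups to the result array, so the capturing group in the pattern above produces extra entries there (Java's `String.split`, for instance, does not do this). A minimal sketch, assuming a JavaScript runtime, that compares the pattern as written with a variant whose group is made non-capturing via `(?:...)`:

```javascript
const sample = '#666 white';

// Pattern as written: the group captures the empty match of `$`,
// and split() inserts that captured text into the result.
const withCapture = sample.split(/\s+(?=[^\])}]*([\[({]|$))/);
console.log(withCapture); // ["#666", "", "white"]

// Same look-ahead with a non-capturing group: nothing extra is inserted.
const splitter = /\s+(?=[^\])}]*(?:[\[({]|$))/;
const withoutCapture = sample.split(splitter);
console.log(withoutCapture); // ["#666", "white"]

const parts = '[apples carrots] (car plane train) {food water} foo bar'.split(splitter);
console.log(parts);
// ["[apples carrots]", "(car plane train)", "{food water}", "foo", "bar"]
```

The look-ahead still assumes non-nested brackets; only the handling of the group changes.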
Bohemian
  • You might be missing a closing parenthesis. `\s+( <-- Unbalanced '(' ?=[^\])}]*([\[({]|$)` –  Sep 18 '13 at 01:14
  • @sln quite so. Fixed. Thx. (I typed it in on my iPhone - can be hard to pick up stuff like that) – Bohemian Sep 18 '13 at 01:45
  • @Bohemian, this was not really working for me: it is missing some blank spaces which are not in parenthesis – François Romain Apr 25 '16 at 23:11
  • @francoisromain what was your input exactly, and what do you expect from the match? – Bohemian Apr 26 '16 at 16:18
  • @Bohemian input: `'#666 white'`. output: `['#666', '', 'white']`, where it should be: `['#666', 'white']`. (there is an extra empty string in the middle) – François Romain Apr 26 '16 at 16:34
  • @fran show me the whole code, like `"the exact string you're using".split("the exact regex you're using")` – Bohemian Apr 27 '16 at 10:19
  • @Bohemian just copy your regex above: `var t = '#666 white'; var r = t.split(/\s+(?=[^\])}]*([\[({]|$))/); console.log(r);` – François Romain Apr 27 '16 at 13:23
  • I really appreciate your answer, but I have a further problem: I have recursive (nested) brackets, and this regex captures the space between the '[' (left brackets). Could you help? The text is `[[ []]]`. How can I make the regex capture only spaces outside the outermost brackets? – Xiao Jan 24 '18 at 07:11
  • @Natt I think [this answer](https://stackoverflow.com/a/17986078/256196) will do what you want if you modify it to use square brackets – Bohemian Jan 24 '18 at 16:12
  • @Bohemian I have tried your suggestion but it still does not work; that regex still matches text inside the brackets. Could it be because the brackets are nested? https://imgur.com/a/U4Je4 – Xiao Jan 29 '18 at 08:30
  • @natt That other answer will work with nested brackets. What is the input that it isn't working for? Even better, post a new question with your best effort at a regex so far and the input it's not working with and details of *how* it's not working and how you want it to work. – Bohemian Jan 29 '18 at 08:57
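For the nested case raised in these comments, JavaScript regular expressions cannot track an arbitrary nesting depth, so one option is a small scanner that counts depth instead. This is a minimal sketch and not something proposed in the thread: `splitOutsideBrackets` is a hypothetical helper, and it assumes that brackets are balanced and does not check that an opener and its closer are of the same kind.

```javascript
// Hypothetical helper: split on whitespace only where the bracket depth is 0.
function splitOutsideBrackets(text) {
  const openers = '([{';
  const closers = ')]}';
  const tokens = [];
  let depth = 0;
  let current = '';
  for (const ch of text) {
    if (openers.includes(ch)) {
      depth += 1;
    } else if (closers.includes(ch) && depth > 0) {
      depth -= 1; // stray closers at depth 0 are kept as ordinary characters
    }
    if (/\s/.test(ch) && depth === 0) {
      // Whitespace outside all brackets ends the current token.
      if (current !== '') tokens.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') tokens.push(current);
  return tokens;
}

console.log(splitOutsideBrackets('[[ []]] foo'));
// ["[[ []]]", "foo"]
console.log(splitOutsideBrackets('[apples carrots] (car plane train) {food water} foo bar'));
// ["[apples carrots]", "(car plane train)", "{food water}", "foo", "bar"]
```

A single depth counter is enough here because only the question "inside any bracket or not" matters; validating matching bracket kinds would need a stack.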
0

To match the spaces outside (), {} and [], use this pattern:

(\s)(?:(?=(?:(?![\]\)}]).)*[\[\({])|(?!.*[\]\)}]))

alpha bravo