0

I am trying to parse a string in this format:

[something](something something) [something](something something)

and I want to split on every space that is not between a pair of parentheses.

I tried JavaScript's string.split with /[^\(].*\s+.*[^\)]/g as the regex, but it doesn't work. Any suggestions appreciated :-)

EDIT: I don't want to post this as an answer because I want to leave the question open to comments, but I finally found a solution.

// Split on whitespace unless the next word is immediately followed by ")".
var a = "the>[the](the the) the>[the](the the) the";
var regex = /\s+(?!\w+[\)])/;
var b = a.split(regex);
alert(b.join("+++"));
rubixibuc
  • (mis)?quoted from jwz: "Many people, when faced with a problem, think to themselves, 'I know! I'll use regular expressions!' Now they have two problems." – riwalk Aug 01 '11 at 20:35
  • There wouldn't be a need to nest them; they are just there so that strings which would normally include spaces keep their spaces when parsed – rubixibuc Aug 01 '11 at 20:40
  • So "(input input) input (input)" could be a sample string which would become "(input input)" "input" "(input)" – rubixibuc Aug 01 '11 at 20:50

3 Answers

2

Is your input always this consistent? If it is, it could be as simple as splitting your string on ') ['.

If it isn't, is it possible to just take what is between [ and )? Or is there some kind of nesting that is going on?
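A minimal sketch of the ') [' split, assuming every item has the form [label](text) and items are separated by exactly one space. Splitting on ') [' consumes the closing parenthesis and opening bracket around each separator, so the sketch puts them back afterwards; the variable names are illustrative only.

```javascript
// Assumption: items look like "[label](text)" and are separated by a single space.
const input = "[something](something something) [something](something something)";

// ") [" is consumed by split, so restore "[" at the start and ")" at the end
// of every piece that lost them.
const parts = input.split(") [").map((part, i, all) => {
  const head = i === 0 ? "" : "[";
  const tail = i === all.length - 1 ? "" : ")";
  return head + part + tail;
});

console.log(parts);
```

This only holds up while the input stays this regular; a bare word between two items would not be separated.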

gnur
1

You are using the wrong tool for the job.

As was alluded to in this famous post, regular expressions cannot parse non-regular languages, and the "balanced parentheses" problem cannot be described by a regular language.

Have you tried writing a parser instead?
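For illustration, a hand-written scanner for this task can be quite small. The sketch below is one possible approach, not the only one: it walks the string once, tracks parenthesis depth, and splits on whitespace only at depth zero, so it also copes with nesting. The function name splitOutsideParens is hypothetical.

```javascript
// Hypothetical helper: split on whitespace that is outside any parentheses.
function splitOutsideParens(text) {
  const tokens = [];
  let current = "";
  let depth = 0; // number of currently open "("

  for (const ch of text) {
    if (ch === "(") depth += 1;
    else if (ch === ")" && depth > 0) depth -= 1;

    if (/\s/.test(ch) && depth === 0) {
      // Whitespace at top level ends the current token.
      if (current !== "") tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

console.log(splitOutsideParens("(input input) input (input)"));
console.log(splitOutsideParens("a (b (c d) e) f"));
```

Unbalanced input is not reported as an error here; a stray ")" is simply kept as an ordinary character.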

EDIT:

It seems that you've finally clarified that nesting is not a requirement. In that case, I'd suggest gnur's solution.

riwalk
  • Sure, but in this case it is very possible to use a regular expression if the rules are very well known. – gnur Aug 01 '11 at 20:38
  • what about another symbol like quotes, could split on everything except what is between quotes or is that the same thing? – rubixibuc Aug 01 '11 at 20:39
  • @gnur, and that is the crux of the matter. I don't know if nested parentheses are allowed. If nested parentheses are not allowed, then go for it. If nested parentheses *are* allowed, then an unholy child will weep the blood of virgins, and Russian hackers will pwn his webapp. – riwalk Aug 01 '11 at 20:40
  • what makes you think his input is not regular? – automagic Aug 01 '11 at 20:40
  • @James Connell, read more of the requests for clarification. The author REFUSES to answer whether or not nested parentheses are allowed. If they are, then the input is not regular. – riwalk Aug 01 '11 at 20:42
  • There don't need to be nested parentheses :-) – rubixibuc Aug 01 '11 at 20:43
  • @stargazer712 refuses? He did answer that there won't be a need for nesting before your comment. – gnur Aug 01 '11 at 20:43
  • @rubixibuc, then use gnur's solution. – riwalk Aug 01 '11 at 20:43
  • @gnur, chill. The two comments posted less than 1 minute apart. Sheesh. – riwalk Aug 01 '11 at 20:45
  • @stargazer ok, just thought that shouting REFUSES was a bit unneeded when the original question wasn't even 10 minutes old, no hard feelings. – gnur Aug 01 '11 at 20:48
0

This regex will do exactly what you asked, and nothing more:

'[x](x x) [x](x x)'.split(/ +(?![^\(]*\))/);
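The negative lookahead (?![^\(]*\)) rejects a space whenever a ")" appears later with no "(" in between, which is what it means for the space to sit inside an open pair. A quick check against the sample strings from the question and comments, assuming the parentheses are never nested (the variable names are illustrative):

```javascript
// Split on runs of spaces that are NOT followed by ")" before the next "(".
const outsideParens = / +(?![^(]*\))/;

const samples = [
  "[x](x x) [x](x x)",
  "(input input) input (input)",
  "the>[the](the the) the>[the](the the) the",
  "(a b c) d", // more than two words inside the parentheses
];

const results = samples.map((s) => s.split(outsideParens));
for (const r of results) {
  console.log(r.join("+++"));
}
```

Because the lookahead scans up to the closing parenthesis rather than just the next word, groups with any number of words stay together.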
chjj