I am trying to split a string in JavaScript on whitespace while ignoring whitespace enclosed in quotes. I found this regular expression: (/\w+|"[^"]+"/g), but it doesn't work with accented characters such as á. How should I change the regular expression to make it work?

m3div0
  • Can the string include quotes nested within quotes? If so, regex may not be the way to go. See this previous answer: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns – Tim Goodman Sep 23 '12 at 14:26
  • No, the quotes are only used to mark words that shouldn't be split; the problem is only with accented characters – m3div0 Sep 23 '12 at 14:29
  • @david, are you using `split` or `exec`? If you're using the former, that regular expression is not what you want, and you should use the latter instead – Alexander Sep 23 '12 at 14:38
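
To make the `split` versus `exec` distinction in the last comment concrete, here is a minimal sketch using the pattern from the question. The sample string and the variable names are made up for illustration.

```javascript
// Pattern from the question; `line` is a hypothetical sample input.
const pattern = /\w+|"[^"]+"/g;
const line = 'foo "bar baz" qux';

// match() with a global regex returns every matched token.
const tokens = line.match(pattern);
console.log(tokens); // [ 'foo', '"bar baz"', 'qux' ]

// An exec() loop collects the same tokens one at a time.
const viaExec = [];
let m;
pattern.lastIndex = 0; // start scanning from the beginning
while ((m = pattern.exec(line)) !== null) {
  viaExec.push(m[0]);
}
console.log(viaExec); // [ 'foo', '"bar baz"', 'qux' ]

// split() treats the matches as separators, so only the gaps are left.
const gaps = line.split(pattern);
console.log(gaps); // [ '', ' ', ' ', '' ]
```

Because the pattern describes the tokens rather than the separators, `match` or `exec` is the natural fit here.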

3 Answers

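That's because \w only matches [A-Za-z0-9_]. To match accented characters, add the range \x81-\xFF, which takes in the Latin-1 Supplement letters such as à and ã (the letters themselves sit in \xC0-\xFF; the rest of the range is control characters and punctuation, and letters beyond U+00FF such as č are not covered):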

(/[\w\x81-\xFF]+|"[^"]+"/g)

There's also this site, which is very helpful for building the required Unicode block range.
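
As a rough comparison, the sketch below runs the question's pattern and the extended one against a made-up sample. The third pattern is not part of this answer: it is an alternative that assumes an engine with ES2018 Unicode property escapes (`\p{L}`, `\p{N}` with the `u` flag), which also handles letters outside Latin-1.

```javascript
// Hypothetical sample, written with escapes so the code points are explicit:
// máma "dobrý den" čaj
const sample = 'm\u00e1ma "dobr\u00fd den" \u010daj';

const asciiOnly = /\w+|"[^"]+"/g;             // pattern from the question
const latin1 = /[\w\x81-\xFF]+|"[^"]+"/g;     // pattern from this answer
const anyLetter = /[\p{L}\p{N}_]+|"[^"]+"/gu; // assumption: ES2018 property escapes

// \w stops at every accented letter, so words get chopped up.
console.log(sample.match(asciiOnly)); // [ 'm', 'ma', '"dobrý den"', 'aj' ]

// The Latin-1 range fixes á and ý, but č (U+010D) lies outside it.
console.log(sample.match(latin1));    // [ 'máma', '"dobrý den"', 'aj' ]

// Unicode property escapes match letters from any script.
console.log(sample.match(anyLetter)); // [ 'máma', '"dobrý den"', 'čaj' ]
```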

João Silva

This matches either a run of non-whitespace characters that contains no quotes, or a quoted string:

/[^\s"]+|"[^"]+"/g
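
A minimal sketch of how this might be wrapped up for use. The helper names `tokenize` and `stripQuotes` are hypothetical, and removing the surrounding quotes is an optional extra step that the question did not ask for.

```javascript
// Hypothetical helper: returns the tokens, or an empty array if nothing matches.
function tokenize(text) {
  const re = /[^\s"]+|"[^"]+"/g;
  return text.match(re) || [];
}

// Optional, hypothetical helper: drop the quotes around a quoted token.
function stripQuotes(token) {
  return token.replace(/^"([^"]+)"$/, '$1');
}

const parts = tokenize('ahoj "dobrý den" čau');
console.log(parts);                  // [ 'ahoj', '"dobrý den"', 'čau' ]
console.log(parts.map(stripQuotes)); // [ 'ahoj', 'dobrý den', 'čau' ]
```

Since the character class is negated, accented letters need no special handling: anything that is not whitespace or a quote counts as part of a word.
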
Tim Goodman

If you want to match all non-whitespace characters instead of only alphanumeric ones, replace \w with \S.

Bergi
  • If the string contains `"foo bar"` this will separately match `"foo` and `bar"`, whereas I think he'd want to match `"foo bar"`. I used `[^\s"]` in my answer to avoid this. – Tim Goodman Sep 23 '12 at 14:49
  • Right, thanks for the hint. `/"[^"]+"|\S+/g` should work as well – Bergi Sep 23 '12 at 14:53
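
The effect of the alternation order discussed in these comments can be checked with a short sketch. The sample strings and variable names are invented for illustration.

```javascript
const text = 'say "foo bar" now';

// \S+ first: the opening quote is swallowed as an ordinary character.
const greedyFirst = text.match(/\S+|"[^"]+"/g);
console.log(greedyFirst); // [ 'say', '"foo', 'bar"', 'now' ]

// Quoted alternative first: the quoted string is kept together.
const quotedFirst = text.match(/"[^"]+"|\S+/g);
console.log(quotedFirst); // [ 'say', '"foo bar"', 'now' ]

// The two working patterns differ when a quote starts in the middle of a word.
const glued = 'abc"d e"';
console.log(glued.match(/"[^"]+"|\S+/g));     // [ 'abc"d', 'e"' ]
console.log(glued.match(/[^\s"]+|"[^"]+"/g)); // [ 'abc', '"d e"' ]
```

The regex engine tries alternatives left to right at each position, so the quoted form has to come first whenever the other alternative can also match a quote character.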