
I have the following regex, which doesn't account for Chinese Unicode characters. I'd like it to match Chinese words as well. Does anyone know how to do this in JavaScript?

\w+|"(?:\\"|[^"])+"
NullReference
  • What is the programming language? – Wiktor Stribiżew Aug 27 '20 at 18:45
  • For Java, just add `(?U)` at the start. In JavaScript, use `[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]` and the `u` flag. In PHP, add the `u` flag. In Python 3 and .NET, `\w` is already Unicode-aware. In Ruby, maybe `[\p{L}\p{N}\p{M}]` / `[\p{L}\p{N}\p{M}\p{Pc}]` will do. – Wiktor Stribiżew Aug 27 '20 at 18:48
  • @WiktorStribiżew javascript – NullReference Aug 27 '20 at 19:44
  • See https://stackoverflow.com/questions/62772641/whats-the-correct-regex-range-for-javascripts-regexes-to-match-all-the-non-wor/62772689#62772689. `\w` is the opposite of `\W`, so you need to remove the `^` after `[` in the top solution. – Wiktor Stribiżew Aug 27 '20 at 19:44
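A minimal sketch of the JavaScript suggestion from the comments, assuming an ES2018+ engine (e.g. a current Node.js or browser) where `\p{...}` property escapes are available with the `u` flag. It keeps the question's quoted-string alternative unchanged and only swaps `\w` for the Unicode word-character class. The helper names `tokenizeAscii` / `tokenizeUnicode` and the sample string are hypothetical, for illustration only.

```javascript
// Original pattern from the question: an ASCII-only \w+ run, or a
// double-quoted string in which \" counts as an escaped quote.
const ASCII_TOKEN = /\w+|"(?:\\"|[^"])+"/g;

// Same pattern with \w replaced by the Unicode word-character class from the
// comments. Property escapes (\p{...}) only work with the `u` flag.
const UNICODE_TOKEN =
  /[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]+|"(?:\\"|[^"])+"/gu;

// Hypothetical helpers: return every match, or [] when nothing matches
// (String.prototype.match returns null in that case).
function tokenizeAscii(text) {
  return text.match(ASCII_TOKEN) ?? [];
}

function tokenizeUnicode(text) {
  return text.match(UNICODE_TOKEN) ?? [];
}

const sample = 'hello 你好世界 "引号 text" 第2章';

// ASCII \w skips the Chinese characters entirely:
// ['hello', '"引号 text"', '2']
console.log(tokenizeAscii(sample));

// The Unicode class keeps Chinese words (and mixed runs like 第2章) together:
// ['hello', '你好世界', '"引号 text"', '第2章']
console.log(tokenizeUnicode(sample));
```

If only Chinese needs to be added on top of the ASCII behaviour, a narrower class such as `[\w\p{Script=Han}]` (also with the `u` flag) should work too; the broader class above additionally covers accented Latin letters, other scripts, and combining marks.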
