How can I split foreign characters, such as Chinese, into separate array values using JavaScript?

split() seems to work well with English, but not with Chinese. Here are the results for two strings:

a) Hello There

b) 你好吗

splitString = text.split(" ");

RESULT: ["Hello", "There"]
RESULT: ["你好吗"]
user3871
  • You seem to be confused about the nature of "words" in Chinese. Chinese can be considered to have a concept of "words", but it is not necessarily well-defined. You are looking for the idea of "segmentation", but segmentation in Chinese (and other languages without spaces, including Thai, Korean, and Japanese) is quite a complex linguistic task, which, as another commenter mentioned, is implemented in libraries. On the other hand, if you merely want to split by character, then `String#split` does exactly what you want in most cases. –  Oct 06 '15 at 04:32
  • Note that all answers in this question use character split, which is the same as [How do you get a string to a character array in JavaScript?](https://stackoverflow.com/questions/4547609/how-do-you-get-a-string-to-a-character-array-in-javascript). – user202729 Aug 30 '18 at 06:22
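
For the word-level segmentation the first comment describes, a minimal sketch using the built-in `Intl.Segmenter` API is shown below. It assumes a runtime that ships `Intl.Segmenter` (recent browsers and Node.js releases do); the helper name `segmentWords` is hypothetical, and the exact word boundaries depend on the runtime's dictionary data, so no particular output is claimed.

```javascript
// Hypothetical helper: splits text into word-like segments for a given locale.
// Assumes the runtime provides Intl.Segmenter.
function segmentWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // segment() returns an iterable of records; each record has a `segment` string.
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Boundaries are chosen by the runtime's segmentation data and may vary.
console.log(segmentWords("你好吗", "zh"));
```

Whatever boundaries the runtime picks, joining the returned segments gives back the original string.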

2 Answers

There is no reliable way to do this with built-in ES5 facilities alone; in ES5 you would need a third-party library.

The correct way in vanilla JS is to use the ES2015 spread syntax, which iterates over the string by code point rather than by UTF-16 code unit:

let splitString = [...text];

Examples of strings which would cause the split-based solutions to fail:
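
One such case, as a minimal sketch: a string containing a character outside the Basic Multilingual Plane. The sample string and the helper names `splitByCodeUnit` and `splitByCodePoint` are illustrative assumptions, not taken from the answer.

```javascript
// Hypothetical sample: "𠮷" (U+20BB7) lies outside the BMP, so JavaScript
// stores it as a surrogate pair (two UTF-16 code units).
const sample = "𠮷野家";

// split("") cuts between UTF-16 code units and tears the surrogate pair apart.
function splitByCodeUnit(text) {
  return text.split("");
}

// Spread syntax uses the string iterator, which walks code points.
function splitByCodePoint(text) {
  return [...text];
}

console.log(splitByCodeUnit(sample).length);  // 4 (two broken halves + 野 + 家)
console.log(splitByCodePoint(sample).length); // 3
console.log(splitByCodePoint(sample)[0] === "𠮷"); // true
```

Note that iterating by code point still splits characters built from several code points, such as a base letter followed by a combining mark.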

zerkms

Instead of splitting on a space character (the Chinese string doesn't contain any), try splitting on an empty string, "", which should put each character into its own element.
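
A quick check of this suggestion on the string from the question (the variable names `greeting` and `chars` are just for illustration):

```javascript
// The string from the question; all three characters are in the BMP,
// so each one occupies a single UTF-16 code unit.
const greeting = "你好吗";

// Splitting on the empty string yields one element per code unit.
const chars = greeting.split("");

console.log(chars);        // [ '你', '好', '吗' ]
console.log(chars.length); // 3
```

This works here because each character fits in one UTF-16 code unit; characters stored as surrogate pairs would be split in two.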

mitim