
Background: I am writing a word-wrapping function for arbitrary strings on an HTML canvas, and I would like it to work for all languages.

For English, .split(" ") may be enough, but it does not work for other languages. For example, take this mixed-language string:

今天很美,Peter和Jane去了St. Thomas  // (Today is beautiful, Peter and Jane go to St. Thomas 123)

If I use split(""), I get:

["今", "天", "很", "美", ",", "P", "e", "t", "e", ... "�", "�", "�"]

Note: the non-BMP characters are split into lone surrogates, which show up as "�", "�", "�".
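To make the note above concrete, here is a minimal sketch of the surrogate-pair problem. The sample string and variable names are made up for illustration (U+1F600 is just an arbitrary non-BMP character), and this only shows the symptom, not a word-splitting solution:

```javascript
// "\u{1F600}" is an arbitrary non-BMP character, chosen only for this example.
const sample = "美\u{1F600}";

// split("") cuts between UTF-16 code units, so the surrogate pair is torn
// into two lone surrogates (each one renders as "�").
const byCodeUnit = sample.split("");
console.log(byCodeUnit.length); // 3

// Array.from uses the string iterator, which walks code points,
// so the non-BMP character stays in one piece.
const byCodePoint = Array.from(sample);
console.log(byCodePoint.length); // 2
```

Iterating by code point keeps each non-BMP character intact, but it still yields single characters, so it does not group "Peter" or "Jane" into words.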

I want an approach that splits the words correctly, like this:

["今", "天", "很", "美", ",", "Peter", "和", "Jane", ... "St.", " ", "Thomas", " ", ""]

Any ideas? Should I use a regex in .split(/ ??? /)? If so, how do I write the regex "properly", so that it handles multiple languages?

Boo Yan Jiong
