Background: I am writing a word-wrapping function for arbitrary strings drawn on an HTML canvas. I would like the function to work for all languages.
For English, .split(" ") may be enough, but this does not work for other languages. For example, take this mixed-language string:
今天很美,Peter和Jane去了St. Thomas // (Today is beautiful, Peter and Jane go to St. Thomas 123)
If I use .split(""), I get:
["今", "天", "很", "美", ",", "P", "e", "t", "e", ... "�", "�", "�"]
Note: the non-BMP characters at the end of the string are split into lone surrogate halves, which display as "�", "�", "�".
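To make the surrogate problem concrete, here is a small sketch. The character U+1D7D9 is only an illustrative non-BMP character picked for this example, not necessarily the one in the string above. It shows that split("") cuts between UTF-16 code units, while iterating the string (for example with Array.from) walks by code point and keeps the pair together.

```javascript
// "\u{1D7D9}" (MATHEMATICAL DOUBLE-STRUCK DIGIT ONE) is a non-BMP character,
// chosen here purely for illustration.
const sample = "A\u{1D7D9}";

// split("") cuts between UTF-16 code units, so the surrogate pair is torn apart.
const byCodeUnit = sample.split("");

// Array.from uses the string iterator, which advances one code point at a time.
const byCodePoint = Array.from(sample);

console.log(byCodeUnit.length);  // 3 ("A" plus two lone surrogates)
console.log(byCodePoint.length); // 2 ("A" plus the intact character)
```

Iterating by code point avoids the "�" output, but it still yields single characters, so it does not group Latin letters into words.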
I want an approach that splits the words correctly, like this:
["今", "天", "很", "美", ",", "Peter", "和", "Jane", ... "St.", " ", "Thomas", " ", ""]
Any ideas? Should I use a regex in .split(/ ??? /)? If so, how do I write the regex "properly" so that it handles multiple languages?
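One possible direction, sketched under assumptions rather than offered as a complete answer: match tokens with .match() instead of splitting, using Unicode property escapes and the u flag so that non-BMP characters stay intact. The names CJK, WORD, TOKEN and tokenize are hypothetical, and the rules (each Han, Hiragana or Katakana character is its own token, other letters and digits are grouped, trailing punctuation stays attached, whitespace runs are kept) are assumptions inferred from the desired output above.

```javascript
// Hypothetical helper names; the token rules are assumptions, not a standard.
// Scripts written without spaces: each character becomes its own token.
const CJK = String.raw`[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]`;

// A run of non-CJK letters, digits and combining marks, plus any punctuation
// directly after it (so "St." stays in one piece).
const WORD = String.raw`(?:(?!${CJK})[\p{L}\p{N}\p{M}])+\p{P}*`;

// Alternatives are tried in order; the last one matches any single code point
// (leading punctuation, symbols, emoji) so that no input is dropped.
const TOKEN = new RegExp(
  [CJK, String.raw`\s+`, WORD, String.raw`[\s\S]`].join("|"),
  "gu" // "u" enables \p{...} and makes the regex work per code point
);

function tokenize(text) {
  return text.match(TOKEN) || [];
}

console.log(tokenize("今天很美,Peter和Jane去了St. Thomas"));
// ["今", "天", "很", "美", ",", "Peter", "和", "Jane", "去", "了", "St.", " ", "Thomas"]
```

This is deliberately simple: an apostrophe or hyphen inside a word (such as "don't") ends the token early, and scripts like Thai, which also omit spaces, would need extra rules. Because every code point lands in exactly one token, joining the tokens gives back the original string, which is convenient when measuring and wrapping lines.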