JavaScript to replace Chinese characters

Question

I am building a JavaScript array from user input. The array builds fine, but if the user enters Chinese symbols it crashes. I assume this happens when the user enters a Chinese " or , or '. My program already replaces the English versions of these, but I don't know how to replace the Chinese versions.

Can anyone help?

Thanks to all for their input

Do you mean you are building the array server side? Which language/platform are you using? Most web environments provide functions to build JavaScript arrays/objects/strings with the correct escaping. — RoToRa, Oct 25 '11 at 12:44

score 4 · Accepted Answer · edited May 23 '17 at 10:34

From "What's the complete range for Chinese characters in Unicode?", the CJK Unicode ranges are:

4E00-9FFF (common)
3400-4DFF (rare)
F900-FAFF (compatibility - duplicates, unifiable variants, corporate characters)
20000-2A6DF (rare, historic)
2F800-2FA1F (compatibility - supplement)

Because JS strings are made of 16-bit code units (UCS-2/UTF-16), a single \uXXXX escape maxes out at FFFF, so the last two ranges probably aren't of great interest. Thus, if you're building a JS string, you should be able to filter out Chinese characters using something like:

replace(/[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff]/g, '')

score 2 · Answer 2 · answered Oct 25 '11 at 11:36

You need to use a Unicode replacer. I think this will help you: http://answers.yahoo.com/question/index?qid=20080528045141AAJ0AIS

Sergey

Yahoo answers link died. What is this replacer? – Evandro Coan Apr 25 '22 at 00:35

score 1 · Answer 3 · answered Apr 26 '12 at 21:40

Building on broofa's answer:

If you just want to find and replace Chinese punctuation, such as the Chinese forms of " or ' or ., then you'll want to use Unicode characters in the range FF00-FFEF. Here is a PDF from Unicode showing them: http://unicode.org/charts/PDF/UFF00.pdf
I think you'd want to replace at least these: FF01, FF02, FF07, FF0C, FF0E, FF1F, and FF61. That should cover the major Chinese punctuation marks. You can use broofa's replace function.
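As a rough sketch of how that could look, the code below maps the seven code points listed above to ASCII counterparts. The function and table names are hypothetical, and the choice of replacement for each mark (for example FF61 becoming ".") is an assumption rather than anything the charts prescribe.

```javascript
// Code points from the answer, mapped to ASCII. The targets are illustrative choices.
const FULLWIDTH_TO_ASCII = {
  '\uff01': '!',  // fullwidth exclamation mark
  '\uff02': '"',  // fullwidth quotation mark
  '\uff07': "'",  // fullwidth apostrophe
  '\uff0c': ',',  // fullwidth comma
  '\uff0e': '.',  // fullwidth full stop
  '\uff1f': '?',  // fullwidth question mark
  '\uff61': '.',  // halfwidth ideographic full stop
};

const FULLWIDTH_PUNCT = /[\uff01\uff02\uff07\uff0c\uff0e\uff1f\uff61]/g;

// Hypothetical helper: swaps each matched mark for its ASCII counterpart.
function normalizePunctuation(input) {
  return input.replace(FULLWIDTH_PUNCT, (ch) => FULLWIDTH_TO_ASCII[ch]);
}

console.log(normalizePunctuation('你好\uff0c世界\uff01')); // "你好,世界!"
```

Some marks that show up in Chinese text sit outside FF00-FFEF, for example the ideographic full stop (U+3002) and the curly quotes (U+201C, U+201D), so those would have to be added to the table separately if you need them.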

score 1 · Answer 4 · answered Oct 25 '11 at 13:36

.NET provides JavaScriptSerializer and its Serialize method, which creates correctly escaped JavaScript literals (I personally haven't used it with Chinese characters, but there is no reason it shouldn't work).
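To illustrate the same idea without .NET, here is a sketch in JavaScript itself using the built-in JSON.stringify, which is a different tool from the serializer named above. The sample values are made up. The point is only that a serializer handles the quoting, so no characters have to be stripped or replaced.

```javascript
// Hypothetical user input containing ASCII and fullwidth quotes and commas.
const userValues = ['plain', 'He said "hi"', "it's", '中文\uff02引号\uff0c'];

// JSON.stringify escapes ", \ and control characters, so the result
// can be parsed back into an equal array.
const literal = JSON.stringify(userValues);
console.log(literal);

const roundTripped = JSON.parse(literal);
console.log(roundTripped.length === userValues.length); // true
```

If the literal is written into an HTML script block, sequences such as a closing script tag inside the data still need separate handling.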

RoToRa

score 0 · Answer 5 · answered Apr 25 '22 at 00:51

Not asked in the question, but by adding \u30a0-\u30ff\u3040-\u309f you can also remove the Japanese Hiragana and Katakana:

replace(/[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff\u30a0-\u30ff\u3040-\u309f]/g, '')
