4

I am building a JavaScript array from user input. The array builds fine, but if the user enters Chinese symbols it crashes. I assume this happens when the user enters a Chinese ", a , or a '. I have the program replacing the English versions of these, but I don't know how to replace the Chinese versions.

Can anyone help?

Thanks to all for their input

Makoto
Wesley Skeen
  • Do you mean you are building the array server side? Which language/platform are you using? Most web environments provide functions to build JavaScript arrays/objects/strings with the correct escaping. – RoToRa Oct 25 '11 at 12:44
  • Yeah, I'm building it server side and I'm using C#. – Wesley Skeen Oct 25 '11 at 13:28

5 Answers

4

From What's the complete range for Chinese characters in Unicode?, the CJK unicode ranges are:

  • 4E00-9FFF (common)
  • 3400-4DFF (rare)
  • F900-FAFF (compatibility - duplicates, unifiable variants, corporate characters)
  • 20000-2A6DF (rare, historic)
  • 2F800-2FA1F (compatibility - supplement)

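JS strings are sequences of 16-bit code units, and a \uXXXX escape maxes out at FFFF. Characters in the last two ranges are stored as surrogate pairs and can't be written as a single \uXXXX escape, so they are left out here. Thus, if you're building a JS string, you should be able to filter out Chinese characters using something like: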

replace(/[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff]/g, '')
Community
broofa
2

You need to use a Unicode replacer. I think this will help you: http://answers.yahoo.com/question/index?qid=20080528045141AAJ0AIS

Sergey
1

Building on broofa's answer:

If you just want to find and replace Chinese punctuation like " or " or a . then you'll want to look at the Unicode characters in the range FF00-FFEF. Here is a PDF from Unicode showing them: http://unicode.org/charts/PDF/UFF00.pdf
I think you'd want to replace at least these: FF01, FF02, FF07, FF0C, FF0E, FF1F, and FF61. Those should be the major Chinese punctuation marks. You can use broofa's replace function.
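A rough sketch of that idea, assuming you want to turn each listed code point into its ASCII counterpart rather than delete it. The name normalizePunctuation is hypothetical, and mapping FF61 to "." is my own assumption:

```javascript
// The seven code points listed above.
const FULLWIDTH_PUNCTUATION = /[\uff01\uff02\uff07\uff0c\uff0e\uff1f\uff61]/g;

// normalizePunctuation is a hypothetical helper name.
function normalizePunctuation(input) {
  return input.replace(FULLWIDTH_PUNCTUATION, (ch) => {
    // U+FF61 is a halfwidth ideographic full stop; it has no ASCII twin
    // at a fixed offset, so mapping it to "." is an assumption.
    if (ch === '\uff61') return '.';
    // Fullwidth forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII.
    return String.fromCharCode(ch.charCodeAt(0) - 0xfee0);
  });
}

// \uff0c is the fullwidth comma, \uff01 the fullwidth exclamation mark.
console.log(normalizePunctuation('你好\uff0c世界\uff01')); // prints "你好,世界!"
```

After this step the string contains ordinary ASCII quotes and commas, so whatever replacement you already apply to the English versions would still need to run afterwards.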

tsroten
1

.NET provides JavaScriptSerializer and its Serialize method, which creates correctly escaped JavaScript literals (I personally haven't used it with Chinese characters, but there is no reason it shouldn't work).

RoToRa
0

Not asked in the question, but by adding \u30a0-\u30ff\u3040-\u309f you can also strip out the Hiragana and Katakana used in Japanese:

replace(/[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff\u30a0-\u30ff\u3040-\u309f]/g, '')
  1. https://regex101.com/r/4Aw9Q8/1
  2. https://en.wikipedia.org/wiki/Katakana_(Unicode_block)
  3. https://en.wikipedia.org/wiki/Hiragana_(Unicode_block)
Evandro Coan