
I am new to web development and am building an internal website in JavaScript for users of different languages. It works well for detecting English and Chinese character input, and I would like to go further and distinguish Traditional from Simplified Chinese. Any ideas on how to do this?

Below is my code to detect Chinese and English:

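  if (/^[A-Za-z]+$/.test(string)) {
    // English: ASCII letters only (`+` rather than `*`, so an empty string does not match)
  } else if (/[\u3400-\u9FBF]/.test(string)) {
    // Chinese: at least one character in the U+3400-U+9FBF range
  } else {
    // Neither English-only nor containing Chinese characters
  }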
Stephen Chen
  • Use Unicode characters (e.g. \u3400) in "string.match" for Traditional Chinese detection, and another condition to detect Simplified Chinese. Please update here if it doesn't work. –  Sep 01 '16 at 04:43
  • This link may help you. [Chinese Unicode Table - StackOverflow](http://stackoverflow.com/questions/4596576/simplified-chinese-unicode-table) –  Sep 01 '16 at 04:48
  • Thanks for the resource, I will try it later. – Stephen Chen Sep 01 '16 at 06:14

1 Answer


I've built a library, traditional-or-simplified, that detects whether a string contains a majority of Traditional or Simplified Chinese characters by comparing the number of Simplified and Traditional characters that appear in the input string.

var TradOrSimp = require('traditional-or-simplified');

// Detect if a string contains Simplified Chinese
TradOrSimp.isSimplified('无需注册或设置')
// true

// Detect if a string contains Traditional Chinese
TradOrSimp.isTraditional('無需帳戶或註冊。')
// true

// Detect if a string contains Traditional or Simplified Chinese characters 
TradOrSimp.detect('無需帳戶或註冊。')

/* 
{ inputLength: 8, // Length of input string 
  simplifiedCharacters: 0, // Count of Simplified Chinese characters 
  traditionalCharacters: 4, // Count of Traditional Chinese characters 
  detectedCharacters: 'traditional', // Detected character set 
  detectionRate: 1 } // Ratio of majority/minority character sets */
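
To illustrate the counting idea described above, here is a minimal sketch. It is not the library's actual source: the function name `detectScript` and the two tiny character tables are hypothetical and chosen only for illustration. A real implementation needs far larger tables of characters that occur in only one of the two scripts, so the counts below will not match the library's output.

```javascript
// Hypothetical, deliberately tiny tables of characters that differ between scripts.
// A real table would contain thousands of entries.
const SIMPLIFIED_ONLY = new Set('无帐册设汉语国书门');
const TRADITIONAL_ONLY = new Set('無帳冊設漢語國書門');

// Count script-specific characters and report whichever script has the majority.
function detectScript(text) {
  let simplified = 0;
  let traditional = 0;
  for (const ch of text) {
    if (SIMPLIFIED_ONLY.has(ch)) {
      simplified++;
    } else if (TRADITIONAL_ONLY.has(ch)) {
      traditional++;
    }
  }
  let detected = 'unknown'; // tie, or no script-specific characters found
  if (simplified > traditional) {
    detected = 'simplified';
  } else if (traditional > simplified) {
    detected = 'traditional';
  }
  return { simplified, traditional, detected };
}

console.log(detectScript('无需注册或设置'));
// → { simplified: 3, traditional: 0, detected: 'simplified' }
console.log(detectScript('無需帳戶或註冊。'));
```

Characters shared by both scripts (such as 需 or 或) are ignored, so a short string made up only of shared characters comes back as `'unknown'` in this sketch.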
nickdrewe
  • Thank you! Are you going to try to detect more languages? :) – Stephen Chen Jul 30 '17 at 14:21
  • @StephenChen I'm using [Franc](https://github.com/wooorm/franc/tree/master/packages/franc) for language detection. But it doesn't differentiate Traditional and Simplified Chinese, hence the library above. – nickdrewe Aug 02 '17 at 01:53
  • To make it more widely useful, maybe try sending a PR to Franc. – Stephen Chen Aug 08 '17 at 07:13
  • Interesting @nickdrewe, so is the only way to detect this based on the presence of telltale characters? I found this question searching for the same thing, and was hoping for some kind of Unicode range, but I guess it's not that clean. Also, I see in your library that there are the same number of simplified and traditional characters - is that supposed to be so? My understanding was that it was *not* a 1-1 mapping - that some simplified characters translate to different traditional ones based on context. – antun Jun 27 '20 at 04:12
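
On the hope for a Unicode range: Simplified and Traditional forms sit side by side in the same CJK blocks, so a range test alone cannot tell them apart. The short check below is only a sketch; the three character pairs and the variable names are examples picked here for illustration, not taken from the library.

```javascript
// Example Simplified/Traditional pairs of the same characters (chosen for illustration).
const pairs = [['无', '無'], ['国', '國'], ['汉', '漢']];

const cjkUnified = /^[\u4E00-\u9FFF]$/; // CJK Unified Ideographs block
const hanScript = /^\p{Script=Han}$/u;  // Unicode script property (needs the `u` flag)

const rows = pairs.map(([simp, trad]) => ({
  simp,
  trad,
  simpCode: 'U+' + simp.codePointAt(0).toString(16).toUpperCase(),
  tradCode: 'U+' + trad.codePointAt(0).toString(16).toUpperCase(),
  // Both forms fall inside the same block and carry the same script property.
  bothInBlock: cjkUnified.test(simp) && cjkUnified.test(trad),
  bothHan: hanScript.test(simp) && hanScript.test(trad),
}));

console.log(rows);
```

Every pair reports `bothInBlock: true` and `bothHan: true`, which is why detection has to fall back on tables of script-specific characters rather than code point ranges.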