
I am validating usernames with a regex in JS. However, it rejected characters from other languages, so I found a regex that covers those characters.

if (/^[a-zA-Z0-9äöüÄÖÜß\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF\x30A0-\x30FFñáéíóúü\p{Han}\u1100-\u11FF|\u3130-\u318F|\uA960-\uA97F|\uAC00-\uD7AF|\uD7B0-\uD7FFàâäèéêëîïôœùûüÿçÀÂÄÈÉÊËÎÏÔŒÙÛÜŸÇ\u00C0-\u017F\u4E00-\u9FFF|\u2FF0-\u2FFF|\u31C0-\u31EF|\u3200-\u9FBF|\uF900-\uFAFFzàèéìòóù\u00E0\u00E8\u00E9\u00EC\u00F2\u00F3\u00F9._-]{1,160}$/i.test(text)) {
  console.log('correct word');
} else {
  console.log('wrong word');
}
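A few pieces of the regex above do not do what they appear to, which makes it harder to tell which part belongs to which language. The sketch below tests trimmed-down copies of three of those pieces on their own. The variable names are made up for illustration, and the last check assumes an engine with ES2018 Unicode property escapes (e.g. a current Node.js or browser).

```javascript
// Trimmed-down pieces of the regex in the question, tested on their own.

// 1. Inside [...] the "|" character is a literal, not "or".
const withPipe = /^[\u1100-\u11FF|\u3130-\u318F]+$/;
console.log(withPipe.test('|')); // true: "|" is accepted as a username character

// 2. \x takes exactly two hex digits, so \x30A0-\x30FF is NOT the Katakana block.
//    It parses as '0', 'A', the range '0'-'0', 'F', 'F'.
const hexEscape = /^[\x30A0-\x30FF]+$/;
console.log(hexEscape.test('A0F')); // true
console.log(hexEscape.test('ア'));  // false

// 3. Without the u flag, \p{Han} is just the characters p { H a n }.
const noUnicodeFlag = /^[\p{Han}]+$/;
console.log(noUnicodeFlag.test('Han')); // true
console.log(noUnicodeFlag.test('漢'));  // false

// With the u flag (ES2018+), the script is written as \p{Script=Han}.
const withUnicodeFlag = /^[\p{Script=Han}]+$/u;
console.log(withUnicodeFlag.test('漢')); // true
```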

But I only want to allow characters from specific languages, e.g.:

Korean: Hangul, Chosŏn'gŭl
Japanese: Hiragana, Katakana (full width), Kanji
German
Spanish
French
Italian
Chinese: Simplified Chinese
Russian
Portuguese

I want to be able to remove any language's characters manually, e.g. remove Simplified Chinese, but I don't know what to change in my code because I don't know which part of the character class in the if condition belongs to which language. Could anyone please help?

Tariq Husain
  • There is a [Unicode range RegExp generator](http://apps.timwhitlock.info/js/regex#) that can help here. For some of the languages, you can find [some regexps here](http://stackoverflow.com/questions/30798522/regular-expression-not-working-for-at-least-one-european-character/30798598#30798598) – Wiktor Stribiżew Jan 27 '16 at 07:13
  • Generally, there's no reason to restrict characters in usernames, other than filtering out malicious code – adeneo Jan 27 '16 at 07:13
  • I tried to search there before too, but I couldn't find German, Spanish, French or Italian. Maybe I couldn't find what I was looking for because I am just starting with regex @WiktorStribiżew – Tariq Husain Jan 27 '16 at 07:20
  • See my second link to one of my answers. – Wiktor Stribiżew Jan 27 '16 at 07:21
  • Thanks @WiktorStribiżew, I was able to find some of them from the given link. – Tariq Husain Jan 27 '16 at 07:39
  • I believe your question is rather broad, and there are several sources. [Chinese](http://stackoverflow.com/questions/21109011/javascript-unicode-string-chinese-character-but-no-punctuation), [Japanese](https://gist.github.com/terrancesnyder/1345094), [Korean](http://stackoverflow.com/a/32242707/3832970). If any of SO answers are helpful, please consider upvoting them. – Wiktor Stribiżew Jan 27 '16 at 07:51

1 Answer


I am not familiar with any of the languages you have mentioned, but I can tell you how to create a regex for any language.

There is a simple way to apply the same regex logic you would use for English to any language: use Unicode ranges.

For matching a range of Unicode characters, such as all capital letters [A-Z], we can use

[\u0041-\u005A], where \u0041 is the escape for A (U+0041) and \u005A is the escape for Z (U+005A):
'matchCAPS leTTer'.match(/[\u0041-\u005A]+/g)
//output ["CAPS", "TT"]

'matchCAPS leTTer'.match(/[A-Z]+/g)
//output ["CAPS", "TT"]
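To find the hex code of any character, so it can be used as a range boundary like \u0041, a small helper can print code points. This is a sketch; `codePoints` is a hypothetical name, and it relies only on the standard `Array.from` and `String.prototype.codePointAt`.

```javascript
// Return the code point of each character as "U+XXXX", so you can look up
// where a letter sits in the Unicode charts and pick range boundaries.
function codePoints(text) {
  // Array.from splits by code point, so characters outside the BMP stay whole
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

console.log(codePoints('AZ')); // [ 'U+0041', 'U+005A' ]
console.log(codePoints('ßñ')); // [ 'U+00DF', 'U+00F1' ]
```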

In the same way, you can use any other Unicode characters, or their hex escapes, as range endpoints (e.g. \u0A10 to \u0A1F); the range covers every code point between them in the order given by the Unicode charts at unicode.org.
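As an illustration using a range that already appears in the question's regex (\u00C0-\u017F), the sketch below shows that an escape and the literal character it stands for are interchangeable as range endpoints, and that a range includes every code point in between, not only letters.

```javascript
// \u00C0 is 'À' and \u017F is 'ſ', so these two classes are the same.
const escaped = /^[\u00C0-\u017F]+$/;
const literal = /^[À-ſ]+$/;

console.log(escaped.test('äöüßñçœ')); // true
console.log(literal.test('äöüßñçœ')); // true

// A range takes everything in between, including non-letters:
// '×' (U+00D7) and '÷' (U+00F7) fall inside \u00C0-\u017F.
console.log(escaped.test('×')); // true
```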

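Try [电-触] for Chinese.

It matches every character whose code point lies between those of 电 and 触 in Unicode order. Note that this is only a slice of the CJK Unified Ideographs block (U+4E00 to U+9FFF), not all Chinese characters.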
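The sketch below compares [电-触] with the full CJK Unified Ideographs block (U+4E00 to U+9FFF) on a character outside the slice. The same block also holds Japanese Kanji and both Simplified and Traditional Chinese forms, so a code point range alone cannot tell them apart; the last line shows Simplified 电 and Traditional 電 matching the same range.

```javascript
// [电-触] matches only code points between those two characters.
const slice = /^[电-触]$/;
// The full CJK Unified Ideographs block, U+4E00 to U+9FFF.
const cjkBlock = /^[\u4E00-\u9FFF]$/;

console.log(slice.test('一'));    // false: 一 is U+4E00, below 电
console.log(cjkBlock.test('一')); // true

// Simplified 电 and Traditional 電 both sit in the same block.
console.log(cjkBlock.test('电'), cjkBlock.test('電')); // true true
```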

Similarly, you can combine ranges for several languages in one regex:

/[电-触ڀ-ڴᄀ-ᆿ]/       //combination of Chinese, Arabic, Korean
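To make it easy to add or remove a language, as asked in the question, one option is to keep each language's characters in a separate table entry and join the selected ones into one character class. This is a sketch under assumptions: `LANGUAGE_CHARS` and `buildUsernameRegex` are hypothetical names, the letter lists reuse the question's where possible and are not exhaustive, and the block boundaries come from the Unicode charts.

```javascript
// Hypothetical per-language table: delete or edit an entry to change what a
// username may contain. Letter lists are illustrative, not exhaustive.
const LANGUAGE_CHARS = {
  german: 'äöüÄÖÜß',
  spanish: 'áéíóúüñÁÉÍÓÚÜÑ',
  french: 'àâäçéèêëîïôœùûüÿÀÂÄÇÉÈÊËÎÏÔŒÙÛÜŸ',
  italian: 'àèéìòóùÀÈÉÌÒÓÙ',
  portuguese: 'áâãàçéêíóôõúÁÂÃÀÇÉÊÍÓÔÕÚ',
  russian: '\u0410-\u044F\u0401\u0451', // А-я plus Ё and ё
  korean: '\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3', // Jamo, compatibility Jamo, syllables
  japanese: '\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF', // Hiragana, Katakana, Kanji
  chinese: '\u4E00-\u9FFF', // CJK Unified Ideographs, shared with Japanese Kanji
};

// Build ^[...]{1,160}$ from the selected languages. The \u escapes above are
// resolved by the string literals, so the pattern contains the real characters.
function buildUsernameRegex(languages) {
  const extra = languages
    .map((lang) => {
      if (!Object.prototype.hasOwnProperty.call(LANGUAGE_CHARS, lang)) {
        throw new Error(`Unknown language: ${lang}`);
      }
      return LANGUAGE_CHARS[lang];
    })
    .join('');
  // Basic Latin letters, digits and ._- are always allowed, as in the question.
  return new RegExp(`^[a-zA-Z0-9${extra}._-]{1,160}$`);
}

const allowed = buildUsernameRegex(['german', 'french', 'russian', 'korean']);
console.log(allowed.test('Müller'));    // true
console.log(allowed.test('Иван'));      // true
console.log(allowed.test('김민수'));    // true
console.log(allowed.test('中文'));      // false: neither Chinese nor Japanese selected
console.log(allowed.test('user|name')); // false
```

Because Japanese Kanji and Chinese characters share the CJK Unified Ideographs block, removing `chinese` while keeping `japanese` still accepts Chinese characters, and there is no separate range for Simplified Chinese.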

Note:

Make sure you are using the correct range for each alphabet.
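One way to check that a range is correct is to test letters of the alphabet against it and list what it rejects. The hypothetical `missingChars` helper below does that; the example uses Russian, where Ё (U+0401) and ё (U+0451) sit outside the А to я range (U+0410 to U+044F).

```javascript
// Hypothetical helper: list the characters of `alphabet` that a
// single-character class (given as regex source text) does not match.
function missingChars(charClass, alphabet) {
  const re = new RegExp(`^${charClass}$`);
  return Array.from(alphabet).filter((ch) => !re.test(ch));
}

// Sample Russian letters: Ж Ё Я ж ё я
const sample = '\u0416\u0401\u042F\u0436\u0451\u044F';

console.log(missingChars('[\\u0410-\\u044F]', sample));               // [ 'Ё', 'ё' ]
console.log(missingChars('[\\u0410-\\u044F\\u0401\\u0451]', sample)); // []
```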

Harpreet Singh
  • but where can I find the character ranges for the languages that I have mentioned? – Tariq Husain Jan 27 '16 at 07:39
  • @TariqHusain Google "Unicode chart" plus the language name... check this http://unicode.org/charts/PDF/U1100.pdf – Harpreet Singh Jan 27 '16 at 07:40
  • yeah, that's the main problem I'm facing: I can't tell from the link you gave which character set is for Chinese, which is for French, etc. – Tariq Husain Jan 27 '16 at 07:42
  • @TariqHusain, you have to google it and find the range. For Korean, I searched "korean characters" and found this link, which tells me the range of Korean alphabets - http://www.omniglot.com/writing/korean.htm – Harpreet Singh Jan 27 '16 at 07:46
  • @TariqHusain... Go to the third page of this link - http://unicode.org/charts/PDF/U1100.pdf. It lists all possible consonants/vowels of the Korean language. You can combine the ranges of both consonants and vowels to make a fine regex for Korean, and similarly for other languages, as mentioned in my answer – Harpreet Singh Jan 27 '16 at 07:51
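Regarding the last comment: the U1100 chart is the Hangul Jamo block, but Korean text typed on a keyboard is usually made of precomposed syllables from U+AC00 to U+D7A3, so a Jamo-only range rejects it. The sketch below shows the difference; `normalize('NFD')` is the standard `String.prototype.normalize` method, which splits each syllable into conjoining Jamo.

```javascript
// Hangul Jamo block from the U1100 chart: initial consonants, vowels, final consonants.
const hangulJamo = /^[\u1100-\u11FF]+$/;
// Precomposed Hangul syllables.
const hangulSyllables = /^[\uAC00-\uD7A3]+$/;

console.log(hangulJamo.test('한국'));      // false
console.log(hangulSyllables.test('한국')); // true

// NFD normalization decomposes the syllables into Jamo from the U1100 block.
console.log(hangulJamo.test('한국'.normalize('NFD'))); // true
```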