
I need a regex for all alphabets. I have an input text and a target text, and each of them can use a different alphabet: Chinese, Latin, Cyrillic, or any other.

I need a regex for multi-language input and multi-language target text.

Does anybody have an idea about this? How can I write this regex?

I will use this with JavaScript, but I think there should also be a common regex for Java and JavaScript for this problem.

erimerturk
  • Really! I know they are not the same. But regex is common, so I can use the same regex for Java, JavaScript, or Groovy! – erimerturk Oct 13 '11 at 10:33
  • @erimerturk No, you can't. See the flavor comparison on [regular-expressions.info](http://www.regular-expressions.info/refflavors.html); JavaScript is the "ECMA" column. – stema Oct 13 '11 at 10:36

3 Answers


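If you are in Java (not in JavaScript!), you can use Unicode properties, e.g.

\p{L} matches any kind of letter from any language. (Note the lowercase p: \P{L} is the negation and matches any character that is not a letter.)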

See regular-expressions.info/unicode for more information.
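JavaScript engines that implement ECMAScript 2018 or later also accept the same \p{L} property escape natively, but only when the regex carries the `u` flag. A minimal sketch; the `letterRuns` name and the sample text are illustrative only:

```javascript
// \p{L} matches any Unicode letter (general category L).
// The "u" flag is required; without it, \p is not treated as a property escape.
const LETTER_RUN = /\p{L}+/gu;

// Return every run of consecutive letters in the text, whatever its script.
function letterRuns(text) {
  return text.match(LETTER_RUN) ?? [];
}

console.log(letterRuns("Hello мир 你好, 42!"));
// → ["Hello", "мир", "你好"]
```

Digits, spaces, and punctuation end a run, so mixed-script text splits into its words.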

For JavaScript:

There is a library, XRegExp, and a set of XRegExp Unicode plugins that extend JavaScript's regex features. They add support for Unicode categories, scripts, and blocks.

With those libraries you can use \p{L} in JavaScript.
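In engines with native property escapes (ES2018+), script matching is also available without a library through `\p{Script=...}`; Unicode blocks, however, are not supported natively, so XRegExp remains useful for those. A sketch covering the scripts named in the question; the `SCRIPTS` table and `scriptsIn` helper are hypothetical:

```javascript
// Native property escapes for individual scripts (ES2018+, "u" flag required).
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Han: /\p{Script=Han}/u, // Chinese characters
};

// Return the names of the scripts from SCRIPTS that occur in the text.
function scriptsIn(text) {
  return Object.keys(SCRIPTS).filter((name) => SCRIPTS[name].test(text));
}

console.log(scriptsIn("Hello мир")); // → ["Latin", "Cyrillic"]
console.log(scriptsIn("你好"));      // → ["Han"]
```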

See my answer to this question for a small example.

stema

Some regex engines support a special property that matches any Unicode letter:

\p{L}

Or you can use \w, which matches a letter, digit, or underscore (in JavaScript, and in Java by default, only the ASCII ones).
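The difference between the two suggestions shows up as soon as the text leaves ASCII. A small comparison in JavaScript using the Turkish sample "ığç" from the comment on this answer; the variable names are illustrative:

```javascript
// \w in JavaScript is ASCII-only: [A-Za-z0-9_].
const asciiWord = /^\w+$/;
// \p{L} with the "u" flag accepts letters from any script.
const unicodeWord = /^\p{L}+$/u;

for (const s of ["abc", "ığç", "мир", "你好"]) {
  console.log(s, asciiWord.test(s), unicodeWord.test(s));
}
// abc true true
// ığç false true
// мир false true
// 你好 false true
```

Neither regex has the `g` flag, so repeated `test` calls do not carry `lastIndex` state between strings.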

Kirill Polishchuk
  • I couldn't use "\p{L}" in JavaScript. And "\w" is not enough: for example, the input is "ığç" and the target text is "ığçss". So this is not a solution for me. – erimerturk Oct 13 '11 at 11:15

I use the "|" character as a separator, so it is special for me: a key can contain any character except "|". This solved my problem, thanks for the answers. It works with JavaScript, Java, and Groovy; I tested it.

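// "|" (U+007C) separates records, so a key may contain any character except "|".
// [\u0000-\u007B\u007D-\uFFEF] covers the BMP up to U+FFEF, skipping only "|".
var keyPrefix = "\\|[\u0000-\u007B\u007D-\uFFEF]*";
var keySuffix = "[\u0000-\u007B\u007D-\uFFEF]*\\|";
// key is the search term; escape it first if it may contain regex metacharacters.
var searchkey = keyPrefix + key.toLowerCase() + keySuffix;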
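A sketch of the same separator-based search, assuming records are stored as `|record|record|` and that matching should ignore case (the original lowercases only the key; this sketch lowercases the target too). `findRecords` and `escapeRegExp` are hypothetical helper names. `[^|]` is a shorter way to write "any character except |", reads the same in Java and JavaScript, and also accepts characters above U+FFEF. The lookahead `(?=\|)` leaves the closing separator unconsumed, so adjacent records can both match:

```javascript
// Escape regex metacharacters so the key is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every "|"-delimited record in target that contains key,
// comparing both in lowercase. Records are returned lowercased.
function findRecords(target, key) {
  const pattern = new RegExp(
    "\\|([^|]*" + escapeRegExp(key.toLowerCase()) + "[^|]*)(?=\\|)",
    "g"
  );
  const records = [];
  for (const m of target.toLowerCase().matchAll(pattern)) {
    records.push(m[1]); // group 1 is the record text without separators
  }
  return records;
}

console.log(findRecords("|ığçss|abc|ığçx|", "ığç")); // → ["ığçss", "ığçx"]
console.log(findRecords("|a.b|axb|", "a.b"));         // → ["a.b"]
```

Escaping matters in the second call: without it, the "." in the key would also match the "x" in "axb".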
erimerturk