How to match Cyrillic characters with a regular expression

Question

How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to match alphabetic characters, not digits or special characters. Right now I have

[A-Za-z]

See this question: [Regex and unicode](https://stackoverflow.com/questions/14389/regex-and-unicode) – Nov 11 '09 at 17:03
@AlexErygin For Russian-only characters it is: **`[ЁёА-я]`** (where `А` is the Russian letter, not the Latin one). The Unicode code point of Russian `а` comes right after `Я`, so you don't need two ranges. The code points of `Ё` and `ё` are not within `А-я`, so you need to list them separately. – CITBL, Sep 11 '18 at 10:58

Pedro Lobito · Answer 1 · 2022-07-17T23:10:51.543

62

If your regex flavor supports Unicode blocks ([\p{IsCyrillic}]), you can match Cyrillic characters with:

[\p{IsCyrillic}] or [\p{Cyrillic}]

Otherwise try using:

[\u0400-\u04FF]

For PHP use (the pattern needs the `u` modifier):

[\x{0400}-\x{04FF}]

Explanation:

[\p{IsCyrillic}]

Match a character from the Unicode block "Cyrillic" (U+0400–U+04FF) «[\p{IsCyrillic}]»

Note: see the Unicode character list and numeric HTML entities for U+0400–U+04FF.
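As a minimal sketch of the range-based fallback, assuming Python and its standard `re` module (which does not support `\p{...}`), the Cyrillic block can be matched by code point range. The names `CYRILLIC_BLOCK` and `is_cyrillic_block_only` are illustrative and not part of the answer:

```python
import re

# Assumption: Python's built-in `re` has no \p{IsCyrillic}, so the block is
# spelled out as the code point range U+0400..U+04FF.
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")


def is_cyrillic_block_only(text: str) -> bool:
    """Return True if text is non-empty and every character is in U+0400..U+04FF."""
    return CYRILLIC_BLOCK.fullmatch(text) is not None
```

Digits, spaces, and punctuation lie outside the block, so a string such as `Привет1` is rejected.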

edited Jul 17 '22 at 23:10

answered Jun 14 '11 at 10:50

Pedro Lobito

This thread explains that http://stackoverflow.com/questions/7926514/matching-cyrilic-symbols-in-c-sharp – Dmitry Pavlov Jan 16 '13 at 10:12
@black Which programming language are you using? – Pedro Lobito Sep 02 '19 at 23:00
I am using PHP. – Black Sep 03 '19 at 06:15
For `php` try using `[\x{0400}-\x{04FF}]` instead. https://regex101.com/r/zcRenT/1 – Pedro Lobito Sep 03 '19 at 06:54
PHP supports `\p{Cyrillic}`, you just need to make sure to add a u flag onto the regex – donatJ Sep 01 '23 at 05:54

score 45 · Accepted Answer · answered Nov 11 '09 at 19:57

It depends on your regex flavor. If it supports Unicode character classes (as .NET does, for instance), \p{L} matches a letter in any script.
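Not every engine has `\p{L}`. As a sketch, assuming Python's standard `re` module (which lacks `\p{...}`), the class `[^\W\d_]` is a common approximation of "any letter". It is not an exact equivalent, because `\w` in Python also covers some numeric characters that are not decimal digits. `LETTERS` and `is_letters_only` are names made up for this example:

```python
import re

# [^\W\d_] = word characters minus decimal digits and underscore.
# Assumption: this approximates \p{L}; it is not an exact equivalent.
LETTERS = re.compile(r"[^\W\d_]+")


def is_letters_only(text: str) -> bool:
    """Return True if text is non-empty and consists only of letter-like characters."""
    return LETTERS.fullmatch(text) is not None
```

This accepts both French and Russian words, for example `is_letters_only("été")` and `is_letters_only("Привет")`, while rejecting `abc1` and `a_b`.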

answered Nov 11 '09 at 19:57

Tim Pietzcker

1

How about doing this in Java? – IgorGanapolsky Dec 16 '15 at 19:18
1

This will match **any** Cyrillic characters including those not present in the Russian alphabet (Greg was asking about Russian Cyrillic) – CITBL Sep 10 '18 at 11:06
1

In Javascript, you need to also add the flag 'u'. See https://javascript.info/regexp-unicode. – Orlin Apr 30 '20 at 10:15
Note: `\p{L}` in JavaScript doesn't work in Safari at the moment. – Adam Šipický Mar 08 '21 at 07:30

CITBL · Answer 3 · 2019-12-09T07:41:21.213

28

To match only Russian Cyrillic characters use:

[\u0401\u0451\u0410-\u044f]

which is the equivalent of:

[ЁёА-я]

where А is Cyrillic, not Latin. (They look the same but have different code points.)

\p{IsCyrillic}, \p{Cyrillic}, and [\u0400-\u04FF], which others suggested, will match all variants of Cyrillic, not only Russian.
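A short check of the claims in this answer, sketched in Python with the standard `re` module; `RUSSIAN` and `is_russian_only` are illustrative names. It confirms that `Я` and `а` are adjacent code points, that `Ё` and `ё` fall outside `А-я`, and that the class rejects other Cyrillic letters such as Ukrainian `ї`:

```python
import re

# Same class as [ЁёА-я], written with escapes to avoid Latin look-alikes.
RUSSIAN = re.compile(r"[\u0401\u0451\u0410-\u044f]+")


def is_russian_only(text: str) -> bool:
    """Return True if text is non-empty and uses only the 66 Russian letters."""
    return RUSSIAN.fullmatch(text) is not None


# Cyrillic 'а' (U+0430) directly follows 'Я' (U+042F): А-я is one contiguous range.
assert ord("\u0430") == ord("\u042f") + 1
# 'Ё' (U+0401) and 'ё' (U+0451) sit outside U+0410..U+044F.
assert not (0x0410 <= ord("\u0401") <= 0x044F)
assert not (0x0410 <= ord("\u0451") <= 0x044F)
```

With this class, `is_russian_only("Привет")` holds, while a word containing `ї` (U+0457) or any Latin letter does not match.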

edited Dec 09 '19 at 07:41

answered Sep 10 '18 at 11:48

CITBL

score 11 · Answer 4 · answered Jul 29 '14 at 13:31

If you use a modern PHP version, just use:

preg_match('/^[\p{L}]+$/u', $string); // $string is a placeholder for the text to test

Don't forget the u flag for unicode support!

answered Jul 29 '14 at 13:31

Олег Всильдеревьев

2

Can you explain your regex please? I tried it with `Бори́с` but it does not match, so your regex does not work. – Black Sep 02 '19 at 08:19
It's easy, please look at: https://www.php.net/manual/en/regexp.reference.unicode.php "L" means any letter. So the "и́" symbol should be in some other group! Try to find it. – Олег Всильдеревьев Sep 03 '19 at 14:16
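The failing `Бори́с` example can be explained with a small sketch in Python (standard library only; the function names are made up for illustration). The stress mark in `и́` is a separate combining character, U+0301, whose general category is `Mn` (nonspacing mark) rather than a letter, so a letters-only pattern rejects it. One possible fix, assuming stress marks should be accepted, is to allow marks as well as letters; in PCRE that would be `[\p{L}\p{M}]`:

```python
import unicodedata

# "Борис" with a combining acute accent (U+0301) after the fourth letter.
BORIS = "\u0411\u043e\u0440\u0438\u0301\u0441"


def categories(text: str) -> list:
    """Return the Unicode general category of each character."""
    return [unicodedata.category(ch) for ch in text]


def is_letters_and_marks(text: str) -> bool:
    """Return True if text is non-empty and every character is a letter (L*) or mark (M*)."""
    return bool(text) and all(
        unicodedata.category(ch)[0] in ("L", "M") for ch in text
    )


print(categories(BORIS))  # the fifth entry is 'Mn', the combining accent
```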

score 6 · Answer 5 · edited Jan 30 '17 at 09:54

Regex to match Cyrillic letters together with English (Latin) letters:

^[A-Za-z.!@?#"$%&:;() *\+,\/;\-=[\\\]\^_{|}<>\u0400-\u04FF]*$

It matches special characters, Cyrillic letters, and English letters.

edited Jan 30 '17 at 09:54

Moinuddin Quadri

answered Jan 30 '17 at 09:53

Dipti Ghumbre

1

Non-English alphabets are not normal ??? Not to mention there is only 1 English alphabet – CITBL Jun 01 '21 at 11:10

score 5 · Answer 6 · answered Nov 11 '09 at 17:22

Various regex dialects use [:alpha:] for any alphabetic character in the current locale. (You may need to put that inside a bracket expression, e.g. [[:alpha:]].)

answered Nov 11 '09 at 17:22

This works in PostgreSQL too, but matches all national characters (so not only current locale). And you can also use `[[:lower:]]` and `[[:upper:]]` for matching specific case. E.g. replace lower case characters: `regexp_replace(firstname, '[[:lower:]]', 'a', 'g')`. – Nux Mar 10 '21 at 15:02

lili.b · Answer 7 · 2018-05-25T08:03:27.053

5

This worked for me:

[a-z\u0400-\u04FF]

edited May 25 '18 at 08:03

answered May 25 '18 at 07:58

lili.b

2

to match ONLY Cyrillic characters use `[\u0400-\u04FF]` – Boykodev May 30 '18 at 11:14

score 2 · Answer 8 · answered Jan 12 '19 at 12:48

If you use Elixir:

String.match?(string, ~r/^\p{Cyrillic}*$/u)

You need to add the u flag for unicode support.

answered Jan 12 '19 at 12:48

Marvin Rabe

1

Attention, the above regex returns `true` for an empty string: `String.match?("", ~r/^\p{Cyrillic}*$/u)` => `true`. Change the `*` quantifier to `+` to fix that. – belgoros Feb 28 '19 at 15:17
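The empty-string pitfall is not specific to Elixir. A sketch in Python, using the range `[\u0400-\u04FF]` as a stand-in for `\p{Cyrillic}` (an assumption, since Python's standard `re` module has no `\p{...}`); the names below are illustrative:

```python
import re

# Stand-in for \p{Cyrillic}: the Cyrillic block U+0400..U+04FF.
ZERO_OR_MORE = re.compile(r"^[\u0400-\u04FF]*$")  # also accepts ""
ONE_OR_MORE = re.compile(r"^[\u0400-\u04FF]+$")   # needs at least one character


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the anchored pattern matches text."""
    return pattern.search(text) is not None
```

Here `matches(ZERO_OR_MORE, "")` is true while `matches(ONE_OR_MORE, "")` is false; both accept `Привет`.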

score 1 · Answer 9 · answered Feb 07 '23 at 21:51

You can use the first and the last letter. For example in Bulgarian:

[А-я]+

answered Feb 07 '23 at 21:51

Diyan Kalaydzhiev

score 0 · Answer 10 · answered May 08 '22 at 19:11

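For modern PHP (source). The pattern below removes everything from the first Cyrillic letter through the last one; with the `i` and `u` modifiers, the uppercase range `\x{0410}-\x{042F}` (А-Я) also matches lowercase letters: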

$string = 'тест тест Тест Обязателльно Stackoverflow >!<';
var_dump(preg_replace('/[\x{0410}-\x{042F}]+.*[\x{0410}-\x{042F}]+/iu', '', $string));

answered May 08 '22 at 19:11

Robert Sinclair

score -2 · Answer 11 · answered Aug 07 '19 at 10:00

In Java, to match Cyrillic letters and whitespace, use the following pattern:

^[\p{InCyrillic}\s]+$

answered Aug 07 '19 at 10:00

Tony Thanuvelil
