85

How do I match French and Russian (Cyrillic) alphabet characters with a regular expression? I only want to match letters, no numbers or special characters. Right now I have:

[A-Za-z]

tchrist
Greg Finzer
  • Look in this question: [Regex and unicode](https://stackoverflow.com/questions/14389/regex-and-unicode) – Nov 11 '09 at 17:03
  • Here it is: [А-Яа-я] – Alex Erygin Jun 30 '18 at 17:44
  • @AlexErygin For Russian-only characters it is: **`[ЁёА-я]`** (where `А` is Russian). The Unicode code point for Russian `а` comes right after `Я`, so you don't need two ranges. The code points for `Ёё` are not between `А` and `я`, so you need to specify Ёё separately. – CITBL Sep 11 '18 at 10:58

11 Answers

62

If your regex flavor supports Unicode blocks or scripts, you can match Cyrillic characters with:

[\p{IsCyrillic}] (.NET, Java) or [\p{Cyrillic}] (Perl, PCRE/PHP, Ruby)

Otherwise, use an explicit code-point range (this escape syntax works in Java, JavaScript and .NET):

[\u0400-\u04FF]

For PHP (PCRE, with the u modifier) use:

[\x{0400}-\x{04FF}]

Explanation:

[\p{IsCyrillic}]

Match a character from the Unicode block "Cyrillic" (U+0400–U+04FF). Note that this block also contains a few non-letters, such as ҂ (U+0482) and the combining marks U+0483–U+0489.

Note:

Unicode characters list and numeric HTML entities of [U+0400–U+04FF].
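Here is a minimal Java sketch of the difference, assuming `java.util.regex` (the class and method names are made up for illustration). Java spells these options differently from .NET. `\p{IsCyrillic}` is the Cyrillic *script*, which also covers supplementary blocks such as Cyrillic Supplement (U+0500–U+052F). `\p{InCyrillic}` is the U+0400–U+04FF *block* and behaves like the explicit range. Since the question asks for letters only, the last pattern intersects the script with `\p{L}`:

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing ways to match Cyrillic in java.util.regex.
final class CyrillicScriptVsBlock {
    // Script property: any code point whose Unicode script is Cyrillic,
    // including code points outside U+0400-U+04FF.
    static final Pattern SCRIPT = Pattern.compile("\\p{IsCyrillic}");
    // Block property: only the "Cyrillic" block, U+0400-U+04FF.
    static final Pattern BLOCK = Pattern.compile("\\p{InCyrillic}");
    // Explicit code-point range, equivalent to the block above.
    static final Pattern RANGE = Pattern.compile("[\\u0400-\\u04FF]");
    // Intersection: Cyrillic script AND a letter, so signs and marks are excluded.
    static final Pattern LETTER = Pattern.compile("[\\p{IsCyrillic}&&\\p{L}]");

    // True if the whole string is exactly one character matched by p.
    static boolean matchesOne(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Zhe, Komi De (U+0500), thousands sign (U+0482), Latin e-acute
        String[] samples = {"\u0416", "\u0500", "\u0482", "\u00E9"};
        for (String s : samples) {
            System.out.printf("U+%04X script=%b block=%b range=%b letter=%b%n",
                s.codePointAt(0),
                matchesOne(SCRIPT, s), matchesOne(BLOCK, s),
                matchesOne(RANGE, s), matchesOne(LETTER, s));
        }
    }
}
```

U+0500 (Komi De) is matched by the script but not by the block or the range. U+0482 (the Cyrillic thousands sign) is matched by the script, the block and the range, but it is not a letter.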

Pedro Lobito
45

It depends on your regex flavor. If it supports Unicode character classes (like .NET, for instance), \p{L} matches a letter in any script, so it covers both French accented letters and Cyrillic.
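Here is a short Java sketch (the class name is hypothetical) showing `\p{L}` accepting both French and Russian words. It assumes the input may arrive in decomposed form, for example `e` followed by the combining acute accent U+0301. That combining mark is `\p{M}`, not `\p{L}`, so the sketch normalizes to NFC before matching. Note that `\p{L}` also accepts letters from every other script, such as Greek or Arabic.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper: "letters only, any script".
final class AnyScriptLetters {
    // One or more letters (general category L) from any script.
    private static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");

    static boolean isLettersOnly(String s) {
        // Compose e.g. "e" + U+0301 into a single precomposed letter first;
        // the combining accent on its own is a mark, not a letter.
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return LETTERS_ONLY.matcher(nfc).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLettersOnly("Élève"));   // true
        System.out.println(isLettersOnly("Привет"));  // true
        System.out.println(isLettersOnly("e\u0301")); // true after NFC
        System.out.println(isLettersOnly("abc1"));    // false: digit
    }
}
```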

Tim Pietzcker
28

To match only Russian Cyrillic characters use:

[\u0401\u0451\u0410-\u044f]

which is the equivalent of:

[ЁёА-я]

where А is Cyrillic, not Latin (despite looking the same, they have different code points).

The \p{IsCyrillic}, \p{Cyrillic} and [\u0400-\u04FF] patterns suggested in other answers match all Cyrillic characters, not only the Russian alphabet.
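In Java, the same Russian-only class can be written with `\u` escapes, which avoids confusing the Cyrillic `А` with the Latin `A` in source code. Here is a minimal sketch (class and method names are illustrative):

```java
import java.util.regex.Pattern;

// Hypothetical helper: Russian alphabet only.
final class RussianOnly {
    // Yo (U+0401), yo (U+0451) and the contiguous range U+0410..U+044F.
    // Equivalent to the class [ЁёА-я] where the first А is Cyrillic.
    private static final Pattern RUSSIAN_WORD =
        Pattern.compile("[\\u0401\\u0451\\u0410-\\u044F]+");

    static boolean isRussianWord(String s) {
        return RUSSIAN_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRussianWord("Ёлка"));    // true
        System.out.println(isRussianWord("Україна")); // false: ї (U+0457) is not Russian
        System.out.println(isRussianWord("Elka"));    // false: Latin letters
    }
}
```

Ukrainian and Belarusian letters such as `ї` (U+0457) or `ў` (U+045E) fall inside U+0400–U+04FF but outside this class. Excluding them is the point of restricting it.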

CITBL
11

If you use a modern PHP version, just use:

preg_match('/^\p{L}+$/u', $string);

Don't forget the u flag for Unicode support!

6

Regex to match Cyrillic letters together with English (Latin) letters and common punctuation:

^[A-Za-z.!@?#"$%&:;() *\+,\/;\-=[\\\]\^_{|}<>\u0400-\u04FF]*$

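It matches special characters, Cyrillic letters and English letters. The \u0400-\u04FF escapes are JavaScript-style syntax; PHP needs \x{0400}-\x{04FF} with the u modifier.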
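The pattern above is written for a JavaScript-style engine. Below is a hedged Java port (the class name is hypothetical). In `java.util.regex`, an unescaped `[` inside a character class opens a nested class instead of matching a literal bracket, so it has to be escaped. Every backslash is also doubled in the string literal. Because of the `*` quantifier, the empty string matches too.

```java
import java.util.regex.Pattern;

// Hypothetical port of the Latin + Cyrillic + punctuation pattern.
final class LatinCyrillicPunctuation {
    // Differences from the original text:
    //  - '[' is escaped, because a bare '[' inside a class starts a nested class in Java;
    //  - backslashes are doubled for the Java string literal;
    //  - the duplicate ';' and the needless escapes on '+' and '/' are dropped.
    private static final Pattern ALLOWED = Pattern.compile(
        "^[A-Za-z.!@?#\"$%&:;() *+,/\\-=\\[\\\\\\]\\^_{|}<>\\u0400-\\u04FF]*$");

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Hello, мир!")); // true
        System.out.println(isAllowed("[x] \\ y"));    // true: brackets and backslash are listed
        System.out.println(isAllowed("price: 5"));    // false: digits are not listed
    }
}
```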

Moinuddin Quadri
Dipti Ghumbre
  • Non-English alphabets are not normal??? Not to mention there is only 1 English alphabet – CITBL Jun 01 '21 at 11:10
5

Various regex dialects use [:alpha:] for any alphabetic character in the current locale. POSIX classes like this are only valid inside a bracket expression, so in practice you write [[:alpha:]].

  • This works in PostgreSQL too, but matches all national characters (so not only current locale). And you can also use `[[:lower:]]` and `[[:upper:]]` for matching specific case. E.g. replace lower case characters: `regexp_replace(firstname, '[[:lower:]]', 'a', 'g')`. – Nux Mar 10 '21 at 15:02
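Java has no `[[:alpha:]]` syntax. The POSIX class is spelled `\p{Alpha}` and by default covers only US-ASCII letters. The small sketch below (names are hypothetical) shows how `Pattern.UNICODE_CHARACTER_CLASS` changes that. It also shows the pitfall that `[[:alpha:]]` still compiles in Java, but as an ordinary nested character class:

```java
import java.util.regex.Pattern;

// Hypothetical demo of POSIX-style classes in java.util.regex.
final class PosixAlphaInJava {
    // Java spells the POSIX class \p{Alpha}; by default it is US-ASCII only.
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");
    // With UNICODE_CHARACTER_CLASS, \p{Alpha} follows the Unicode Alphabetic property.
    static final Pattern UNICODE_ALPHA =
        Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);
    // Pitfall: Java does not understand [[:alpha:]]. It compiles, but as a
    // nested class containing only the characters ':', 'a', 'l', 'p' and 'h'.
    static final Pattern NOT_POSIX = Pattern.compile("[[:alpha:]]+");

    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(ASCII_ALPHA, "Привет"));   // false
        System.out.println(full(UNICODE_ALPHA, "Привет")); // true
        System.out.println(full(UNICODE_ALPHA, "Élève"));  // true
        System.out.println(full(NOT_POSIX, "alpha"));      // true, by accident
        System.out.println(full(NOT_POSIX, "Привет"));     // false
    }
}
```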
5

This worked for me:

[a-z\u0400-\u04FF]
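Note that `[a-z]` covers only lowercase Latin letters, while U+0400–U+04FF already contains both cases of Cyrillic. Here is a minimal Java sketch (names are hypothetical) of the two usual ways to admit uppercase Latin letters as well:

```java
import java.util.regex.Pattern;

// Hypothetical demo: Latin plus the Cyrillic block, with and without uppercase Latin.
final class LatinPlusCyrillic {
    // Lowercase a-z only, plus the whole U+0400-U+04FF block (both Cyrillic cases).
    static final Pattern LOWER_LATIN = Pattern.compile("[a-z\\u0400-\\u04FF]+");
    // Option 1: list the uppercase Latin range explicitly.
    static final Pattern EXPLICIT = Pattern.compile("[a-zA-Z\\u0400-\\u04FF]+");
    // Option 2: ASCII-only case folding is enough, since only a-z needs folding.
    static final Pattern IGNORE_CASE =
        Pattern.compile("[a-z\\u0400-\\u04FF]+", Pattern.CASE_INSENSITIVE);

    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(LOWER_LATIN, "Hello"));  // false: 'H' is uppercase
        System.out.println(full(EXPLICIT, "Hello"));     // true
        System.out.println(full(IGNORE_CASE, "Hello"));  // true
        System.out.println(full(IGNORE_CASE, "Привет")); // true
    }
}
```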
lili.b
2

If you use Elixir:

String.match?(string, ~r/^\p{Cyrillic}*$/u)

You need to add the u flag for Unicode support.

Marvin Rabe
  • Attention, the above regex returns `true` for an empty String: `String.match?("", ~r/^\p{Cyrillic}*$/u)` => `true`. You should change the `*` quantifier to `+` to fix that. – belgoros Feb 28 '19 at 15:17
1

You can use a range from the first uppercase letter (А) to the last lowercase letter (я), since those code points are contiguous. For example, for Bulgarian:

[А-я]+
0

For modern PHP (source):

$string = 'тест тест Тест Обязателльно Stackoverflow >!<';
var_dump(preg_replace('/[\x{0410}-\x{042F}]+.*[\x{0410}-\x{042F}]+/iu', '', $string));
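Here is a Java translation of the same idea, sketched under the assumption that the goal is to delete everything from the first Cyrillic word to the last one (the class name is hypothetical). One difference from PHP is worth knowing. `Pattern.CASE_INSENSITIVE` on its own folds only ASCII letters, so `Pattern.UNICODE_CASE` is needed for the uppercase range U+0410–U+042F to match lowercase letters too. Also note that `Ё`/`ё` (U+0401/U+0451) lie outside this range.

```java
import java.util.regex.Pattern;

// Hypothetical port of the preg_replace example.
final class StripCyrillicSpan {
    // From the first run of U+0410-U+042F letters to the last one (greedy .*).
    // UNICODE_CASE makes the case-insensitive match fold Cyrillic, not just ASCII.
    private static final Pattern SPAN = Pattern.compile(
        "[\\u0410-\\u042F]+.*[\\u0410-\\u042F]+",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    static String strip(String s) {
        return SPAN.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "тест тест Тест Обязателльно Stackoverflow >!<";
        System.out.println("[" + strip(input) + "]"); // [ Stackoverflow >!<]
    }
}
```

The brackets in the printed output only make the leading space visible.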
Robert Sinclair
-2

In Java, to match Cyrillic letters and whitespace, use the following pattern:

^[\p{InCyrillic}\s]+$
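In a Java string literal, the backslashes in this pattern must be doubled. Here is a minimal usage sketch (class and method names are hypothetical). Because `\s` is part of the class, a string made only of spaces is also accepted. `\p{InCyrillic}` is the U+0400–U+04FF block, so it also admits the few non-letters in that block.

```java
import java.util.regex.Pattern;

// Hypothetical helper: Cyrillic block characters and whitespace only.
final class CyrillicWithSpaces {
    // \p{InCyrillic} is the U+0400-U+04FF block; \s is ASCII whitespace by default.
    private static final Pattern CYRILLIC_TEXT = Pattern.compile("^[\\p{InCyrillic}\\s]+$");

    static boolean isCyrillicText(String s) {
        return CYRILLIC_TEXT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCyrillicText("Привет мир"));  // true
        System.out.println(isCyrillicText("Привет, мир")); // false: comma
        System.out.println(isCyrillicText("   "));         // true: whitespace only
    }
}
```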