21

How to check whether a string contains Cyrillic characters?

E.g.

>>> has_cyrillic('Hello, world!')
False
>>> has_cyrillic('Привет, world!')
True
smci
Max Malysh

4 Answers

32

You can use a regular expression to check if a string contains characters in the а-я, А-Я range:

import re 

def has_cyrillic(text):
    return bool(re.search('[а-яА-Я]', text))

Alternatively, you can match the whole Cyrillic Unicode block (U+0400 to U+04FF):

def has_cyrillic(text):
    return bool(re.search('[\u0400-\u04FF]', text))

This will also match letters of the extended Cyrillic alphabet (e.g. ё, Є, ў).
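To make the difference between the two patterns concrete, here is a small sketch that runs both on the same inputs. The names BASIC_RANGE, CYRILLIC_BLOCK, has_basic_cyrillic and has_block_cyrillic are made up for this illustration. The letter ё (U+0451) lies outside the а-я range, so only the block pattern catches it.

```python
import re

# Basic alphabet range only: А-Я and а-я (U+0410 to U+044F).
BASIC_RANGE = re.compile('[а-яА-Я]')
# Entire Cyrillic block, including ё, Є, ў and other extended letters.
CYRILLIC_BLOCK = re.compile('[\u0400-\u04FF]')

def has_basic_cyrillic(text):
    """True if text contains a character in the а-я / А-Я range."""
    return bool(BASIC_RANGE.search(text))

def has_block_cyrillic(text):
    """True if text contains any character from U+0400 to U+04FF."""
    return bool(CYRILLIC_BLOCK.search(text))

for sample in ('Привет, world!', 'Hello, wёrld!', 'Hello, world!'):
    print(repr(sample), has_basic_cyrillic(sample), has_block_cyrillic(sample))
```

Precompiling the patterns is optional; it only avoids repeated lookups in the re module's pattern cache when the check runs many times.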

Max Malysh
11

The third-party regex module (not the standard library's re) supports Unicode properties, along with a few short forms.

>>> import regex
>>> regex.search(r'\p{IsCyrillic}', 'Hello, world!')
>>> regex.search(r'\p{IsCyrillic}', 'Привет, world!')
<regex.Match object; span=(0, 1), match='П'>
>>> regex.search(r'\p{IsCyrillic}', 'Hello, wёrld!')
<regex.Match object; span=(8, 9), match='ё'>
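If installing regex is not an option, a rough standard-library approximation is possible. This is a different technique from \p{IsCyrillic}: it looks at each character's Unicode name through unicodedata and assumes that every Cyrillic character has "CYRILLIC" in its name. The function name has_cyrillic_by_name is made up for this sketch.

```python
import unicodedata

def has_cyrillic_by_name(text):
    """Approximate script check: True if any character's Unicode name
    contains the word CYRILLIC (an assumption, not a real script property)."""
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is given, so pass '' as the fallback.
    return any('CYRILLIC' in unicodedata.name(ch, '') for ch in text)

print(has_cyrillic_by_name('Hello, world!'))
print(has_cyrillic_by_name('Привет, world!'))
print(has_cyrillic_by_name('Hello, wёrld!'))
```

This is likely slower than a compiled regular expression on long strings, since it does a name lookup per character.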
Ignacio Vazquez-Abrams
4

Here is a method that is faster than the ones discussed so far.

Approach #1:

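len("экономия3r4".encode("ascii", "ignore")) < len("экономия3r4")

246 ns ± 7.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This evaluates to True if the string contains a Cyrillic character, because encoding to ASCII with "ignore" drops every non-ASCII character and shortens the result.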
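A minimal sketch of the length-comparison idea wrapped in a function. The name has_non_ascii is made up for this example and is chosen deliberately: the check reacts to any character outside ASCII, not specifically to Cyrillic. Dropped characters make the encoded bytes shorter than the string, so the comparison uses <.

```python
def has_non_ascii(text):
    """True if text contains at least one non-ASCII character.

    encode("ascii", "ignore") silently drops every character that ASCII
    cannot represent, so the result is shorter whenever one was present.
    """
    return len(text.encode("ascii", "ignore")) < len(text)

print(has_non_ascii("экономия3r4"))  # Cyrillic plus ASCII
print(has_non_ascii("3r4"))          # ASCII only
print(has_non_ascii("café"))         # Latin with an accent, not Cyrillic
```

On Python 3.7 and later, `not text.isascii()` expresses the same test directly.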

Approach #2:

This is the approach from Max's earlier answer:

import re

def has_cyrillic(text):
    return bool(re.search('[а-яА-Я]', text))

has_cyrillic("экономия3r4")

929 ns ± 20.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Sateesh
  • It should be `len("экономия3r4".encode("ascii", "ignore")) < len("экономия3r4")` to check whether it's Cyrillic. – Demetry Pascal Nov 03 '22 at 10:16
  • Approach #1 is great performance-wise, but it takes into consideration all non-ASCII characters (e.g. Chinese, Hindi) instead of only considering Cyrillic. – GooDeeJAY Jul 13 '23 at 13:13
-6

You could create a set containing the Cyrillic letters and check each character of the string against it:

cyrillic_letters = {....} # fill it with the cyrillic letters

def has_cyrillic(text):
    for c in text:
        if c in cyrillic_letters:
            return True
    return False
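One way to fill such a set without typing the letters by hand is to generate it from code points. The sketch below assumes that "Cyrillic letters" means the whole Cyrillic block, U+0400 to U+04FF, which also includes a few non-letter signs and combining marks; narrow the range if only the basic alphabet is wanted. The name CYRILLIC_CHARS is made up for this example.

```python
# Assumption: treat every code point in the Cyrillic block as "Cyrillic".
# range() excludes its end, so 0x0500 makes U+04FF the last code point.
CYRILLIC_CHARS = {chr(cp) for cp in range(0x0400, 0x0500)}

def has_cyrillic(text):
    """True if any character of text is in the precomputed set."""
    # any() stops at the first match, like the explicit loop with return.
    return any(c in CYRILLIC_CHARS for c in text)

print(has_cyrillic('Hello, world!'))
print(has_cyrillic('Привет, world!'))
```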
Niema Moshiri