Python's standard library includes the unicodedata module, which has the tools needed to deal with this.
Testing things character by character, trying to check for every possible accented Latin letter in a chain of if statements, not only looks and feels bad: it is a bad approach.
One of the most basic tools for dealing with Unicode is getting a character's name - all Latin letters have "LATIN" in their name, and all Cyrillic characters have "CYRILLIC" in their name:
In [1]: import unicodedata
In [2]: unicodedata.name("ã")
Out[2]: 'LATIN SMALL LETTER A WITH TILDE'
In [3]: unicodedata.name("ы")
Out[3]: 'CYRILLIC SMALL LETTER YERU'
Your strategy will vary if you want to keep whitespace, digits, and so on - but basically, if you want to remove all non-Cyrillic characters:
In [7]: s = 'A ligeira raposa marrom ataca o cão preguiçoso Быстрая коричневая лиса прыгает через ленивую собаку +='
...:
In [8]: print(''.join(char for char in s if 'CYRILLIC' in unicodedata.name(char)))
Быстраякоричневаялисапрыгаетчерезленивуюсобаку
And conversely, if you want to keep everything else and remove only the Latin characters:
In [9]: print(''.join(char for char in s if 'LATIN' not in unicodedata.name(char)))
Быстрая коричневая лиса прыгает через ленивую собаку +=
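One caveat with these one-liners: unicodedata.name() raises ValueError for characters that have no name in the Unicode database (control characters such as "\n", for instance), so a string containing a line break would crash the filter above. A minimal sketch of a safer version, passing name()'s optional default argument:

```python
import unicodedata

s = "linha 1\nстрока 2"

# name() raises ValueError for unnamed characters such as "\n",
# so pass a default ("") to keep the filter from crashing:
cleaned = "".join(
    char for char in s if "LATIN" not in unicodedata.name(char, "")
)
print(cleaned)
```

Unnamed characters get the empty-string default, which never contains "LATIN", so they are kept.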
With that information alone, it is possible to achieve your objective - although there is more Unicode metadata attached to characters than their name, such as their "category". If you need to refine your filters, unicodedata.category(...) returns a two-character code for a character's category. All letters (regardless of alphabet) have "L" in the first position of that code, for example:
In [10]: unicodedata.category("a")
Out[10]: 'Ll'
In [11]: unicodedata.category("ã")
Out[11]: 'Ll'
In [12]: unicodedata.category("л")
Out[12]: 'Ll'
In [13]: unicodedata.category("A")
Out[13]: 'Lu'
In [14]: unicodedata.category("2")
Out[14]: 'Nd'
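Combining both, you can filter on the category first and only fall back to the name for letters. A minimal sketch (keep_cyrillic is a hypothetical helper name) that keeps Cyrillic letters while also preserving whitespace and digits:

```python
import unicodedata

def keep_cyrillic(text):
    """Keep Cyrillic letters, plus whitespace and decimal digits."""
    result = []
    for char in text:
        cat = unicodedata.category(char)
        if cat.startswith("L"):
            # A letter in some alphabet - keep it only if Cyrillic.
            # name() takes a default, so unnamed characters won't raise.
            if "CYRILLIC" in unicodedata.name(char, ""):
                result.append(char)
        elif cat in ("Zs", "Nd"):
            # "Zs" = space separators, "Nd" = decimal digits
            result.append(char)
    return "".join(result)

print(keep_cyrillic("cão 123 Быстрая лиса +="))
```

Here the category check does the coarse filtering (letter or not, digit, whitespace), and the name check decides which alphabet a letter belongs to - symbols such as "+" and "=" fall through both branches and are dropped.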