In a few Slavic languages written in both Latin and Cyrillic, rising and falling accent marks are used only for disambiguation in context (i.e. inconsistently), and only on vowels.
I would like Python code or a library to remove acute and grave accents from vowels while preserving other diacritics.
For example:
жѝзнеспосо́бный -> жизнеспособный
сè се фаќа -> се се фаќа
kȕćica -> kućica
If it's any help, here is a complete list of all the actual (i.e. unaccented) letters in Cyrillic alphabets for Slavic languages, including those with diacritics:
абвгдежзиклмнпорстуфхцшєґіїёыіўщъьюяйјњљџђћз́с́ќѓѕ
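One way to see which of these letters carry a diacritic at the Unicode level is to decompose them with NFD and list the combining marks. This is only an inspection sketch using the standard `unicodedata` module; the helper name `marked_letters` is made up for illustration.

```python
import unicodedata


def marked_letters(text):
    """Return {letter: [names of its combining marks]} for letters that have any."""
    clusters = []
    # NFD splits precomposed letters into a base character plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch      # attach the mark to the preceding base letter
        else:
            clusters.append(ch)     # start a new base letter
    return {
        unicodedata.normalize("NFC", c): [unicodedata.name(m) for m in c[1:]]
        for c in clusters
        if len(c) > 1
    }
```

Passing the list above to `marked_letters` should show that ќ and ѓ decompose to a base consonant plus the same COMBINING ACUTE ACCENT that marks stress on vowels, which is why the base letter has to be checked before stripping anything.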
Note:
їёыіўй are vowels that should keep their diacritics even when acute and grave accent marks are stripped away. However, an added accent mark on one of these letters is very rare or perhaps impossible, so that case can be ignored.
з́с́ќѓ are consonants, like Latin ćǵśź. They should keep their acute accent marks - they will not have any added for pronunciation or disambiguation purposes.
In the alphabets for which precise formal mappings are official, the Cyrillic equivalent of a Latin consonant with an acute accent will not necessarily have an acute accent. (This may be helpful.)
Double acute and double grave are a low priority.
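To make the requirements above concrete, here is a minimal sketch of the behaviour I have in mind, using only the standard `unicodedata` module. The function name `strip_vowel_accents`, the `strip_double` flag and the vowel set are my own assumptions, not an existing library API, and the vowel set may well be incomplete.

```python
import unicodedata

ACCENTS = {"\u0300", "\u0301"}         # combining grave, combining acute
DOUBLE_ACCENTS = {"\u030f", "\u030b"}  # combining double grave, double acute
# Assumed vowel set (Latin + Cyrillic base letters after decomposition).
VOWELS = set("aeiouy" "аеиоуыэюяєі")


def strip_vowel_accents(text, strip_double=True):
    """Remove acute/grave marks from vowels only; keep every other diacritic."""
    marks = ACCENTS | DOUBLE_ACCENTS if strip_double else ACCENTS
    out = []
    base_is_vowel = False
    # NFD exposes each accent as a separate combining character.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_vowel and ch in marks:
                continue  # drop the accent, keep breve, diaeresis, etc.
        else:
            base_is_vowel = ch.lower() in VOWELS
        out.append(ch)
    # NFC recomposes what is left, e.g. и + breve back into й.
    return unicodedata.normalize("NFC", "".join(out))
```

With this sketch, consonants such as ќ or ć keep their acute because their base letter is not in the vowel set, and й or ё keep their breve or diaeresis because only the listed accent marks are dropped. Double grave and double acute are stripped by default so that the kȕćica example works; passing `strip_double=False` leaves them in place.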
Background reading on these characters:
https://en.wikipedia.org/wiki/I_with_grave_(Cyrillic)#East_Slavic_languages
https://en.wikipedia.org/wiki/Shtokavian#Accentuation
https://en.wikipedia.org/wiki/Pitch_accent#Serbo-Croatian
https://en.wikipedia.org/wiki/Bulgarian_alphabet#.D0.8D
https://en.wikipedia.org/wiki/Macedonian_alphabet#Accented_letters
Similar questions:
Removing accents/diacritics from string while preserving other special chars (tried mb_chars.normalize and iconv)
How to remove accent in Python 3.5 and get a string with unicodedata or other solutions?