Given the accented unicode word like u'кни́га'
, I need to strip the acute (u'книга'
), and also change the accent format to u'кни+га'
, where '+'
represents the acute over the preceding letter.
What I do now is using a dictionary of acuted and not acuted symbols:
accented_list = [u'я́', u'и́', u'ы́', u'у́', u'э́', u'а́', u'е́', u'ю́', u'о́']
regular_list = [u'я', u'и', u'ы', u'у', u'э', u'а', u'е', u'ю', u'о']
accent_dict = dict(zip(accented_list, regular_list))
I want to do something like this:
def changeAccentFormat(word):
for letter in accent_dict:
if letter in word:
its_index = word.index(letter)
word = word[:its_index + 1] + u'+' + word[its_index + 1:]
return word
But of course it does not work as desired. I noticed that this code:
>>> word = u'кни́га'
>>> for letter in word:
... print letter
gives
к
н
и
´
г
а
(Well, i didn't expect the blank symbol to appear, but nevertheless). So I wonder, what is the simplest way to produce [u'к', u'н', u'и́', u'г', u'а']
? Or maybe there is some way to solve my problem without it?