I have a bit of a problem using edit_distance() in the following example.
I need to print words from the languages in the languages list in 5 columns, which is not a problem. I have done that:
from nltk.corpus import swadesh
from nltk.metrics import *
from transliterate import translit
languages = ['be', 'bg', 'bs', 'ru', 'cs']
for lang in languages:
    print('{:10}'.format(lang), end='')
print()
for i in range(len(swadesh.words('be'))):
    for lang in languages:
        print('{:10}'.format(swadesh.words(lang)[i].split(',')[0]), end='')
    print()
This part works as it is supposed to. Now I need to measure the Levenshtein string edit distance between each word from the 'be' language and its equivalent in the other languages. The distance should appear in brackets after each word. So it should look like this, for example:
tamto(0) acela(5) oni(5) то(3)
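To make the target concrete, here is a minimal sketch of one formatted row. It assumes the first word in the row is the reference word and that the last word has already been transliterated to Latin script. The names `levenshtein` and `format_row` are hypothetical and not part of nltk; the distance is computed with a plain dynamic-programming function so the sketch runs without the corpus, and nltk's edit_distance() could presumably be passed in as `distance` instead.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: insert, delete, substitute, each with cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        previous = current
    return previous[-1]


def format_row(words, distance=levenshtein):
    """Follow every word with its distance to the first word, in brackets."""
    reference = words[0]
    return ' '.join('{}({})'.format(w, distance(reference, w)) for w in words)


# Sample words taken from the example row above (last one in Latin script).
print(format_row(['tamto', 'acela', 'oni', 'to']))
# tamto(0) acela(5) oni(5) to(3)
```

Keeping the distance function as a parameter means the row formatting can be checked on its own before it is wired to the swadesh word lists.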
What would you suggest as the best way to measure it? I was thinking about creating a dictionary:
d = {}  # maps each language code to its list of words
for i in languages:
    words = swadesh.words(i)
    d[i] = words
print(d)
And then calculate the edit distance somehow, but I cannot get this to work. Especially because one of the languages, Russian, uses a different script, which means that I have to use translit (correct me if I am wrong, this is what I found online). Do you have any tips on how to go about it? I am new to programming, so maybe it is a simple question for you, but I am still trying to find my way around nltk. Thank you in advance!
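A small sketch of why the script matters: Cyrillic and Latin letters that look alike are different characters, so an edit distance computed on the raw strings counts every letter as a mismatch. The mapping table below is hypothetical and deliberately incomplete, and `to_latin` is an illustrative name, not a library function. As far as I can tell, the transliterate package offers `translit(text, 'ru', reversed=True)` for the real conversion.

```python
# Hypothetical, incomplete Cyrillic-to-Latin table (illustration only).
CYRILLIC_TO_LATIN = str.maketrans({
    'а': 'a', 'д': 'd', 'е': 'e', 'и': 'i', 'к': 'k',
    'м': 'm', 'н': 'n', 'о': 'o', 'т': 't', 'ч': 'ch',
})


def to_latin(word):
    """Replace the Cyrillic letters listed in the table; leave other characters as they are."""
    return word.translate(CYRILLIC_TO_LATIN)


cyrillic = 'то'

# The Cyrillic word shares no characters with its Latin look-alike.
print(set(cyrillic).isdisjoint('to'))

# After conversion the two strings compare equal.
print(to_latin(cyrillic) == 'to')
```

Converting every Cyrillic word this way before calling the distance function should bring the numbers in line with the bracketed values in the example row.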