Background
I would like fastText to classify all three of the following phrases as Chinese ('zh').
["Ni hao!", '你好!', 'ni hao!']
However, the pretrained model does not seem to handle this kind of classification well.
Is there another way to accomplish the same task?
Output
[('zh', 0.9305274486541748)]
[('eo', 0.9765485525131226)]
[('hr', 0.6364055275917053)]
Code
sample.py
from fasttext import load_model
model = load_model("lid.176.bin")
speech_texts = ["Ni hao!", '你好!', 'ni hao!']
def categolize_func(texts, model, k):
    # Predict the top-k language labels for each text and print them.
    for text in texts:
        labels, probs = model.predict(text, k)
        print(list(zip([l.replace("__label__", "") for l in labels], probs)))

categolize_func(speech_texts, model, 1)
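For reference, the label-cleaning step from the loop above can be isolated into a small helper that does not need the model loaded; `strip_labels` is an illustrative name, and the sample values below mimic the shape of fastText's `predict()` output rather than a real model call.

```python
def strip_labels(labels, probs):
    # Remove fastText's "__label__" prefix and pair each
    # cleaned label with its predicted probability.
    return list(zip([l.replace("__label__", "") for l in labels], probs))

# Values shaped like model.predict(text, k=1) output:
print(strip_labels(("__label__zh",), (0.93,)))  # [('zh', 0.93)]
```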