I'm developing a chatbot with the chatterbot library. The chatbot is in my native language --> Slovene, which has a lot of strange characters (for example: š, č, ž). I'm using python 2.7.
When I try to train the bot, the library has trouble with the characters mentioned above. For example, when I run the following code:
chatBot.set_trainer(ListTrainer)
chatBot.train([
"Koliko imam še dopusta?",
"Letos imate še 19 dni dopusta.",
])
it throws the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 12: invalid start byte
I added the # -*- coding: utf-8 -*-
line to the top of my file, I also changed the encoding of all used files via my editor (Sublime text 3) to utf-8, I changed the system default encoding with the following code:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
The strings are of type unicode.
When I try to get a response, with these strange characters, it works, it has no issues with them. For example, running the following code in the same execution as the above training code(when I change 'š' to 's' and 'č' to 'c', in the train strings), throws no errors:
chatBot.set_trainer(ListTrainer)
chatBot.train([
"Koliko imam se dopusta?",
"Letos imate se 19 dni dopusta.",
])
chatBot.get_response("Koliko imam še dopusta?")
I can't find a solution to this issue. Any suggestions? Thanks loads in advance. :)
EDIT: I used from __future__ import unicode_literals
, to make strings of type unicode. I also checked if they really were unicode with the method type(myString)
I would also like to paste this link.
EDIT 2: @MallikarjunaraoKosuri - s code works, but in my case, I had one more thing inside the chatbot instance intialization, which is the following:
chatBot = ChatBot(
'Test',
trainer='chatterbot.trainers.ListTrainer',
storage_adapter='chatterbot.storage.JsonFileStorageAdapter'
)
This is the cause of my error. The json storage file the chatbot creates, is created in my local encoding and not in utf-8. It seems the default storage (.sqlite3), doesn't have this issue, so for now I'll just avoid the json storage. But I am still interested in finding a solution to this error.