I have a file which is generated by this command:
fab -f vocab/fabfile build_vocab:<lang>,<corpus_files_root>
.
This command is part of a spaCy guide and was obtained from here. Since this command works with fabric,
which in turn works with Python 2, the output has a lot of Persian strings represented by their Unicode escape codes rather than the actual characters. In other words, I have the following:
2 1 u'\u0641\u0632\u0646\u062f\u0627\u0646'
1 1 u'\u200c\u0645\u0648\u0647\u0627\u06cc'
2 1 u'\u0627\u0641\u0646\u0647'
.
.
.
instead of this:
2 1 u'فزندان'
1 1 u'موهای'
2 1 u'افنه'
.
.
.
The next part of the process, run by the above-mentioned fabric ...
command, tries to read this file and compare each entry with the word in its actual form. So I think I need to convert the strings represented as Unicode escapes into their actual form. Is there any way to do so?
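
To make the goal concrete, here is a minimal sketch of the kind of conversion I mean, written for Python 3. It assumes every non-ASCII character in the file appears as a four-digit `\uXXXX` escape (as in the sample lines above) and that the file contains no literal backslashes that should be left alone. The names `decode_escapes` and `convert_file`, and any file paths passed to them, are hypothetical and not part of spaCy or fabric.

```python
import io
import re

# Matches a Python 2 style escape such as \u0641 (backslash, 'u', four hex digits).
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_escapes(line):
    """Replace each \\uXXXX escape in `line` with the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), line)


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, decoding the escapes.

    Both paths are placeholders; the output is written as UTF-8.
    """
    with io.open(src_path, encoding="utf-8") as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(decode_escapes(line))
```

Everything outside the escapes (the two counts and the `u'...'` wrapper) is left untouched, so the result keeps the same layout as the desired output shown above. Escapes of other forms, such as `\xNN` or `\UXXXXXXXX`, are not handled by this sketch.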