I am trying to tokenize text. To do this, I tried Polyglot and installed it as described in the documentation. After installing it, I tried to run this simple script:
import polyglot
from polyglot.text import Text, Word
text = Text("\"သမၼတဦးဝင္းျမင့္ရဲ႕ ျခင္းခတ္ကစားဟန္\"\n\nႏိုင္ငံေတာ္သမၼတ ဦးဝင္းျမင့္ ျမန္မာ့ရိုးရာဝိုင္းျခင္းခတ္ ေနတဲ့ပုံေတြ ဟာ ဒီကေန႕ ညေနပိုင္းမွာထြက္ရိွလာပါတယ္။\n\nဒီကေန႕ ညေနပိုင္းမွာ သမၼတအိမ္ေတာ္ဝင္းအတြင္းမွာ သမၼတဟာ သူရဲ႕မိတ္ေဆြေတြနဲ႕ ျခင္းခတ္ခဲ့တာလို႕ သိရပါတယ္။\n\nသမၼတနဲ႕ဝိုင္းျခင္းခတ္တဲ့သူေတြထဲမွာေတာ့ အမ်ိဳးသားလႊတ္ေတာ္ကိုယ္စားလွယ္ ဦးေက်ာ္သီဟ ၊ အစိုးရ ရဲ႕ၿငိမ္းခ်မး္ေရးေကာ္မရွင္အဖြဲ႕ဝင္ ဦးေအာင္စိုးတို႕ပါဝင္ၾကပါတယ္။\n\nသမၼတ ဦးဝင္းျမင့္ဟာ သမၼတတာဝန္မထမ္းေဆာင္မီ လႊတ္ေတာ္ကိုယ္စားလွယ္အျဖစ္ ေနျပည္ေတာ္က စည္ပင္ဧည္႕ရိပ္သာဝင္းအတြင္း ေနထိုင္စဥ္ကတည္းက အမ်ိဳးသားဒီမိုကေရစီအဖြဲ႕ခ်ဳပ္ ပါတီဝင္လႊတ္ေတာ္ကိုယ္စားလွယ္အခ်ိဳ႕နဲ႕ ညေနပိုင္းေတြမွာ ျခင္းခတ္ေလ့ရိွပါတယ္။\n\nကိုယ္လက္လႈပ္ရွားအားကစားအျဖစ္ ျခင္းခတ္ေလ့ရိွတဲ့သူေတြထဲမွာ ေတာ့ လက္ရိွ မႏၱေလးတိုင္းဝန္ႀကီးခ်ဳပ္ ေဒါက္တာေဇာ္ျမင့္ေမာင္ ၊ ဧရာဝတီတိုင္းဝန္ႀကီးခ်ဳပ္ေဟာင္း မန္းေဂ်ာ္နီတို႕လည္း ပါဝင္ေလ့ရိွပါတယ္။ ")
print(text.words)
but I keep getting this error:
Traceback (most recent call last):
  File "tkn.py", line 2, in <module>
    from polyglot.text import Text, Word
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/polyglot/text.py", line 9, in <module>
    from polyglot.detect import Detector, Language
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/polyglot/detect/__init__.py", line 1, in <module>
    from .base import Detector, Language
  File "/home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/polyglot/detect/base.py", line 11, in <module>
    from icu import Locale
ModuleNotFoundError: No module named 'icu'
To resolve this, I tried several of the steps mentioned here and a few of the answers given here, but none of them solved the problem. I am working on Ubuntu.
Also, if I run pip install pyicu, as some of the posts suggest, the installation fails with "Failed building wheel for pyicu" and "error: command 'gcc' failed with exit status 1".
I am not sure how to proceed from here. How can I resolve this error and get the script to work?
I also downloaded ICU and built and installed it manually on Ubuntu, but that did not help:
curl -LO http://download.icu-project.org/files/icu4c/63.1/icu4c-63_1-src.tgz
tar xzvf icu4c-63_1-src.tgz
cd icu/source
chmod +x runConfigureICU configure install-sh
./runConfigureICU Linux
make
sudo make install
sudo cp -r common/unicode /usr/local/include/
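In case it helps narrow things down, here is a minimal diagnostic sketch of what I can check in the same environment. It only reports whether the icu module (the one polyglot fails to import) is visible to the active interpreter and whether a few build tools are on the PATH. The helper names module_available and build_tools_on_path are my own, and the list of tools is an assumption about what a PyICU source build typically looks for, not something taken from its documentation.

```python
import importlib.util
import shutil


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def build_tools_on_path(tools=("gcc", "g++", "pkg-config", "icu-config")):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    The default tool list is an assumption about what building PyICU needs.
    """
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    # 'icu' is the module that polyglot/detect/base.py tries to import.
    print("icu module importable:", module_available("icu"))
    for tool, path in build_tools_on_path().items():
        print("{}: {}".format(tool, path or "not found"))
```

If the icu module shows as not importable even after the manual ICU build, that would suggest the Python binding (PyICU) itself was never built for this conda environment, which is separate from having the ICU C/C++ libraries installed.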