I want to use the TreeTagger module to add POS tags to a raw corpus in Google Colab.
I installed the module following the instructions in "How to use TreeTagger in Google Colab?":
%%bash
mkdir treetagger
cd treetagger
# Download the tagger package for your system (PC-Linux, Mac OS-X, ARM64, ARMHF, ARM-Android, PPC64le-Linux).
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.4.tar.gz
tar -xzvf tree-tagger-linux-3.2.4.tar.gz
# Download the tagging scripts into the same directory.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
gunzip tagger-scripts.tar.gz
# Download the installation script install-tagger.sh.
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/install-tagger.sh
# Download the parameter files for the languages you want to process.
# List of all parameter files: https://cis.lmu.de/~schmid/tools/TreeTagger/#parfiles
wget https://cis.lmu.de/~schmid/tools/TreeTagger/data/italian.par.gz
sh install-tagger.sh
cd ..
sudo pip install treetaggerwrapper
In particular, I downloaded italian.par.gz because I am working on Italian tweets.
However, when I run treetaggerwrapper on an Italian sentence:
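Before running the tagger, a quick Python check can confirm that the install cell produced the expected layout. This is only a sketch, assuming install-tagger.sh creates bin/, cmd/ and lib/ subdirectories inside treetagger/ and unpacks the parameter file into lib/ under a name that starts with "italian" and ends in ".par"; the helper check_treetagger_install is hypothetical.

```python
from pathlib import Path


def check_treetagger_install(tagdir, lang="italian"):
    """Report which expected pieces of a TreeTagger install are present.

    Assumption: the install script populates bin/, cmd/ and lib/ under
    tagdir and places the unpacked parameter file in lib/.
    """
    root = Path(tagdir)
    lib = root / "lib"
    return {
        "bin/tree-tagger": (root / "bin" / "tree-tagger").is_file(),
        "cmd/ directory": (root / "cmd").is_dir(),
        # Any lib/<lang>*.par file counts, since the exact name may vary.
        f"lib/{lang}*.par": any(lib.glob(f"{lang}*.par")) if lib.is_dir() else False,
    }


if __name__ == "__main__":
    # Same TAGDIR as passed to treetaggerwrapper.TreeTagger.
    print(check_treetagger_install("treetagger/"))
```

Any False entry points at a step of the install cell worth re-running before looking at the wrapper itself.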
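import pprint
import treetaggerwrapper

it_string = 'Che bella giornata per fare una domanda su stackoverflow'
# TAGDIR points at the directory populated by install-tagger.sh
tagger = treetaggerwrapper.TreeTagger(TAGLANG="it", TAGDIR='treetagger/')
tags = tagger.tag_text(it_string)           # raw tagger output lines
tuples = treetaggerwrapper.make_tags(tags)  # same output as tuples
print(it_string)
pprint.pprint(tags)
pprint.pprint(tuples)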
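For reference, tag_text returns one string per token with the word, POS tag and lemma separated by tabs, and make_tags turns those strings into tuples. The sketch below only mimics that conversion in plain Python so the output format is easy to see without TreeTagger installed; SimpleTag and parse_tagger_lines are hypothetical stand-ins, not treetaggerwrapper's own implementation.

```python
from collections import namedtuple

# Hypothetical stand-in with word/pos/lemma fields; not the library's class.
SimpleTag = namedtuple("SimpleTag", "word pos lemma")


def parse_tagger_lines(lines):
    """Split tab-separated 'word<TAB>pos<TAB>lemma' lines into SimpleTag tuples.

    Lines without exactly three fields are kept as plain strings so that
    nothing is silently dropped.
    """
    result = []
    for line in lines:
        parts = line.split("\t")
        if len(parts) == 3:
            result.append(SimpleTag(*parts))
        else:
            result.append(line)
    return result


if __name__ == "__main__":
    # Placeholder values, not real TreeTagger output.
    print(parse_tagger_lines(["Che\tX\tche", "bella\tY\tbello"]))
```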
I get the following warning:
WARNING:TreeTagger:Abbreviation file not found: italian-abbreviations-utf8
WARNING:TreeTagger:Processing without abbreviations file.
Does anyone know what this means and how to fix it?
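One way to narrow the warning down (a diagnostic sketch, not a confirmed fix) is to list which Italian abbreviation files, if any, ended up in the TreeTagger lib/ directory. It assumes the install script places abbreviation lists next to the .par files in treetagger/lib/; the helper find_abbreviation_files is hypothetical.

```python
from pathlib import Path


def find_abbreviation_files(tagdir, lang_prefix="italian"):
    """Return sorted names of files in <tagdir>/lib that look like abbreviation lists.

    Assumption: abbreviation files live in lib/ and their names start with
    the language name and contain "abbreviations".
    """
    lib = Path(tagdir) / "lib"
    if not lib.is_dir():
        return []
    return sorted(
        p.name
        for p in lib.iterdir()
        if p.is_file() and p.name.startswith(lang_prefix) and "abbreviations" in p.name
    )


if __name__ == "__main__":
    print(find_abbreviation_files("treetagger/"))
```

If the list is empty, no abbreviation file was installed; if it contains a name other than italian-abbreviations-utf8, the file exists but not under the name the warning mentions.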