Can you please tell me what I am missing in the code below? I am trying to use some functions (defined at the bottom of the post) that help me remove stopwords, form bigrams, and do lemmatisation. The language is Italian, and I am using spaCy for this.
!python -m spacy download it_core_news_sm
import spacy
nlp = spacy.load("it_core_news_sm")
data_words_nostops = remove_stopwords(tok_text_list)
# Form Bigrams
data_words_bigrams = make_bigrams(data_words_nostops)
nlp = spacy.load('it', disable=['parser', 'ner'])
# Do lemmatization keeping only noun, adj, vb, adv
data_lemmatized = lemmatization(data_words_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])
print(data_lemmatized[:1])
where
tok_text_list= [['papa',
',',
"l'aspirante",
'pilota',
'anni',
'morto',
'fiume',
'tevere',
'seguito',
"all'incidente",
"l'aereo",
'.',
'spiaggia',
'campo',
'mare',
'é',
'vietata',
'disabili',
'.'], [...]]
The error that I am getting is:
OSError Traceback (most recent call last)
<ipython-input-216-775b3f412d6f> in <module>
---> 14 nlp = spacy.load('it', disable=['parser', 'ner'])
15
16 # Do lemmatization keeping only noun, adj, vb, adv
OSError: [E050] Can't find model 'it'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
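The error message lists three things a model name can be: a shortcut link, a Python package, or a path to a data directory. As a minimal sketch for checking the second case, the snippet below uses only the standard library to test whether a name is importable as a package in the environment the notebook kernel runs in. The helper name `is_installed_package` is made up for illustration and is not part of spaCy, and the sketch assumes the download command at the top installed into the same environment as the kernel.

```python
import importlib.util


def is_installed_package(name: str) -> bool:
    """Return True if `name` can be found as a top-level Python package.

    Hypothetical helper for illustration only; not a spaCy function.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or odd module states
        return False


# Compare the short name from the failing call with the downloaded package name
for candidate in ("it", "it_core_news_sm"):
    print(candidate, is_installed_package(candidate))
```

If `it_core_news_sm` is reported as found and `it` is not, then only the full name exists as a package in this environment, which would match the "Python package" part of the message.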
Maybe I forgot to include something in the code or to download some other file. I also tried rerunning everything as suggested here: Loading the spacy german language model into a jupyter notebook. I am using Jupyter Notebook.
Thanks