Based on this link: Is it possible to use spacy with already tokenized input?
I can get spaCy to accept a pre-tokenized doc as input and process it further. The code is below:
from spacy.tokens import Doc

def nlp_process(self, token_tuple):
    # token_tuple = ("This is a test", ['This', 'is', 'a', 'test'])
    # Build a Doc directly from the pre-tokenized words
    doc = Doc(self.nlp.vocab, words=token_tuple[1])
    # Run each pipeline component over the pre-built Doc
    for name, proc in self.nlp.pipeline:
        doc = proc(doc)
    return doc
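For context, here is a standalone version of the same idea (a minimal sketch, assuming a pretrained pipeline such as en_core_web_sm is installed; any model should work):

import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")  # assumed model for illustration

# Build a Doc from pre-tokenized words, then run the pipeline components over it
doc = Doc(nlp.vocab, words=['This', 'is', 'a', 'test'])
for name, proc in nlp.pipeline:
    doc = proc(doc)

print([(token.text, token.pos_) for token in doc])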
This works well for a single input. But what if I want to process docs in batch mode using the nlp.pipe() function? Something like:
nlp_docs = self.nlp.pipe(texts)
nlp.pipe() expects an iterable of raw texts, not pre-tokenized words. How can I handle this situation?
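One workaround I can think of (untested sketch, assuming each pipeline component exposes a .pipe() method as in spaCy's Pipe API) is to build the Doc objects up front and then batch them through each component:

from spacy.tokens import Doc

def nlp_process_batch(self, token_tuples):
    # token_tuples = [("This is a test", ['This', 'is', 'a', 'test']), ...]
    docs = [Doc(self.nlp.vocab, words=words) for _, words in token_tuples]
    for name, proc in self.nlp.pipeline:
        # Components derived from spaCy's Pipe expose .pipe() for batching;
        # fall back to calling the component one Doc at a time otherwise.
        if hasattr(proc, "pipe"):
            docs = list(proc.pipe(docs))
        else:
            docs = [proc(doc) for doc in docs]
    return docs

I believe that in spaCy v3, nlp.pipe() can also accept pre-constructed Doc objects directly, which would make this simpler, but I'm not sure whether that is the recommended approach.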