How do I limit the number of CPUs used by spaCy?
I want to extract parts-of-speech and named entities from a large set of sentences. Because of RAM limitations, I first use Python's NLTK to split my documents into sentences. I then iterate over the sentences and use nlp.pipe() to do the extractions. However, when I do this, spaCy consumes my entire computer: it uses every available CPU. This is a problem because the machine is shared. How can I limit the number of CPUs spaCy uses? Here is my code to date:
# require
from nltk import sent_tokenize
import spacy

# initialize
file = './walden.txt'
nlp = spacy.load('en')

# slurp up the given file
with open(file, 'r') as handle:
    text = handle.read()

# parse the text into sentences, and process each one
sentences = sent_tokenize(text)
for sentence in nlp.pipe(sentences, n_threads=1):

    # print each token's text, lemma, and part-of-speech tag
    for token in sentence:
        print("\t".join([token.text, token.lemma_, token.tag_]))

# done
quit()
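
For what it's worth, I have read that the extra CPU usage may come from the numerical library underneath spaCy (OpenBLAS or MKL) rather than from spaCy itself, in which case n_threads=1 would not help. If that is right, then something like the sketch below might cap the thread count. I have not confirmed that these environment variables are honored here, or that setting them before the imports is required; both are my assumptions:

# assumption: spaCy's math library reads these variables, and they
# must be set before numpy/spacy are first imported
import os
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

import spacy
nlp = spacy.load('en')

Is something like this the right approach, or does spaCy expose a setting of its own for this?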