
How do I limit the number of CPUs used by Spacy?

I want to extract parts of speech and named entities from a large set of sentences. Because of RAM limitations, I first use NLTK to split my documents into sentences. I then iterate over the sentences and use nlp.pipe() to do the extraction. However, when I do this, Spacy consumes the whole of my computer: it uses every available CPU, even though I pass n_threads=1. That is a problem because my computer is shared. How can I limit the number of CPUs used by Spacy? Here is my code so far:

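```python
# require
from nltk import sent_tokenize  # needs NLTK's Punkt sentence tokenizer data to be downloaded
import spacy

# initialize
file_name = './walden.txt'
nlp = spacy.load('en')  # 'en' is a spaCy v2 shortcut; spaCy v3+ needs a package name such as 'en_core_web_sm'

# slurp up the given file, closing it when done
with open(file_name, 'r') as handle:
    text = handle.read()

# parse the text into sentences, and process each one
sentences = sent_tokenize(text)
# note: n_threads was deprecated in spaCy v2.1 and removed in v3
for sentence in nlp.pipe(sentences, n_threads=1):

    # print each token's text, lemma, and fine-grained part-of-speech tag
    for token in sentence:
        print("\t".join([token.text, token.lemma_, token.tag_]))
```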
Asked by ericleasemorgan; edited by TylerH
  • Is it possible that you are accidentally doing character tokenization? Check by printing out your sentences. – Nate Raw May 26 '18 at 09:20
  • The problem sounds like the general one of keeping a Python program on at most one core: [How to stop Python from using more than one core](https://stackoverflow.com/questions/10427900/stop-python-from-using-more-than-one-cpu). The answers there seem to point to setting the process priority ('niceness') at the operating-system level. – Ben Companjen May 26 '18 at 10:39
  • @NateRaw, I thought I might be doing character tokenization by accident, but that is not the case. Good guess, though. – ericleasemorgan May 26 '18 at 11:59
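Following the niceness suggestion in the comments, here is a minimal sketch that lowers the priority of the current process from inside Python with the standard-library `os.nice` (Unix only). The helper name `lower_priority` and the increment of 10 are illustrative assumptions. Keep in mind that niceness only changes scheduling priority: the process can still use every core when the machine is otherwise idle, it just yields to other users' work.

```python
import os


def lower_priority(increment: int = 10) -> int:
    """Hypothetical helper: raise this process's niceness and return the new value.

    Unix only. A higher niceness means the scheduler favors other processes;
    unprivileged users can raise their niceness but not lower it again.
    """
    return os.nice(increment)


if __name__ == "__main__":
    before = os.nice(0)  # an increment of 0 just reports the current niceness
    after = lower_priority(10)
    print(f"niceness: {before} -> {after}")
```

Calling this once near the top of the script, before the nlp.pipe() loop, would apply to everything the process does afterwards.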

1 Answer


My answer to my own question is, "Call the operating system and employ a Linux utility named taskset."

```python
import os

# limit ourselves to a few processors only (CPUs 0 and 1); Linux-only, requires taskset
os.system("taskset -pc 0-1 %d > /dev/null" % os.getpid())
```

This limits the running process to the first two CPUs (numbered 0 and 1). This solution is good enough for me.
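The same CPU-affinity call that taskset makes is also exposed directly by Python's standard library as `os.sched_setaffinity` (Linux only), which avoids shelling out. This is a minimal sketch, not the answer's method: the helper name `pin_to_cpus` is hypothetical, and taking the first two CPUs the process is allowed to use mirrors the `0-1` range above. On Linux the mask applies to the calling thread and is inherited by threads it creates later, while threads that already exist keep their old mask. That is why, under these assumptions, it is safest to call it before importing spaCy.

```python
from __future__ import annotations

import os


def pin_to_cpus(cpus: set[int]) -> set[int]:
    """Hypothetical helper: restrict the caller to `cpus` and return the new mask.

    Linux only. Equivalent in spirit to `taskset -pc <list> <pid>`.
    """
    os.sched_setaffinity(0, cpus)  # pid 0 means the caller
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    available = sorted(os.sched_getaffinity(0))
    # keep at most the first two CPUs this process is currently allowed to use
    wanted = set(available[:2])
    print("now limited to CPUs:", sorted(pin_to_cpus(wanted)))
```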

Answered by ericleasemorgan