The current version of TensorFlow (0.6.0) runs in a single process: the same process in which you create the tf.Session (or call tensorflow::NewSession(), if you're using the C++ interface).
However, TensorFlow uses multiple threads by default, which could account for the high load averages you are seeing. It sizes two threadpools automatically for your system. Currently the default is to allocate as many threads as there are cores to each of them: the "inter-op" threadpool (which determines the number of ops that can execute in parallel) and the "intra-op" threadpool (which determines the number of threads available for parallelizing within an op). Another source of threads is the Python-based QueueRunner, which starts at least one thread per queue (typically in an input pipeline).
Therefore, if you have a large number of cores, and your TensorFlow program has a lot of available parallelism (or a complex input pipeline), you can end up seeing such high load averages.
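As a rough illustration of the arithmetic above, here is a minimal sketch that adds up the threads described in this answer. It does not call TensorFlow at all; the function name estimate_thread_count and its parameters are hypothetical, and it assumes the defaults stated above (one inter-op and one intra-op thread per core, plus at least one thread per queue).

```python
import os


def estimate_thread_count(num_cores=None, num_queues=0, threads_per_queue=1):
    """Estimate how many threads a single TensorFlow process might start.

    Assumes the defaults described above: each of the two threadpools
    gets one thread per core, and each queue gets at least one
    QueueRunner thread. This is an illustration, not a TensorFlow API.
    """
    if num_cores is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        num_cores = os.cpu_count() or 1

    inter_op_threads = num_cores  # ops that can execute in parallel
    intra_op_threads = num_cores  # threads parallelizing within an op
    queue_runner_threads = num_queues * threads_per_queue

    return inter_op_threads + intra_op_threads + queue_runner_threads


# Example: a 16-core machine with an input pipeline that uses 2 queues.
print(estimate_thread_count(num_cores=16, num_queues=2))
```

The result is only the number of threads that may exist in the process. The load average reflects how many of them are runnable at a given moment, so it depends on how much parallelism the graph and the input pipeline actually expose.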