I am using Torque to run CNN-based training jobs with the TensorFlow library (1 CPU per task).

When I ran top on my server, I noticed the following: load average: 677.29, 668.59, 470.

I create a session like this: sess = tf.Session()

So my question is: is there some place in the documentation where I can read when TensorFlow starts processes and how many it uses?

mrry
Farseer

1 Answer

The current version of TensorFlow (0.6.0) runs in a single process: the same process in which you create the tf.Session (or call tensorflow::NewSession(), if you're using the C++ interface).

However, TensorFlow by default uses multiple threads, which could account for the high load averages you are seeing. By default TensorFlow will choose the number of threads for two threadpools based on an appropriate number for your system. Currently the default behavior is to allocate the same number of threads as cores, for both the "inter-op" threadpool (which determines the number of ops that can execute in parallel), and the "intra-op" threadpool (which determines the number of threads available for parallelizing within an op). Another source of threads is in the Python-based QueueRunner, which starts at least one thread per queue (typically in an input pipeline).

Therefore, if you have a large number of cores, and your TensorFlow program has a lot of available parallelism (or a complex input pipeline), you can end up seeing such high load averages.
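If the scheduler grants only one CPU per task, capping both threadpools to match may bring the load down. The following is a minimal sketch, not a definitive recipe: `thread_limits` and `make_session` are hypothetical helper names, and it assumes the `tf.ConfigProto` fields `intra_op_parallelism_threads` and `inter_op_parallelism_threads` are available in your TensorFlow version (in later releases the same classes live under `tf.compat.v1`).

```python
def thread_limits(cpus_per_task=1):
    """Return ConfigProto keyword arguments that cap both threadpools.

    Hypothetical helper: cpus_per_task is assumed to be the number of
    CPUs the batch scheduler (e.g. Torque) granted to this task.
    """
    return {
        # Threads available for parallelizing within a single op.
        "intra_op_parallelism_threads": cpus_per_task,
        # Number of ops that may execute in parallel.
        "inter_op_parallelism_threads": cpus_per_task,
    }


def make_session(cpus_per_task=1):
    """Create a session whose threadpools match the granted CPUs."""
    # Imported lazily so thread_limits() can be used without TensorFlow.
    import tensorflow as tf

    config = tf.ConfigProto(**thread_limits(cpus_per_task))
    return tf.Session(config=config)
```

Calling `sess = make_session(1)` in place of `sess = tf.Session()` would apply the limits. Note that this does not limit threads started by a `QueueRunner` in the input pipeline.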

mrry
  • Correct me if I'm wrong, but you can change these settings in tf.ConfigProto? – Farseer Jan 11 '16 at 08:11
  • That's right: see my answer [here](http://stackoverflow.com/a/33618045/3574081) for how to do so. – mrry Jan 11 '16 at 14:48
  • Just curious, I often hear that Python's global interpreter lock hurts its multithreaded performance. Is that an issue for TensorFlow? Why not? – bill May 05 '18 at 05:11
  • @bill: Most of the performance-critical parts of TensorFlow are implemented in a C++ Python extension that creates its own threads for parallel execution. For example, when you call `tf.Session.run(...)`, most of this method is implemented in the C++ extension, using TensorFlow's own threadpool. Furthermore, many of TensorFlow's APIs (including `tf.Session.run()`) release the global interpreter lock before starting work, so it is possible to write an efficient multithreaded Python program that uses these APIs. – mrry May 07 '18 at 16:52
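The multithreaded pattern described in the last comment can be sketched with the standard library alone. This is an illustration under stated assumptions, not TensorFlow code: `run_step` is a hypothetical stand-in for a call such as `sess.run(...)`, and it uses `time.sleep`, which also releases the global interpreter lock while it waits, so the worker threads can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(step):
    """Hypothetical stand-in for a call such as sess.run(...).

    time.sleep releases the GIL while waiting, so several of these
    calls can be in flight at once from different Python threads.
    """
    time.sleep(0.1)
    return step


def run_concurrently(num_threads=4, num_steps=8):
    """Issue num_steps calls from a pool of num_threads Python threads."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the order the steps were submitted.
        return list(pool.map(run_step, range(num_steps)))
```

In a real program, the body of `run_step` would be replaced by the TensorFlow call, with one shared session used across threads.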