I'm writing a C++ wrapper around the TensorFlow 1.2 C API (for inference, if it matters). My application is multi-process and multi-threaded, with resources explicitly allocated, so I would like to limit TensorFlow to a single thread.

Currently, when I run a simple inference test that allows batch processing, I see it using all CPU cores. I tried limiting the number of threads for a new session using a mixture of C and C++ as follows (forgive the partial code snippet; I hope it makes sense):

tensorflow::ConfigProto conf;
conf.set_intra_op_parallelism_threads(1);
conf.set_inter_op_parallelism_threads(1);
conf.add_session_inter_op_thread_pool()->set_num_threads(1);
std::string str;
conf.SerializeToString(&str);
TF_SetConfig(m_session_opts,(void *)str.c_str(),str.size(),m_status);
m_session = TF_NewSession(m_graph, m_session_opts, m_status);

But I don't see it making any difference: all cores are still fully utilized.

Am I using the C API correctly?

(My current workaround is to recompile TensorFlow with the number of threads hard-coded to 1, which will probably work, but it's obviously not the best approach...)

-- Update --

I also tried adding:

conf.set_use_per_session_threads(true);

Without success. Multiple cores are still used...

I also tried to run with high log verbosity, and got this output (showing only what I think is relevant):

tensorflow/core/common_runtime/local_device.cc:40] Local device intraop parallelism threads: 8
tensorflow/core/common_runtime/session_factory.cc:75] SessionFactory type DIRECT_SESSION accepts target: 
tensorflow/core/common_runtime/direct_session.cc:95] Direct session inter op parallelism threads for pool 0: 1

The "parallelism threads: 8" message shows up as soon as I instantiate a new graph using TF_NewGraph(). I didn't find a way to specify options before this graph allocation, though...

oferlivny
  • I think you're using it correctly, but perhaps there are some errors. Could you validate that `conf.SerializeToString(&str)` is returning `true` and that `TF_GetCode(m_status) == TF_OK` after the call to `TF_SetConfig`? – ash Jul 12 '17 at 19:10
  • That's clearly not C, but C++. You cannot mix C and C++ in the same compilation unit. According to Wikipedia, there is no C API. – too honest for this site Jul 12 '17 at 19:41
  • @olaf Please see https://www.tensorflow.org/install/install_c. Mixing C and C++ is indeed a bit hacky, but it works. – oferlivny Jul 12 '17 at 20:42
  • @ash Status is indeed OK (I check it, just didn't include it in the snippet above). I'll verify the serialization return value later, but I'm pretty sure that the configuration is passed correctly, as I tried using illegal values - which made the session generation fail... – oferlivny Jul 12 '17 at 20:42
  • Can you try adding `conf.set_use_per_session_threads(true);`? By default, if you created a session previously in the same process with a different number of threads (or the default value, equal to the number of cores), TensorFlow would reuse the same threadpool created for that session in all subsequent sessions, unless you use this configuration option. – mrry Jul 12 '17 at 23:30
  • @oferlivny _sigh!_ You cannot mix C and C++ source code as much as you cannot mix Python and Brainfuck source code. They are simply different languages. A C++ compiler will not compile C code, nor will a C compiler compile C++ code. **Identical syntax does not imply identical semantics**. What you can do is write C++ code _C-style_. It still is C++. And that's exactly what your code above seems to be. – too honest for this site Jul 13 '17 at 00:00
  • According to the linked site, the C API is for implementers of foreign language interfaces, not for use in an application program. As you already use C++, it is pointless to use it; use the C++ API as intended. – too honest for this site Jul 13 '17 at 00:07
  • @olaf There are reasons not to use the C++ API of tensorflow, such as: https://stackoverflow.com/questions/39379747/import-opencv-mat-into-c-tensorflow-without-copying – oferlivny Jul 13 '17 at 03:45
  • @olaf My code compiles, and my compiler does not complain. Otherwise I would have been asking different questions. I can share my full code here if you like, it's simply not relevant to the question. There are plenty of libraries that have C++ bindings around C implementations (zeromq, opencv...). Basically what I'm doing is using the C API, not writing C myself. Again, hacky, but given the limitations of the C API and the C++ API of tensorflow, I don't really see a choice. But then again, that's why I'm here - am I doing something wrong? – oferlivny Jul 13 '17 at 03:53
  • Thanks @mrry, I tried your suggestion, which indeed feels relevant, and updated my question with my findings... Anything else I can try? – oferlivny Jul 13 '17 at 05:51
  • @oferlivny Please read and understand my comment **carefully** again. Especially the bold part. As a sidenote: Neither the C nor the C++ compiler not complaining guarantees the code is correct or does not invoke undefined behaviour. One possible expression of undefined behaviour is it might **appear** to work flawlessly for a specific run/compiled code/day/weather. That does not mean it will next time or for all input, system state, etc. Using the C A**B**I(!) from C++ does not justify the C tag, nor does it make your code C. To end this: I already wrote what you should do. Use the C++ A**P**I. – too honest for this site Jul 13 '17 at 12:59
  • @olaf You have made your point clear, and I agree with everything you wrote. It might indeed be the root cause for my problems, although I tend to believe there should probably be a programmatic workaround for my problem. Considering my situation (including your recommendation) I switched to the C++ API. So thanks :) – oferlivny Jul 13 '17 at 15:23
  • @oferlivny did switching to Tensorflow C++ API solve the problem? – fisakhan Aug 20 '20 at 15:20
  • @oferlivny how do you compile Tensorflow with hard coding number of threads to be 1? – fisakhan Aug 24 '20 at 10:26

2 Answers

I had the same problem and solved it by setting the number of threads when creating the first TF session in my application. If the first session is not created with an options object, TF will create twice as many worker threads as there are cores on the machine.

Here is the C++ code I used:

// Call when application starts
void InitThreads(int coresToUse)
{
    // initialize the number of worker threads
    tensorflow::SessionOptions options;
    tensorflow::ConfigProto & config = options.config;
    if (coresToUse > 0)
    {
        config.set_inter_op_parallelism_threads(coresToUse);
        config.set_intra_op_parallelism_threads(coresToUse);
        config.set_use_per_session_threads(false);  
    }
    // now create a session to make the change
    std::unique_ptr<tensorflow::Session> 
        session(tensorflow::NewSession(options));
    session->Close();
}

Pass 1 to limit the number of inter-op and intra-op threads to 1 each.
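For the C API used in the question, the same thread settings can also be handed to TF_SetConfig as raw serialized bytes, without including any protobuf C++ headers. The following is only a sketch: the helper names `AppendVarintField` and `MakeThreadConfig` are made up for illustration, and the field numbers (2 for intra_op_parallelism_threads, 5 for inter_op_parallelism_threads, 9 for use_per_session_threads) are taken from my reading of tensorflow/core/protobuf/config.proto, so check them against the TensorFlow version you build with.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append one protobuf varint field (wire type 0) to `out`.
// Only handles field numbers below 16, whose tag fits in a single byte.
void AppendVarintField(std::vector<std::uint8_t>& out, int field_number,
                       std::uint32_t value) {
    assert(field_number > 0 && field_number < 16);
    // Tag byte: (field_number << 3) | wire_type, with wire_type 0 for varint.
    out.push_back(static_cast<std::uint8_t>(field_number << 3));
    // Varint body: 7 bits per byte, high bit set on all but the last byte.
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Build a serialized ConfigProto holding only the thread settings.
// ASSUMPTION: field numbers 2, 5 and 9 as in config.proto; verify for your version.
std::vector<std::uint8_t> MakeThreadConfig(std::uint32_t intra_threads,
                                           std::uint32_t inter_threads,
                                           bool per_session_threads) {
    std::vector<std::uint8_t> bytes;
    AppendVarintField(bytes, 2, intra_threads);  // intra_op_parallelism_threads
    AppendVarintField(bytes, 5, inter_threads);  // inter_op_parallelism_threads
    if (per_session_threads) {
        AppendVarintField(bytes, 9, 1);          // use_per_session_threads = true
    }
    return bytes;
}
```

The resulting buffer would be passed as `TF_SetConfig(opts, bytes.data(), bytes.size(), status)` before calling TF_NewSession. This only changes how the configuration is delivered; whether TensorFlow honors the values is a separate matter, as the note below shows.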

Edit: IMPORTANT NOTE: This code works when called from the main application (the Google sample trainer) BUT stopped working when I moved it to a DLL dedicated to wrapping TensorFlow. TF 1.4.1 ignores the parameter I pass and spins up all threads. I would like to hear your comments...

Mike S.

There is no problem in your use of the TensorFlow C API. It is a limitation of the C API that it generates at least N threads, where N is the number of cores. You can't reduce it further.

Setting OMP_NUM_THREADS in the environment can change the number of threads, but TensorFlow overrides that setting and generates N threads.

However, you can specify one or more cores for the process to run on. taskset, numactl, and thread affinity settings can lock a given process to a given core, but they don't change the number of threads.
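As a rough illustration of the affinity approach from inside the program, here is a Linux-only sketch that uses sched_getaffinity and sched_setaffinity from glibc. The function names `PinToFirstAllowedCpu` and `AllowedCpuCount` are hypothetical. The idea is to call the pinning function before any TensorFlow call, since threads created afterwards inherit the affinity mask of the thread that creates them.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux, glibc)

// Restrict the calling thread to the first CPU it is currently allowed to run on.
// Returns the chosen CPU index, or -1 on failure.
int PinToFirstAllowedCpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            // pid 0 means "the calling thread".
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}

// Number of CPUs the calling thread may run on, or -1 on failure.
int AllowedCpuCount() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    return CPU_COUNT(&allowed);
}
```

This is roughly what `taskset` does from the outside. It limits which core the work runs on, not how many threads exist.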

The above-mentioned solutions will reduce the total number of threads generated by TensorFlow, but not to 1. TensorFlow will still create multiple threads, depending on the number of cores in the CPU. In most cases, only one thread will be active while the others sleep. I don't think it is possible to have a single-threaded TensorFlow.

The following GitHub issues support my point of view: https://github.com/tensorflow/tensorflow/issues/33627 and https://github.com/usnistgov/frvt/issues/30

Building the TensorFlow C++ API from source and changing the source code may help.
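To check how many threads a process actually ends up with, one option on Linux is to read the "Threads:" line of /proc/self/status before and after creating a session. This is only a sketch, and `CurrentThreadCount` is a hypothetical helper, not part of TensorFlow.

```cpp
#include <fstream>
#include <limits>
#include <string>

// Returns the number of threads in the current process as reported by the
// Linux kernel in /proc/self/status, or -1 if the value cannot be read.
int CurrentThreadCount() {
    std::ifstream status("/proc/self/status");
    std::string key;
    while (status >> key) {
        if (key == "Threads:") {
            int count = -1;
            if (status >> count) {
                return count;
            }
            return -1;
        }
        // Not the line we want: skip the rest of it.
        status.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return -1;
}
```

Calling it once at startup and once after the session is created gives the number of threads that were added in between, which makes it easy to compare different configurations.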

fisakhan