
I'm trying to learn TensorFlow's internals by stepping from the Python code of its CIFAR-10 training example into its core C++ code. Using Eclipse+PyDev to step through the Python code works great, but I can't find how to step into the C++ code of the TensorFlow core. I tried using Eclipse CDT to build the C++ code in a separate project and attaching the debugger to the Python process running cifar10_train.py as described here, but the symbols are never loaded and (obviously) deferred breakpoints are never hit.

Background and setup:

I'm running Ubuntu 14.04 LTS, I installed TensorFlow from source as described here, and my CDT project uses a Makefile containing

    bazel build -c dbg //tensorflow/cc:tutorials_example_trainer
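Before attaching a debugger, it can help to confirm that the binary or .so you expect to debug actually carries DWARF debug info (which a `-c dbg` build normally should, unless it was stripped). The sketch below, a minimal pure-Python helper assuming a standard ELF file on Linux, lists the section names and looks for `.debug_info`; `readelf -S <file>` from binutils shows the same information. The function names are hypothetical, not part of TensorFlow or Bazel.

```python
import mmap
import struct


def elf_section_names(path):
    """Return the section names listed in an ELF file's section header table."""
    with open(path, "rb") as f:
        if f.read(4) != b"\x7fELF":
            raise ValueError(f"{path} is not an ELF file")
        # mmap avoids reading a (possibly huge) debug .so into memory.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            is_64 = m[4] == 2                      # EI_CLASS: 1 = ELF32, 2 = ELF64
            order = "<" if m[5] == 1 else ">"      # EI_DATA: 1 = little-endian
            if is_64:
                (shoff,) = struct.unpack_from(order + "Q", m, 0x28)  # e_shoff
                shentsize, shnum, shstrndx = struct.unpack_from(order + "HHH", m, 0x3A)
            else:
                (shoff,) = struct.unpack_from(order + "I", m, 0x20)
                shentsize, shnum, shstrndx = struct.unpack_from(order + "HHH", m, 0x2E)
            # No usable section table (extended section numbering is not handled here).
            if shoff == 0 or shnum == 0 or shstrndx == 0 or shstrndx >= 0xFF00:
                return []
            # sh_offset of the section-name string table (.shstrtab).
            strtab_hdr = shoff + shstrndx * shentsize
            if is_64:
                (strtab,) = struct.unpack_from(order + "Q", m, strtab_hdr + 24)
            else:
                (strtab,) = struct.unpack_from(order + "I", m, strtab_hdr + 16)
            names = []
            for i in range(shnum):
                # sh_name is the first 4-byte field of every section header.
                (name_off,) = struct.unpack_from(order + "I", m, shoff + i * shentsize)
                start = strtab + name_off
                names.append(m[start:m.find(b"\0", start)].decode("ascii", "replace"))
            return names


def has_debug_info(path):
    """True if the ELF file contains DWARF debug info (.debug_info or .zdebug_info)."""
    names = set(elf_section_names(path))
    return ".debug_info" in names or ".zdebug_info" in names
```

For example, `has_debug_info(path_to_pywrap_so)` returning False for the library your Python process loads would explain why breakpoints are never resolved.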

Edited by Community; asked by user5568317
  • You may need to re-build TensorFlow from source using "--compilation_mode dbg" in order to include the symbols – Yaroslav Bulatov Nov 17 '15 at 18:41
  • If I understand bazel's user manual correctly, your suggestion is equivalent to the "-c dbg" flag I used... – user5568317 Nov 17 '15 at 21:42
  • Yes, equivalent. Hm... I wonder if the problem is that the TensorFlow C symbols are not in the "python" binary at all, but in .so files that are dynamically loaded and used through SWIG. Here's a link I found that looked a bit relevant: http://library.tebyan.net/en/Viewer/Text/164572/330 – Yaroslav Bulatov Nov 17 '15 at 22:12
  • Thanks for the link. I was able to work around the problem in a different way and can step into the C++ code. Will post an answer for this question. – user5568317 Nov 18 '15 at 10:08
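One way to test the hypothesis that the C++ code lives in a dynamically loaded .so rather than in the python binary is to list the shared objects mapped into the running process (gdb's `info sharedlibrary` reports similar information once attached). A minimal sketch, assuming Linux and read access to `/proc/<pid>/maps` (typically the same user or root); the function name is hypothetical:

```python
import os


def mapped_shared_objects(pid="self", name_filter=""):
    """Sorted, de-duplicated paths of shared objects mapped into a process (Linux only).

    Parses /proc/<pid>/maps, whose lines look like
    "address perms offset dev inode [pathname]".
    """
    paths = set()
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            fields = line.split(maxsplit=5)
            if len(fields) < 6:
                continue  # anonymous mapping: no backing file
            path = fields[5].rstrip("\n")
            if ".so" in os.path.basename(path) and name_filter in path:
                paths.add(path)
    return sorted(paths)
```

Calling it as `mapped_shared_objects(12345, "_pywrap_tensorflow")`, where 12345 is a placeholder for the PID of the cifar10_train.py process, shows exactly which copy of the library that process loaded.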

1 Answer


TensorFlow loads a library called _pywrap_tensorflow.so that contains its C API (as defined in tensorflow/tensorflow/core/client/tensor_c_api.cc).

In my case, the library loaded at runtime was located at

    ~/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so

but the library built from the local source code was located at

    ~/.cache/bazel/_bazel_<username>/dbb3c677efbf9967e464a5c6a1e69337/tensorflow/bazel-out/local_linux-dbg/bin/tensorflow/python/_pywrap_tensorflow.so

so the Python process was running the installed copy, not the debug build.
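If you don't know the hash directory, a small glob can locate debug builds of the library. This is a sketch that assumes the output layout quoted in the answer; Bazel's output tree has changed across versions, so treat the pattern (and the hypothetical function name) as a starting point rather than a guaranteed location:

```python
import glob
import os


def find_bazel_dbg_builds(cache_root="~/.cache/bazel",
                          lib="tensorflow/python/_pywrap_tensorflow.so"):
    """Candidate debug builds of `lib` in bazel's output tree, newest first.

    The pattern mirrors the path quoted in the answer:
    <cache_root>/_bazel_<user>/<hash>/tensorflow/bazel-out/<config>-dbg/bin/<lib>
    """
    pattern = os.path.join(os.path.expanduser(cache_root), "_bazel_*", "*",
                           "tensorflow", "bazel-out", "*-dbg", "bin", lib)
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
```

Sorting by modification time puts the most recent `-c dbg` build first when several output bases exist.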

Copying the locally built library over the loaded one, and then attaching to the Python process as described in the question, solved the problem.
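The swap itself can be scripted. This is a minimal sketch, assuming you pass it the two paths above; the `.orig` backup name, the `.tmp` staging file and the function name are my own choices, not part of the answer. Copying to a temporary file and renaming it into place, instead of overwriting the existing file, avoids truncating a .so that a running process may still have mapped (which can crash that process). Either way, restart cifar10_train.py afterwards so that it loads the debug build before you attach.

```python
import os
import shutil


def replace_with_debug_build(built, installed):
    """Swap the installed .so for a locally built one; return the backup path."""
    backup = installed + ".orig"
    if not os.path.exists(backup):      # keep the first backup; don't clobber it
        shutil.copy2(installed, backup)
    staging = installed + ".tmp"
    shutil.copy2(built, staging)
    # Atomic rename: the new file gets a new inode, so processes that already
    # mapped the old library keep using it undisturbed.
    os.replace(staging, installed)
    return backup
```

Restoring the original is then just a matter of renaming the `.orig` file back.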

Edited by Guy Coder; answered by user5568317
  • After replacing `_pywrap_tensorflow_internal.so`, I still can't get debug symbols in the core dump. Any idea? My bazel build params are: `--config=opt --copt=-O2 --compilation_mode=dbg --strip=never --config=cuda --verbose_failures` – hakunami Feb 19 '19 at 10:07