Apologies for the longish post; I came back to a similar debugging problem - a case where you take a long trip to the debugger, only to find that there is no actual bug - so I'd like to post my notes and some code here (I'm still on Python 2.7, Ubuntu 11.04). With respect to the OP's question: in newer versions of gdb, it's also possible to break by calling the id(...) function in the Python script, and having gdb break on builtin_id; but here are more details:
Again, I had a problem with a C .so shared library module for Python; this time it was svn.client, which is a Swig module (see also here); in Debian/Ubuntu it is available via sudo apt-get install python-subversion (filelist). The problem occurred while trying to run Example 8.3. A Python status crawler - Using the APIs (svnbook). This example should do the same thing as the terminal command svn status; but when I tried it on one of my working copies, it crashed with "Error (22): Error converting entry in directory 'path' to UTF-8", even though svn status has been processing the same working copy (WC) directory (for years now) - so I wanted to see where that came from. My version of the test script is python-subversion-test.py; and my full debug log is in logsvnpy.gz (gzipped text file, ~188K uncompressed, should anyone want to wade through endless stepping and backtraces) - this being the abridged version.
I have both Python 2.7 and 3.2 installed, but 2.7 is the default on Ubuntu 11.04:
$ ls -la $(which python python-dbg)
lrwxrwxrwx 1 root root 9 2012-02-29 07:31 /usr/bin/python -> python2.7
lrwxrwxrwx 1 root root 13 2013-04-07 03:01 /usr/bin/python-dbg -> python2.7-dbg
$ apt-show-versions -r 'python[^-]+'
libpython2.7/natty uptodate 2.7.1-5ubuntu2.2
libpython3.2/natty uptodate 3.2-1ubuntu1.2
python2.7/natty uptodate 2.7.1-5ubuntu2.2
python2.7-dbg/natty uptodate 2.7.1-5ubuntu2.2
python2.7-dev/natty uptodate 2.7.1-5ubuntu2.2
python2.7-minimal/natty uptodate 2.7.1-5ubuntu2.2
python3/natty uptodate 3.2-1ubuntu1
python3-minimal/natty uptodate 3.2-1ubuntu1
python3.2/natty uptodate 3.2-1ubuntu1.2
python3.2-minimal/natty uptodate 3.2-1ubuntu1.2
The first thing to note is how the Python example functions: there, to obtain the status of all files within a directory, first svn.client.svn_client_status2 is called - with the path, and also with _status_callback in the arguments, as a Python callback function to be registered - and then it blocks. While status2 is blocking, the underlying module iterates through all files in the WC directory path; and for each file entry, it calls the registered _status_callback, which should print out information about the entry. Once this recursion is over, status2 exits. Thus, the UTF-8 failure must come from the underlying module. Inspecting this module further:
$ python -c 'import inspect,pprint,svn.client; pprint.pprint(inspect.getmembers(svn.client))' | grep status
('status', <function svn_client_status at 0xb7351f44>),
('status2', <function svn_client_status2 at 0xb7351f0c>),
('status3', <function svn_client_status3 at 0xb7351ed4>),
('status4', <function svn_client_status4 at 0xb7351e9c>),
('svn_client_status', <function svn_client_status at 0xb7351f44>),
# ...
... reveals that there are other statusX functions - however, status3 failed with the same UTF-8 error; while status4 caused a segmentation fault (which becomes yet another problem to debug).
And again, as in my comment to @EliBendersky's answer, I wanted to issue a breakpoint in Python, so as to obtain some sort of a call stack of C functions later on, which would reveal where the problem occurs - without me getting into rebuilding the C modules from source; but it didn't turn out to be that easy.
Python and gdb
First of all, one thing that can be very confusing is the relationship between gdb and Python; the typical resources coming up here are:
- http://wiki.python.org/moin/DebuggingWithGdb - mentions a gdbinit in "GDB Macros". That release27-maint/Misc/gdbinit is in the Python source tree; it defines gdb commands like pylocals and pyframe, but also mentions:
# NOTE: If you have gdb 7 or later, it supports debugging of Python directly
# with embedded macros that you may find superior to what is in here.
# See Tools/gdb/libpython.py and http://bugs.python.org/issue8032.
- Features/EasierPythonDebugging - FedoraProject - has an example, mentions a Fedora python-debuginfo package, and libpython
- Tools/gdb/libpython.py is also in the Python source tree, and it mentions:
From gdb 7 onwards, gdb's build can be configured --with-python, allowing gdb
to be extended with Python code e.g. for library-specific data visualizations,
such as for the C++ STL types. ....
This module embeds knowledge about the implementation details of libpython so
that we can emit useful visualizations e.g. a string, a list, a dict, a frame
giving file/line information and the state of local variables
- cpython/Lib/test/test_gdb.py - apparently from cpython, seems to test gdb functionality from Python
This gets a bit confusing - apart from the pointer that one had better get gdb v.7; I managed to get this for my OS:
$ apt-show-versions gdb
gdb 7.3-50.20110806-cvs newer than version in archive
A quick way to test if gdb supports Python is this:
$ gdb --batch --eval-command="python print gdb"
<module 'gdb' (built-in)>
$ python -c 'import gdb; print gdb'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named gdb
... but gdb supporting Python doesn't mean that Python on its own can access gdb functionality (apparently, gdb has its own separate built-in Python interpreter).
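The same check can be scripted. This is only a minimal sketch of my own (the function name gdb_has_python is made up for illustration); it assumes a gdb binary may or may not be on the PATH, and it looks for the "module 'gdb'" text in the output rather than trusting the exit code. The print(gdb) form is used so that the command works whether gdb embeds Python 2 or Python 3:

```python
import subprocess


def gdb_has_python(gdb_exe="gdb"):
    """Return True if `gdb_exe` can execute an embedded Python command.

    Hypothetical helper: it runs the same one-liner as the shell test
    above and inspects what gdb prints.
    """
    cmd = [gdb_exe, "--batch", "--eval-command=python print(gdb)"]
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _err = proc.communicate()
    except OSError:
        # gdb is not installed, or not on the PATH
        return False
    # A Python-enabled gdb prints something like <module 'gdb' (built-in)>
    return b"module 'gdb'" in out


if __name__ == "__main__":
    print("gdb supports Python: %s" % gdb_has_python())
```

Note that this runs under the ordinary Python interpreter; it only asks gdb whether its own embedded interpreter is there.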
It turns out that in Ubuntu 11.04, the python2.7-dbg package installs a file libpython2.7.so.1.0-gdb.py:
$ find / -xdev -name '*libpython*' 2>/dev/null | grep '\.py'
/usr/lib/debug/usr/lib/libpython2.7.so.1.0-gdb.py
$ sudo ln -s /usr/lib/debug/usr/lib/libpython2.7.so.1.0-gdb.py /usr/lib/debug/usr/lib/libpython.py
... and this is the one corresponding to the mentioned Tools/gdb/libpython.py; the symlinking will allow us to refer to it as libpython, and to use the import script mentioned in Features/EasierPythonDebugging.
The test_gdb.py script is actually for Python 3 - I have modified it for 2.7, and posted it as test_gdb2.7.py. This script calls gdb through an OS system call, and tests its Python functionality, with printouts to stdout; it also accepts a command line option, -imp-lp, which will import libpython in gdb before other commands are executed. So, for instance:
$ python-dbg test_gdb2.7.py
...
*** test_prettyprint ***
42 (self=0x0, v=0x8333fc8)
[] (self=0x0, v=0xb7f7506c)
('foo', 'bar', 'baz') (self=0x0, v=0xb7f7d234)
[0, 1, 2, 3, 4] (self=0x0, v=0xb7f7506c)
...
$ python-dbg test_gdb2.7.py -imp-lp
...
*** test_prettyprint ***
42 (self=0x0, v=42)
[] (self=0x0, v=[])
('foo', 'bar', 'baz') (self=0x0, v=('foo', 'bar', 'baz'))
[0, 1, 2, 3, 4] (self=0x0, v=[0, 1, 2, 3, 4])
...
Thus, libpython.py is intended specifically for the Python interpreter inside gdb, and it helps gdb print Python representations (v=[]) instead of just memory addresses (v=0xb7f7506c) - which is only helpful if gdb happens to debug a Python script (or rather, it will debug the Python executable that interprets the script).
The test_gdb.py script also gives the pointer that you can "... run "python -c'id(DATA)'" under gdb with a breakpoint on builtin_id"; for testing this, I have posted a bash script, gdb_py_so_test.sh, which creates an executable with a counting thread function, and both plain distutils and swig modules (in both debug and release versions) that interface to the same function. It also creates a .gdbinit with both gdb and gdb's Python class breakpoints - and finally it runs gdb on Python (loading one of the shared modules), where the user can hopefully see if the breakpoints are really triggering.
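To make the id() trick a bit more concrete, here is a rough sketch (not the contents of gdb_py_so_test.sh; the names gdb_marker and gdb_break_on_id_argv are my own inventions for illustration). The first function is what you would call in the script under test; the second just assembles a gdb command line that breaks on builtin_id and prints a backtrace. I'm assuming here that the builtin_id symbol is visible to gdb, which may require the debug symbols of the interpreter:

```python
def gdb_marker(tag):
    """Call id() so that a gdb breakpoint on builtin_id fires at this point."""
    return id(tag)


def gdb_break_on_id_argv(script, python_exe="python", gdb_exe="gdb"):
    """Build a gdb command line that stops at the first id() call made
    while `python_exe` runs `script`, and then prints the C call stack."""
    return [
        gdb_exe, "--batch",
        # accept the breakpoint even if builtin_id lives in a shared
        # libpython that is not loaded yet
        "--eval-command=set breakpoint pending on",
        "--eval-command=break builtin_id",
        "--eval-command=run",
        # once stopped, print the C-level backtrace
        "--eval-command=bt",
        "--args", python_exe, script,
    ]


if __name__ == "__main__":
    print(" ".join(gdb_break_on_id_argv("python-subversion-test.py")))
```

One caveat: the interpreter's own startup code may also call id(), so the first stop is not necessarily the marker in your script; in an interactive session you would simply continue until the interesting one shows up.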
segfault in gdb without source rebuild
First I focused on the status4 segfault, and I wanted to know exactly which module the function comes from. I used functions that can be found in debug_funcs.py; they can be called with separate regexes for functions and modules, and may generate something like:
$ python python-subversion-test.py ./MyRepoWCDir
# ...
# example for debug_funcs.showLoadedModules(r'(?=.*\.(so|pyc))(?=.*svn)(?=.*client)')
#
svn.client 0xb74b83d4L <module 'svn.client' from '/usr/lib/pymodules/python2.7/svn/client.pyc'>
_client 0xb7415614L <module '_client' from '/usr/lib/pymodules/python2.7/libsvn/_client.so'>
libsvn.client 0xb74155b4L <module 'libsvn.client' from '/usr/lib/pymodules/python2.7/libsvn/client.pyc'>
#
# example for debug_funcs.showFunctionsInLoadedModules(r'status4', r'(?=.*\.(so|pyc))(?=.*svn)')
#
0xb738c4fcL libsvn.client svn_client_status4 libsvn/client.pyc
0xb74e9eecL _client svn_client_status4 libsvn/_client.so
0xb738c4fcL svn.client status4 svn/client.pyc
0xb738c4fcL svn.client svn_client_status4 svn/client.pyc
However, note that:
$ python-dbg python-subversion-test.py ./MyRepoWCDir
# ...
0x90fc574 - _client /usr/lib/pymodules/python2.7/libsvn/_client_d.so
# ...
0x912b30c _client svn_client_status4 libsvn/_client_d.so
# ...
$ apt-show-versions -r python-subversion
python-subversion/natty uptodate 1.6.12dfsg-4ubuntu2.1
python-subversion-dbg/natty uptodate 1.6.12dfsg-4ubuntu2.1
... python-dbg will load different (debug, _d) versions of the .so modules of libsvn (or python-subversion); and that is because I have the python-subversion-dbg package installed.
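The debug_funcs.py file itself is not reproduced in this post, so the following is only a guess at the general idea, written from scratch: the names find_loaded_modules and find_functions are hypothetical, and the real helpers may well work differently. The sketch walks sys.modules, matches a regex against each module's repr (which contains the file path), and reports id() values, which is what the addresses in the listings above look like:

```python
import re
import sys


def find_loaded_modules(pattern):
    """Return (name, id, module) for loaded modules whose repr matches."""
    rx = re.compile(pattern)
    found = []
    # sorted() copies the items first, so imports triggered later
    # cannot disturb the iteration
    for name, mod in sorted(sys.modules.items()):
        # Python 2 may keep None placeholders in sys.modules
        if mod is not None and rx.search(repr(mod)):
            found.append((name, id(mod), mod))
    return found


def find_functions(func_pattern, mod_pattern):
    """Return (id, module name, attribute name) for matching callables."""
    rx = re.compile(func_pattern)
    found = []
    for name, _mod_id, mod in find_loaded_modules(mod_pattern):
        for attr in dir(mod):
            if rx.search(attr):
                obj = getattr(mod, attr, None)
                if callable(obj):
                    found.append((id(obj), name, attr))
    return found


if __name__ == "__main__":
    for obj_id, mod_name, attr in find_functions(r"^compile$", r"module 're'"):
        print("0x%x %s %s" % (obj_id, mod_name, attr))
```

With svn.client imported, a call like find_functions(r'status4', r'svn') would be the rough equivalent of the second listing above.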
In any case, we may think we know the addresses where modules and respective functions are loaded upon each Python script call - which would allow us to place a gdb breakpoint on a program address, given that here we work with "vanilla" .so's (that haven't been rebuilt from source). However, Python on its own cannot see that _client.so in fact utilizes libsvn_client-1.so:
$ ls -la $(locate '*2.7*/_client*.so') #check locations
$ ls -la $(locate 'libsvn_client') #check locations
$ ldd /usr/lib/pyshared/python2.7/libsvn/_client.so | grep client
libsvn_client-1.so.1 => /usr/lib/libsvn_client-1.so.1 (0x0037f000)
#
# instead of nm, also can use:
# objdump -dSlr file | grep '^[[:digit:]].*status4' | grep -v '^$\|^[[:space:]]'
#
$ nm -D /usr/lib/pyshared/python2.7/libsvn/_client.so | grep status4
U svn_client_status4
$ nm -a /usr/lib/pyshared/python2.7/libsvn/_client_d.so | grep status4
00029a50 t _wrap_svn_client_status4
U svn_client_status4
$ nm -D /usr/lib/libsvn_client-1.so.1 | grep status4 # -a: no symbols
00038c10 T svn_client_status4
From within Python, we could make a system call to query /proc/pid/maps for the address where libsvn_client-1.so is loaded, add to it the address reported by the last nm -D command as the offset of svn_client_status4, and so obtain the address where we could break in gdb (with the b *0xAddress syntax) - but that is not necessary, because if nm can see the symbol, so can gdb - so we can break directly on the function name. Another thing is that in case of a segfault, gdb stops on its own, and we can issue a backtrace (note: use Ctrl-X A to exit the gdb TUI mode after layout asm):
$ gdb --args python python-subversion-test.py ./MyRepoWCDir
(gdb) r
Starting program: /usr/bin/python python-subversion-test.py ./MyRepoWCDir
...
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x005a5bf3 in ?? () from /usr/lib/libsvn_client-1.so.1
#2 0x005dbf4a in ?? () from /usr/lib/libsvn_wc-1.so.1
#3 0x005dcea3 in ?? () from /usr/lib/libsvn_wc-1.so.1
#4 0x005dd240 in ?? () from /usr/lib/libsvn_wc-1.so.1
#5 0x005a5fe5 in svn_client_status4 () from /usr/lib/libsvn_client-1.so.1
#6 0x00d54dae in ?? () from /usr/lib/pymodules/python2.7/libsvn/_client.so
#7 0x080e0155 in PyEval_EvalFrameEx ()
...
(gdb) frame 1
#1 0x005a5bf3 in ?? () from /usr/lib/libsvn_client-1.so.1
(gdb) list
No symbol table is loaded. Use the "file" command.
(gdb) disas
No function contains program counter for selected frame.
(gdb) x/10i 0x005a5bf3
=> 0x5a5bf3: mov -0xc(%ebp),%ebx
0x5a5bf6: mov -0x8(%ebp),%esi
0x5a5bf9: mov -0x4(%ebp),%edi
0x5a5bfc: mov %ebp,%esp
(gdb) layout asm # No function contains program counter for selected frame (cannot show 0x5a5bf3)
(gdb) p svn_client_status4
$1 = {<text variable, no debug info>} 0x5a5c10 <svn_client_status4>
(gdb) frame 5
#5 0x005a5fe5 in svn_client_status4 () from /usr/lib/libsvn_client-1.so.1
(gdb) list
No symbol table is loaded. Use the "file" command.
(gdb) layout asm
│0x5a5fd8 <svn_client_status4+968> mov %esi,0x4(%esp) |
│0x5a5fdc <svn_client_status4+972> mov %eax,(%esp) |
│0x5a5fdf <svn_client_status4+975> mov -0x28(%ebp),%eax |
│0x5a5fe2 <svn_client_status4+978> call *0x38(%eax) |
>│0x5a5fe5 <svn_client_status4+981> test %eax,%eax |
│0x5a5fe7 <svn_client_status4+983> jne 0x5a5ce3 <svn_client_status4+211> |
│0x5a5fed <svn_client_status4+989> jmp 0x5a5ee3 <svn_client_status4+723> |
│0x5a5ff2 <svn_client_status4+994> lea -0x1fac(%ebx),%eax |
│0x5a5ff8 <svn_client_status4+1000> mov %eax,(%esp) |
So, our error happens somewhere in libsvn_client-1.so, but in a memory area just before the start of the svn_client_status4 function (frame 1 is at 0x5a5bf3, while the function starts at 0x5a5c10); and since we don't have debugging symbols, we cannot say much more than that. Using python-dbg may give slightly different results:
Program received signal SIGSEGV, Segmentation fault.
0x005aebf0 in ?? () from /usr/lib/libsvn_client-1.so.1
(gdb) bt
#0 0x005aebf0 in ?? () from /usr/lib/libsvn_client-1.so.1
#1 0x005e4f4a in ?? () from /usr/lib/libsvn_wc-1.so.1
#2 0x005e5ea3 in ?? () from /usr/lib/libsvn_wc-1.so.1
#3 0x005e6240 in ?? () from /usr/lib/libsvn_wc-1.so.1
#4 0x005aefe5 in svn_client_status4 () from /usr/lib/libsvn_client-1.so.1
#5 0x00d61e9e in _wrap_svn_client_status4 (self=0x0, args=0x8471214)
at /build/buildd/subversion-1.6.12dfsg/subversion/bindings/swig/python/svn_client.c:10001
...
(gdb) frame 4
#4 0x005aefe5 in svn_client_status4 () from /usr/lib/libsvn_client-1.so.1
(gdb) list
9876 in /build/buildd/subversion-1.6.12dfsg/subversion/bindings/swig/python/svn_client.c
(gdb) p svn_client_status4
$1 = {<text variable, no debug info>} 0x5aec10 <svn_client_status4>
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
...
0x00497a20 0x004c8be8 Yes /usr/lib/pymodules/python2.7/libsvn/_core_d.so
0x004e9fe0 0x004f52c8 Yes /usr/lib/libsvn_swig_py2.7_d-1.so.1
0x004f9750 0x00501678 Yes (*) /usr/lib/libsvn_diff-1.so.1
0x0050f3e0 0x00539d08 Yes (*) /usr/lib/libsvn_subr-1.so.1
0x00552200 0x00572658 Yes (*) /usr/lib/libapr-1.so.0
0x0057ddb0 0x005b14b8 Yes (*) /usr/lib/libsvn_client-1.so.1
...
0x00c2a8f0 0x00d11cc8 Yes (*) /usr/lib/libxml2.so.2
0x00d3f860 0x00d6dc08 Yes /usr/lib/pymodules/python2.7/libsvn/_client_d.so
...
(*): Shared library is missing debugging information.
... but the list command still gives us a source line belonging to frame 5 (not frame 4), and we still don't know more about svn_client_status4: while the python-subversion modules are loaded in their debug versions, debugging information is missing for libsvn_client-1.so. So, time to rebuild from source.
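For completeness, the /proc/pid/maps arithmetic mentioned earlier in this section (load address of the library plus the nm -D offset) could be sketched as below. The function names are made up, and the two sample maps lines are invented for illustration: only the base 0x56d000 is derived from this post, by subtracting the nm offset 0x38c10 from the 0x5a5c10 that gdb printed for svn_client_status4 in the first session. The sketch also assumes the usual case where the lowest mapping of a shared object corresponds to address 0 in nm's view:

```python
import os

# Invented sample in /proc/<pid>/maps format (sizes, device and inode made up)
SAMPLE_MAPS = """\
0056d000-005a9000 r-xp 00000000 08:01 123456 /usr/lib/libsvn_client-1.so.1.0.0
005a9000-005aa000 r--p 0003c000 08:01 123456 /usr/lib/libsvn_client-1.so.1.0.0
"""


def read_own_maps():
    """Return the memory map of the current process (Linux only)."""
    with open("/proc/%d/maps" % os.getpid()) as f:
        return f.read()


def library_base(maps_text, lib_substring):
    """Lowest start address of mappings whose path contains lib_substring."""
    starts = []
    for line in maps_text.splitlines():
        # fields: address-range perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and lib_substring in fields[5]:
            starts.append(int(fields[0].split("-")[0], 16))
    return min(starts) if starts else None


def symbol_address(maps_text, lib_substring, nm_offset):
    """Runtime address of a symbol, given its offset as printed by nm -D."""
    base = library_base(maps_text, lib_substring)
    return None if base is None else base + nm_offset


if __name__ == "__main__":
    addr = symbol_address(SAMPLE_MAPS, "libsvn_client-1.so", 0x38c10)
    print("b *0x%x" % addr)  # prints: b *0x5a5c10
```

In a live script, read_own_maps() would replace SAMPLE_MAPS after svn.client has been imported; but as said, breaking on the function name is simpler whenever the symbol is exported.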
segfault in gdb with source rebuild
It is the actual subversion that we need to rebuild, or rather its library part - since we already have debug modules from python-subversion; the package on my system is called libsvn1:
$ apt-show-versions -r 'libsvn'
libsvn1/natty uptodate 1.6.12dfsg-4ubuntu2.1
$ apt-cache search 'libsvn' | grep 'dbg'
python-subversion-dbg - Python bindings for Subversion (debug extension)
... and there is no debug package for it. To rebuild from source, I went through apt-get source libsvn1, with dependencies manually found via apt-rdepends --build-depends --follow=DEPENDS subversion. There are more details in the full log - but here we can note that the source package can build both the SWIG Python bindings (that is, python-subversion) and the Subversion library (libsvn1). Also, I ran make install with a location outside of the main system tree; that means that one has to explicitly specify the source-built modules via the LD_LIBRARY_PATH and LD_PRELOAD environment variables:
$ ELD=/path/to/src/subversion-1.6.12dfsg/tmpinst/usr/local/lib
$ LD_LIBRARY_PATH=$ELD:$ELD/svn-python/libsvn LD_PRELOAD="$ELD/libsvn_client-1.so $ELD/svn-python/libsvn/_core.so" gdb --args python python-subversion-test.py ./MyRepoWCDir
One tricky thing here is that building SWIG debug modules requires a call with python-dbg; apparently just doing ./configure --enable-debug doesn't do that; and so, just _core.so, etc. are produced, albeit with debugging information. If we then try to force their loading as with the above command, but with python-dbg, we will get undefined symbol: Py_InitModule4, because:
$ objdump -d $(which python) | grep '^\w.*InitMod'
0813b770 <Py_InitModule4>:
$ objdump -d $(which python-dbg) | grep '^\w.*InitMod'
08124740 <Py_InitModule4TraceRefs>:
... python-dbg has a different Py_InitModule4 function. That, however, wasn't a problem, because plain python was used (as in the above invocation), and gdb still allowed stepping through the relevant functions in the newly built libsvn (the mentioned Bash script gdb_py_so_test.sh, as an example, builds a basic Swig module in both debug and release versions to confirm the right procedure).
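Since release-built and debug-built extension modules are not interchangeable, it can help to check from within a script which kind of interpreter is running. A minimal sketch (the function name is mine), relying on the fact that sys.gettotalrefcount only exists in interpreters compiled with reference-count debugging, which as far as I know is the case for python-dbg:

```python
import sys


def is_debug_build():
    """Return True when running under a debug build of CPython."""
    # sys.gettotalrefcount() is only compiled into debug interpreters;
    # a release build does not have the attribute at all
    return hasattr(sys, "gettotalrefcount")


if __name__ == "__main__":
    if is_debug_build():
        print("debug interpreter: expect debug (_d) extension modules")
    else:
        print("release interpreter: expect plain .so extension modules")
```

Running this with python and then with python-dbg should give the two different answers; the mapping to the _d suffix in the messages is just what was observed on this system above.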
With debugging symbols for libsvn, the function call stack looks like this (pasted a bit differently):
#5 0x0016e654 in svn_client_status4 (..., libsvn_client/status.c:369
#4 0x007fd209 in close_edit (..., libsvn_wc/status.c:2144
#3 0x007fafaa in get_dir_status (..., libsvn_wc/status.c:1033
#2 0x007fa4e7 in send_unversioned_item (..., libsvn_wc/status.c:722
#1 0x0016dd17 in tweak_status (..., libsvn_client/status.c:81
#0 0x00000000 in ?? ()
... and since the same library functions are also used by the command line svn client, we can compare, in say, frame 5:
# `svn status`:
(gdb) p *(sb->real_status_func)
$3 = {svn_error_t *(void *, const char *, svn_wc_status2_t *, apr_pool_t *)} 0x805e199 <print_status>
...
# `python python-subversion-test.py`
(gdb) p *(svn_wc_status_func3_t*)sb->real_status_func
Cannot access memory at address 0x0
So, in case of a Python call to status4, sb->real_status_func is NULL, causing a segfault. The reason for this can be revealed once we start reading the source: in ./subversion/libsvn_client/deprecated.c, the definition for status3 has:
svn_client_status3(svn_revnum_t *result_rev,
const char *path,
const svn_opt_revision_t *revision,
svn_wc_status_func2_t status_func,
void *status_baton,
....
struct status3_wrapper_baton swb = { 0 };
swb.old_func = status_func;
swb.old_baton = status_baton;
return svn_client_status4(result_rev, path, revision, status3_wrapper_func,
&swb, depth, get_all, update, no_ignore,
ignore_externals, changelists, ctx, pool);
... that is, when status3 is called with a callback function, it creates a struct, and assigns the function to one of the struct fields - and then uses the struct in the further call to status4! Since status3 actually works from Python - the conclusion is that we cannot correctly call status4 from Python (since that would involve creating a C struct in Python); and that doesn't matter anyway, because we can call status3 from Python - which then itself calls status4!
Then why is status4 addressable from Python? Probably because swig simply autogenerated an interface for it... In any case, here is an example where a trip to the debugger reveals the source of the problem - but not really a bug :)
Solution? Don't use status4.
C failure in Python module, in gdb with source rebuild
Going back to the UTF-8 failure, which occurred with status2 and status3 - it was easier, given that now source built versions of the modules were available. The problem was obvious in the function entry_name_to_utf8, and by exploring its argument name, one could first realize that the file name causing the problem did indeed contain non-ASCII - but still legal UTF-8 - characters (see Program to check/look up UTF-8/Unicode characters in string on command line? - Super User). I then used this .gdbinit to make a Python class breakpoint for gdb, that would print out the filenames, and break only on a match with the problematic one.
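Finding such file names does not really need gdb; a few lines of Python can flag them in advance. This is only a sketch with made-up function names: it reads directory entries as raw bytes and sorts each name into plain ASCII, non-ASCII but valid UTF-8, or not valid UTF-8 at all. It is written so that it should behave the same on Python 2.7 and Python 3:

```python
import os


def classify_name(raw):
    """Classify a file name given as bytes: 'ascii', 'utf-8' or 'invalid'."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "invalid"


def non_ascii_names(directory=b"."):
    """List (name, kind) for entries whose names are not plain ASCII."""
    # passing a bytes path makes os.listdir return undecoded names
    return [(name, classify_name(name))
            for name in sorted(os.listdir(directory))
            if classify_name(name) != "ascii"]


if __name__ == "__main__":
    for name, kind in non_ascii_names(b"."):
        print("%r: %s" % (name, kind))
```

A name reported as 'utf-8' here is the kind that triggered the error in my case: legal, but not ASCII.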
Then the question is: how come the command line client svn status does not crash on the same filename? By stepping through both svn status and python python-subversion-test.py, one can compare the respective function call stacks:
# call stack Python module:
#
_wrap_svn_client_status3 subversion/bindings/swig/python/svn_client.c * allocs:
(svn_swig_py_get_pool_arg(args, SWIGTYPE_p_apr_pool_t, &_global_py_pool, &_global_pool))
svn_client_status3 subversion/libsvn_client/deprecated.c
svn_client_status4 subversion/libsvn_client/status.c
close_edit subversion/libsvn_wc/status.c
get_dir_status subversion/libsvn_wc/status.c
# call stack svn client:
#
main subversion/svn/main.c
svn_cl__status subversion/svn/status-cmd.c * allocs
(subpool = svn_pool_create(pool))
svn_client_status4 subversion/libsvn_client/status.c
close_edit subversion/libsvn_delta/cancel.c
close_edit subversion/libsvn_wc/status.c
get_dir_status subversion/libsvn_wc/status.c
# svn call stack:
# ... svn_client_status4 - starts pool
#
get_dir_status subversion/libsvn_wc/status.c
handle_dir_entry subversion/libsvn_wc/status.c
get_dir_status subversion/libsvn_wc/status.c
svn_io_get_dirents2 subversion/libsvn_subr/io.c
entry_name_to_utf8 subversion/libsvn_subr/io.c
svn_path_cstring_to_utf8 subversion/libsvn_subr/path.c
svn_utf_cstring_to_utf8 subversion/libsvn_subr/utf.c * from here, bad node->handle
convert_cstring subversion/libsvn_subr/utf.c
convert_to_stringbuf subversion/libsvn_subr/utf.c * here, bad node => fail
At this point, one encounters the fact that Subversion uses libapr (Apache Portable Runtime) for memory allocation; and it is in fact this part causing the failure - principally, the function apr_xlate_conv_buffer behaves differently in the two cases.
But it can be rather difficult to see what the actual problem is here, because apr_xlate_conv_buffer uses an encoding in node->frompage, which is set to the define APR_LOCALE_CHARSET 1 - and that doesn't change between the svn status and Python cases. To get to the bottom of this, I've copy-pasted everything related to APR string copying and allocation down the call stack, and reconstructed a simple example that builds a Swig module, which should just copy a string using the APR runtime; that example is in the directory aprtest, built with the bash script build-aprtest.sh.
Thanks to that example, it was revealed that the UTF failure problem can be fixed by calling setlocale in C before any APR string memory allocation - for more about that test, see #15977257 - Using utf-8 input for cmd Python module. Correspondingly, all we need to do from Python is execute:
import locale
locale.setlocale(locale.LC_ALL, '')
... before any calls to svn.client (and thus to libsvn, and thus to libapr). And here we have yet another example of a trip to the debugger, without really having a bug :)
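To see what the setlocale call changes, one can print the character set that the C library reports before and after it. This is a sketch under the assumption (mine, not verified in the APR source here) that a "locale charset" lookup ends up depending on the C library's current locale, which is what nl_langinfo(CODESET) shows. The function name is made up, and the values depend entirely on the environment:

```python
import locale


def codeset_before_and_after():
    """Return the C library's codeset before and after setlocale(LC_ALL, '')."""
    before = locale.nl_langinfo(locale.CODESET)
    try:
        # adopt the locale named by the environment (LANG, LC_ALL, ...)
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # the environment names a locale that is not installed
        pass
    after = locale.nl_langinfo(locale.CODESET)
    return before, after


if __name__ == "__main__":
    before, after = codeset_before_and_after()
    print("codeset before setlocale: %s" % before)
    print("codeset after setlocale:  %s" % after)
```

On Python 2.7 in a UTF-8 terminal, I would expect the first value to be an ASCII variant and the second to be UTF-8; as far as I know Python 3 already sets LC_CTYPE from the environment at startup, so there the two values may be identical.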