I am writing a Python C extension on a very old Red Hat system. The system has zlib 1.2.3, which does not correctly support large files. Unfortunately, I can't just upgrade the system zlib to a newer version, since some of the packages poke into internal zlib structures and that breaks on newer zlib versions.
I would like to build my extension so that all the zlib calls (gzopen()
, gzseek()
etc.) are resolved to a custom zlib that I install in my user directory, without affecting the rest of the Python executable and other extensions.
I have tried statically linking in libz.a
by adding libz.a
to the gcc command line during linking, but it did not work (still cannot create large files using gzopen()
for example). I also tried passing -z origin -Wl,-rpath=/path/to/zlib -lz
to gcc, but that also did not work.
Since newer versions of zlib are still named zlib 1.x
, the soname
is the same, so I think symbol versioning would not work. Is there a way to do what I want to do?
I am on a 32-bit Linux system. Python version is 2.6, which is custom-built.
Edit:
I created a minimal example. I am using Cython (version 0.19.1).
File gztest.pyx
:
from libc.stdio cimport printf, fprintf, stderr
from libc.string cimport strerror
from libc.errno cimport errno
from libc.stdint cimport int64_t
cdef extern from "zlib.h":
ctypedef void *gzFile
ctypedef int64_t z_off_t
int gzclose(gzFile fp)
gzFile gzopen(char *path, char *mode)
int gzread(gzFile fp, void *buf, unsigned int n)
char *gzerror(gzFile fp, int *errnum)
cdef void print_error(void *gzfp):
cdef int errnum = 0
cdef const char *s = gzerror(gzfp, &errnum)
fprintf(stderr, "error (%d): %s (%d: %s)\n", errno, strerror(errno), errnum, s)
cdef class GzFile:
cdef gzFile fp
cdef char *path
def __init__(self, path, mode='rb'):
self.path = path
self.fp = gzopen(path, mode)
if self.fp == NULL:
raise IOError('%s: %s' % (path, strerror(errno)))
cdef int read(self, void *buf, unsigned int n):
cdef int r = gzread(self.fp, buf, n)
if r <= 0:
print_error(self.fp)
return r
cdef int close(self):
cdef int r = gzclose(self.fp)
return 0
def read_test():
cdef GzFile ifp = GzFile('foo.gz')
cdef char buf[8192]
cdef int i, j
cdef int n
errno = 0
for 0 <= i < 0x200:
for 0 <= j < 0x210:
n = ifp.read(buf, sizeof(buf))
if n <= 0:
break
if n <= 0:
break
printf('%lld\n', <long long>ifp.tell())
printf('%lld\n', <long long>ifp.tell())
ifp.close()
File setup.py
:
import sys
import os
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
if __name__ == '__main__':
if 'CUSTOM_GZ' in os.environ:
d = {
'include_dirs': ['/home/alok/zlib_lfs/include'],
'extra_objects': ['/home/alok/zlib_lfs/lib/libz.a'],
'extra_compile_args': ['-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb']
}
else:
d = {'libraries': ['z']}
ext = Extension('gztest', sources=['gztest.pyx'], **d)
setup(name='gztest', cmdclass={'build_ext': build_ext}, ext_modules=[ext])
My custom zlib
is in /home/alok/zlib_lfs
(zlib version 1.2.8):
$ ls ~/zlib_lfs/lib/
libz.a libz.so libz.so.1 libz.so.1.2.8 pkgconfig
To compile the module using this libz.a
:
$ CUSTOM_GZ=1 python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/alok/zlib_lfs/include -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb
gcc -shared build/temp.linux-x86_64-2.6/gztest.o /home/alok/zlib_lfs/lib/libz.a -L/opt/lib -lpython2.6 -o /home/alok/gztest.so
gcc
is being passed all the flags I want (adding full path to libz.a
, large file flags, etc.).
To build the extension without my custom zlib, I can compile without CUSTOM_GZ
defined:
$ python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o
gcc -shared build/temp.linux-x86_64-2.6/gztest.o -L/opt/lib -lz -lpython2.6 -o /home/alok/gztest.so
We can check the size of the gztest.so
files:
$ stat --format='%s %n' original/gztest.so custom/gztest.so
62398 original/gztest.so
627744 custom/gztest.so
So, the statically linked file is much larger, as expected.
I can now do:
>>> import gztest
>>> gztest.read_test()
and it will try to read foo.gz
in the current directory.
When I do that using non-statically linked gztest.so
, it works as expected until it tries to read more than 2 GB.
When I do that using statically linked gztest.so
, it dumps core:
$ python -c 'import gztest; gztest.read_test()'
error (2): No such file or directory (0: )
0
Segmentation fault (core dumped)
The error about No such file or directory
is misleading -- the file exists and is gzopen()
actually returns successfully. gzread()
fails though.
Here is the gdb
backtrace:
(gdb) bt
#0 0xf730eae4 in free () from /lib/libc.so.6
#1 0xf70725e2 in ?? () from /lib/libz.so.1
#2 0xf6ce9c70 in __pyx_f_6gztest_6GzFile_close (__pyx_v_self=0xf6f75278) at gztest.c:1140
#3 0xf6cea289 in __pyx_pf_6gztest_2read_test (__pyx_self=<optimized out>) at gztest.c:1526
#4 __pyx_pw_6gztest_3read_test (__pyx_self=0x0, unused=0x0) at gztest.c:1379
#5 0xf769910d in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:3690
#6 PyEval_EvalFrameEx (f=0x8115c64, throwflag=0) at Python/ceval.c:2389
#7 0xf769a3b4 in PyEval_EvalCodeEx (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#8 0xf769a433 in PyEval_EvalCode (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4) at Python/ceval.c:522
#9 0xf76bbe1a in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1335
#10 PyRun_StringFlags (str=0x80a24c0 "import gztest; gztest.read_test()\n", start=257, globals=0xf6ff81c4, locals=0xf6ff81c4, flags=0xffbf2888) at Python/pythonrun.c:1298
#11 0xf76bd003 in PyRun_SimpleStringFlags (command=0x80a24c0 "import gztest; gztest.read_test()\n", flags=0xffbf2888) at Python/pythonrun.c:957
#12 0xf76ca1b9 in Py_Main (argc=1, argv=0xffbf2954) at Modules/main.c:548
#13 0x080485b2 in main ()
One of the problems seems to be that the second line in the backtrace refers to libz.so.1
! If I do ldd gztest.so
, I get, among other lines:
libz.so.1 => /lib/libz.so.1 (0xf6f87000)
I am not sure why that is happening though.
Edit 2:
I ended up doing the following:
- compiled my custom zlib with all the symbols exported with a
z_
prefix.zlib
'sconfigure
script makes this very easy: just run./configure --zprefix ...
. - called
gzopen64()
instead ofgzopen()
in my Cython code. This is because I wanted to make sure I am using the correct "underlying" symbol. - used
z_off64_t
explicitly. - statically link my custom
zlib.a
into the shared library generated by Cython. I used'-Wl,--whole-archive /home/alok/zlib_lfs_z/lib/libz.a -Wl,--no-whole-archive'
while linking with gcc to achieve that. There might be other ways or this might not be needed but it seemed the simplest way to make sure the correct library gets used.
With the above changes, large files work while the rest of the Python extension modules/processes work as before.