First, do note that `FILE*` is a stdio-specific entity. It doesn't exist at the system level. The things that exist at the system level are descriptors (retrieved with `file.fileno()`) in UNIX (`os.pipe()` returns plain descriptors already) and handles (retrieved with `msvcrt.get_osfhandle()`) in Windows. Thus `FILE*` is a poor choice as an inter-library exchange format if there can be more than one C runtime in action. You'll be in trouble if your library is compiled against a different C runtime than your copy of Python: 1) the binary layout of the structure may differ (e.g. due to alignment, additional members for debugging purposes, or even different type sizes); 2) in Windows, the file descriptors that the structure links to are C-runtime-specific entities as well, and their table is maintained by each C runtime internally¹.

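Here's a minimal sketch of getting at those system-level objects from Python (the file name is just an example; the `msvcrt` part applies on Windows only):

    import os
    import sys

    f = open("test.txt")        # any file object
    fd = f.fileno()             # the OS-level descriptor behind it

    r, w = os.pipe()            # os.pipe() hands out plain descriptors directly

    if sys.platform == "win32":
        import msvcrt
        handle = msvcrt.get_osfhandle(fd)   # the underlying Windows HANDLE
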
Moreover, in Python 3, I/O was overhauled in order to untangle it from stdio. So `FILE*` is alien to that Python flavor (and likely to most non-C flavors, too).

Now, what you need is to

- somehow guess which C runtime you need, and
- call its `fdopen()` (or equivalent).

(One of Python's mottoes is "make the right thing easy and the wrong thing hard", after all.)

The cleanest method is to use the precise runtime instance that the library is linked to (do pray that it's linked with it dynamically, or there'll be no exported symbol to call).

For the 1st item, I couldn't find any Python modules that can analyze a loaded dynamic module's metadata to find out which DLLs/.so's it has been linked against (just a name, or even a name plus version, isn't enough, due to possible multiple instances of the library on the system). Though it's definitely possible, since the information about the formats is widely available.

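This isn't quite that linkage analysis, but as a rough, Linux-only illustration of the kind of check involved (a sketch of my own, not an existing module), you can at least see which libc instances are mapped into the process once the extension module has been loaded:

    import os

    def loaded_libc_paths():
        """List libc shared objects currently mapped into this process (Linux only)."""
        paths = set()
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split()
                if len(fields) >= 6 and fields[5].startswith("/"):
                    name = os.path.basename(fields[5])
                    if name.startswith("libc.so") or name.startswith("libc-"):
                        paths.add(fields[5])
        return sorted(paths)

    # import the_extension_module   # hypothetical; load the library first, then check
    print(loaded_libc_paths())      # e.g. ['/lib/x86_64-linux-gnu/libc.so.6']
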
For the 2nd item, it's a trivial `ctypes.CDLL('path').fdopen` (`_fdopen` for MSVCRT).

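A minimal sketch of that call, assuming (purely for illustration) that the library in question is linked against the system glibc at `libc.so.6`; on Windows you'd load the matching `msvcr*.dll`/`ucrtbase.dll` and use `_fdopen` instead:

    import ctypes
    import os

    crt = ctypes.CDLL("libc.so.6", use_errno=True)   # the runtime the library actually uses
    crt.fdopen.restype = ctypes.c_void_p             # FILE* as an opaque pointer
    crt.fdopen.argtypes = (ctypes.c_int, ctypes.c_char_p)

    fd = os.open("test.txt", os.O_RDONLY)            # a plain OS-level descriptor
    fp = crt.fdopen(fd, b"r")                        # a FILE* that code built on this runtime understands
    # ... hand fp over to the library ...
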
Second, you can write a small helper module that would be compiled against the same (or a guaranteed-compatible) runtime as the library and would do the conversion from the aforementioned descriptor/handle for you. This is effectively a workaround that avoids editing the library proper.

Finally, there's the simplest (and the dirtiest) method, using Python's own C runtime instance (so all the above warnings apply in full) through the Python C API available via `ctypes.pythonapi`. It takes advantage of

- the fact that Python 2's file-like objects are wrappers over stdio's `FILE*` (Python 3's are not)
- the `PyFile_AsFile` API that returns the wrapped `FILE*` (note that it's missing from Python 3)
- for a standalone fd, you need to construct a file-like object first (so that there would be a `FILE*` to return ;) )
- the fact that `id()` of an object is its memory address (CPython-specific)²

    >>> import ctypes
    >>> open("test.txt")
    <open file 'test.txt', mode 'r' at 0x017F8F40>
    >>> f=_
    >>> f.fileno()
    3
    >>> ctypes.pythonapi
    <PyDLL 'python dll', handle 1e000000 at 12808b0>
    >>> api=_
    >>> api.PyFile_AsFile
    <_FuncPtr object at 0x018557B0>
    >>> api.PyFile_AsFile.restype=ctypes.c_void_p       # as per ctypes docs, pythonapi assumes all fns to return int by default
    >>> api.PyFile_AsFile.argtypes=(ctypes.c_void_p,)   # as of 2.7.10, long integers are silently truncated to ints, see http://bugs.python.org/issue24747
    >>> api.PyFile_AsFile(id(f))
    2019259400

Do keep in mind that with fds and C pointers, you need to ensure proper object lifetimes by hand (a combined sketch follows this list)!

- file-like objects returned by `os.fdopen()` do close the descriptor on `.close()`
  - so duplicate descriptors with `os.dup()` if you need them after a file object is closed/garbage collected
- while working with the C structure, adjust the corresponding object's reference count with `PyFile_IncUseCount()`/`PyFile_DecUseCount()`
- ensure no other I/O is done on the descriptors/file objects in the meantime since it would screw up the data (e.g. ever since calling `iter(f)`/`for l in f`, internal caching is done that's independent from stdio's caching)
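
Putting those caveats together, here's a minimal sketch of the bookkeeping for the `ctypes.pythonapi` route (Python 2 / CPython only; the file name and read-only mode are just examples):

    import ctypes
    import os

    api = ctypes.pythonapi
    api.PyFile_AsFile.restype = ctypes.c_void_p
    api.PyFile_AsFile.argtypes = (ctypes.c_void_p,)
    api.PyFile_IncUseCount.restype = None
    api.PyFile_IncUseCount.argtypes = (ctypes.c_void_p,)
    api.PyFile_DecUseCount.restype = None
    api.PyFile_DecUseCount.argtypes = (ctypes.c_void_p,)

    fd = os.open("test.txt", os.O_RDONLY)
    f = os.fdopen(os.dup(fd))          # work on a duplicate so fd survives f.close()

    fp = api.PyFile_AsFile(id(f))      # the wrapped FILE*
    api.PyFile_IncUseCount(id(f))      # pin the FILE* while foreign code uses it
    try:
        pass                           # hand fp to the library here; no Python I/O on f meanwhile
    finally:
        api.PyFile_DecUseCount(id(f))
        f.close()                      # closes the duplicate; the original fd stays usable
    os.close(fd)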