
I have a library function (written in C) that generates text by writing the output to FILE *. I want to wrap this in Python (2.7.x) with code that creates a temp file or pipe, passes it into the function, reads the result from the file, and returns it as a Python string.

Here's a simplified example to illustrate what I'm after:

/* Library function */
void write_numbers(FILE * f, int arg1, int arg2)
{
   fprintf(f, "%d %d\n", arg1, arg2);
}

Python wrapper:

import os
from ctypes import *
mylib = CDLL('mylib.so')


def write_numbers( a, b ):
   rd, wr = os.pipe()

   write_fp = MAGIC_HERE(wr)
   mylib.write_numbers(write_fp, a, b)
   os.close(wr)

   read_file = os.fdopen(rd)
   res = read_file.read()
   read_file.close()

   return res

#Should result in '1 2\n' being printed.
print write_numbers(1,2)

I'm wondering what my best bet is for MAGIC_HERE().

I'm tempted to just use ctypes and create a libc.fdopen() wrapper that returns a Python c_void_p, then pass that into the library function. It seems like that should be safe in theory--just wondering if there are issues with that approach or an existing Python-ism to solve this problem.
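Here's a rough sketch of how I imagine that would look (assuming Linux where glibc loads as `libc.so.6`, that `mylib.so` uses the same C runtime as the interpreter, and modelling `FILE` as an opaque struct; closing the `FILE*` with `fclose()` instead of `os.close()` so no descriptor leaks):

import os
from ctypes import CDLL, POINTER, Structure, c_char_p, c_int

class FILE(Structure):
    pass  # opaque; only ever handled through a pointer

libc = CDLL('libc.so.6')                      # assumption: glibc on Linux
libc.fdopen.restype = POINTER(FILE)
libc.fdopen.argtypes = (c_int, c_char_p)
libc.fclose.argtypes = (POINTER(FILE),)

mylib = CDLL('mylib.so')
mylib.write_numbers.argtypes = (POINTER(FILE), c_int, c_int)

def write_numbers(a, b):
    rd, wr = os.pipe()
    write_fp = libc.fdopen(wr, 'w')           # FILE* wrapping the write end
    try:
        mylib.write_numbers(write_fp, a, b)
    finally:
        libc.fclose(write_fp)                 # flushes and closes 'wr' too

    # Note: reading after writing only works while the output fits in the
    # pipe buffer; larger output would need a reader thread or a temp file.
    read_file = os.fdopen(rd)
    try:
        return read_file.read()
    finally:
        read_file.close()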

Also, this will go in a long-running process (let's just assume "forever"), so any leaked file descriptors are going to be problematic.

Brian McFarland
  • `os.popen()` is incorrect. It requires at least one argument, the command line to invoke and get pipes to. Besides, it's deprecated in favour of `subprocess`, as [the docs](https://docs.python.org/2/library/os.html?highlight=os.popen#os.popen) say. – ivan_pozdeev Oct 23 '15 at 20:26
  • Sorry, I meant `os.pipe()`. Updated. – Brian McFarland Oct 23 '15 at 20:30
  • Unless you're also planning to run this on Windows, which has the problem of potentially mismatched C runtime libraries, then I don't think you'll have any problem calling `libc.fdopen` and passing the resulting `FILE` pointer. But instead of using `c_void_p`, I'd create an opaque `class FILE(Structure): pass` and set `libc.fdopen.restype = POINTER(FILE)`. This won't be converted to an integer result. OTOH, `c_void_p` as the `restype` gets converted to an integer, so you'd have to make sure that `mylib.write_numbers.argtypes` is also set to prevent truncating a 64-bit pointer value. – Eryk Sun Oct 23 '15 at 20:43
  • Did you consider using [`fmemopen`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fmemopen.html)? If the amount of data that will ever be written by a single `write_numbers` call is bounded by a reasonably small fixed constant, it could provide a good alternative to using a pipe. – 5gon12eder Oct 24 '15 at 01:46
  • @5gon12eder, Nope :). That is, no doubt, way faster. I'm sure it's all user-space, except for (maybe) the underlying memory allocation. But then I would need to figure out how to read the resulting `FILE *` from Python (again, likely involving calls to libc). I'll keep that in mind in case performance ever matters, but for now, I think `os.pipe()` is simpler, at the cost of just a few extra system calls. – Brian McFarland Oct 26 '15 at 16:02
  • @BrianMcFarland You don't have to (and I'm not sure you even can) read the `FILE *` back in. But you can simply read the `char[]` array that you passed to `fmemopen`. – 5gon12eder Oct 27 '15 at 06:28
  • @5gon12eder - dunno why that hadn't occurred to me. Now I think I like that idea as it reduces the number of things I have to worry about properly cleaning up / freeing AND it reduces system calls. Btw - `fmemopen` allows setting the mode to "r+" or "w+", which would allow read/write. – Brian McFarland Oct 27 '15 at 17:25
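A sketch of what the `fmemopen()` route from the last few comments could look like, assuming glibc (`fmemopen()` is POSIX.1-2008, so it isn't available with MSVCRT) and that the output of a single call fits into a fixed-size buffer:

import ctypes
from ctypes import CDLL, POINTER, Structure, c_char_p, c_int, c_size_t

class FILE(Structure):
    pass  # opaque

libc = CDLL('libc.so.6')                      # assumption: glibc
libc.fmemopen.restype = POINTER(FILE)
libc.fmemopen.argtypes = (ctypes.c_void_p, c_size_t, c_char_p)
libc.fclose.argtypes = (POINTER(FILE),)

mylib = CDLL('mylib.so')
mylib.write_numbers.argtypes = (POINTER(FILE), c_int, c_int)

BUF_SIZE = 4096                               # assumption: output never exceeds this

def write_numbers(a, b):
    buf = ctypes.create_string_buffer(BUF_SIZE)
    fp = libc.fmemopen(ctypes.cast(buf, ctypes.c_void_p), BUF_SIZE, 'w')
    try:
        mylib.write_numbers(fp, a, b)
    finally:
        libc.fclose(fp)                       # the flush writes the terminating NUL
    return buf.value                          # contents up to the first NUL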

1 Answer


First, do note that `FILE*` is an stdio-specific entity. It doesn't exist at system level. The things that exist at system level are file descriptors (retrieved with `file.fileno()`; `os.pipe()` returns plain descriptors already) in UNIX and handles (retrieved with `msvcrt.get_osfhandle()`) in Windows. Thus it's a poor choice for an inter-library exchange format if there can be more than one C runtime in play. You'll be in trouble if your library is compiled against a different C runtime than your copy of Python:

  • binary layouts of the structure may differ (e.g. due to alignment, additional members for debugging purposes, or even different type sizes);
  • in Windows, the file descriptors that the structure refers to are C-runtime-specific entities as well, and their table is maintained internally by each runtime.
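For illustration (a trivial sketch): what Python hands you directly are plain integer descriptors, plus, on Windows, the OS handles behind them; there's no `FILE*` in sight:

    import os

    rd, wr = os.pipe()         # plain integer file descriptors
    f = os.fdopen(rd)          # a Python 2 file object wrapping one of them
    print f.fileno()           # the same integer descriptor (== rd)

    # Windows only: the OS-level handle behind a C-runtime descriptor
    # import msvcrt
    # print msvcrt.get_osfhandle(wr)

    f.close()                  # closes rd as well
    os.close(wr)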

Moreover, in Python 3, I/O was overhauled in order to untangle it from stdio. So `FILE*` is alien to that Python flavor (and likely to most non-C flavors, too).

Now, what you need is to

  • somehow guess which C runtime you need, and
  • call its `fdopen()` (or equivalent).

(One of Python's mottoes is "make the right thing easy and the wrong thing hard", after all)


The cleanest method is to use the precise C runtime instance that the library is linked against (do pray that it's linked dynamically, or there'll be no exported symbol to call).

For the 1st item, I couldn't find any Python modules that can analyze a loaded dynamic module's metadata to find out which DLLs/so's it has been linked against (just a name, or even name+version, isn't enough, you know, due to possible multiple instances of the library on the system). Though it's definitely possible, since information about the binary formats is widely available.

For the 2nd item, it's a trivial `ctypes.CDLL('path').fdopen` (`_fdopen` for MSVCRT).
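For instance (a sketch with a hypothetical runtime path, using the opaque-struct trick from the comments so the returned pointer isn't squeezed through the default `int` restype):

    import ctypes
    from ctypes import POINTER, Structure, c_char_p, c_int

    class FILE(Structure):
        pass                                   # opaque; never dereferenced from Python

    # Hypothetical path -- substitute whichever instance your library actually links against.
    rt = ctypes.CDLL('/lib/x86_64-linux-gnu/libc.so.6')

    fdopen = rt.fdopen                         # rt._fdopen for an MSVCRT instance
    fdopen.restype = POINTER(FILE)             # the default restype is c_int and would truncate
    fdopen.argtypes = (c_int, c_char_p)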


Second, you can write a small helper module that would be compiled against the same (or a guaranteed-compatible) runtime as the library and would do the conversion from the aforementioned descriptor/handle for you. This is effectively a workaround that avoids editing the library proper.


Finally, there's the simplest (and the dirtiest) method: using Python's own C runtime instance (so all the above warnings apply in full) through the Python C API available via `ctypes.pythonapi`. It takes advantage of

  • the fact that Python 2's file objects are wrappers over stdio's `FILE*` (Python 3's are not)
  • the `PyFile_AsFile` API that returns the wrapped `FILE*` (note that it's missing from Python 3)
    • for a standalone fd, you need to construct a file object first (so that there is a `FILE*` to return ;) )
  • the fact that `id()` of an object is its memory address (CPython-specific)

    >>> open("test.txt")
    <open file 'test.txt', mode 'r' at 0x017F8F40>
    >>> f=_
    >>> f.fileno()
    3
    >>> ctypes.pythonapi
    <PyDLL 'python dll', handle 1e000000 at 12808b0>
    >>> api=_
    >>> api.PyFile_AsFile
    <_FuncPtr object at 0x018557B0>
    >>> api.PyFile_AsFile.restype = ctypes.c_void_p      # as per the ctypes docs, pythonapi assumes all fns return int by default
    >>> api.PyFile_AsFile.argtypes = (ctypes.c_void_p,)  # as of 2.7.10, long integers are silently truncated to ints, see http://bugs.python.org/issue24747
    >>> api.PyFile_AsFile(id(f))
    2019259400
    

Do keep in mind that with fds and C pointers, you need to ensure proper object lifetimes by hand!

  • file objects returned by `os.fdopen()` do close the underlying descriptor on `.close()`
    • so duplicate descriptors with `os.dup()` if you need them after the file object is closed/garbage collected
  • while C code is working with the `FILE*`, bracket that use with `PyFile_IncUseCount()`/`PyFile_DecUseCount()` calls on the corresponding file object
  • ensure no other I/O is done on the descriptors/file objects in the meantime, since it would corrupt the data (e.g. once you call `iter(f)` or use `for l in f`, the file object does internal caching that is independent of stdio's caching)
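Putting the pieces together, here's a sketch of the pipe-based wrapper over `PyFile_AsFile` (CPython 2 only), declaring `ctypes.py_object` for the argument and an opaque `FILE` pointer for the result as discussed in the comments below, and bracketing the C-level use with `PyFile_IncUseCount()`/`PyFile_DecUseCount()`:

    import os
    import ctypes

    class FILE(ctypes.Structure):
        pass                                        # opaque stdio FILE

    api = ctypes.pythonapi
    api.PyFile_AsFile.restype = ctypes.POINTER(FILE)
    api.PyFile_AsFile.argtypes = (ctypes.py_object,)
    api.PyFile_IncUseCount.restype = None
    api.PyFile_IncUseCount.argtypes = (ctypes.py_object,)
    api.PyFile_DecUseCount.restype = None
    api.PyFile_DecUseCount.argtypes = (ctypes.py_object,)

    mylib = ctypes.CDLL('mylib.so')
    mylib.write_numbers.argtypes = (ctypes.POINTER(FILE), ctypes.c_int, ctypes.c_int)

    def write_numbers(a, b):
        rd, wr = os.pipe()
        write_file = os.fdopen(wr, 'w')             # closing this also closes 'wr'
        api.PyFile_IncUseCount(write_file)          # keep the FILE* safe while C uses it
        try:
            mylib.write_numbers(api.PyFile_AsFile(write_file), a, b)
        finally:
            api.PyFile_DecUseCount(write_file)
            write_file.close()                      # flushes stdio buffers, then closes 'wr'
        read_file = os.fdopen(rd)
        try:
            return read_file.read()
        finally:
            read_file.close()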
ivan_pozdeev
  • If you're worried about the library using a different C runtime (mostly a Windows problem), then using `PyFile_AsFile` solves nothing, and limits the code to Python 2 for no good reason. Why bring Cython into the discussion? That's a random segue. – Eryk Sun Oct 24 '15 at 00:04
  • Also, never pass `id(f)` as a pointer. You want `py_object(f)` to pass a Python object -- as `PyObject *` for CPython. Using `id` to get a base address is specific to CPython, and passing Python integers as *arguments* also defaults to being converted as 32-bit C int values, which will truncate a 64-bit pointer value. – Eryk Sun Oct 24 '15 at 00:05
  • I'd like to see some backing for "truncating pointers to integers". Python does have a notion of long integers, you know, and there's completely no reason to truncate a `c_void_p`. – ivan_pozdeev Oct 24 '15 at 00:14
  • @eryksun Yeah, I already realized that Cython isn't of any advantage (and explicitly written that; I don't want to delete it just yet since it has a link to a relevant discussion). I showcased `PyFile_AsFile` - with all due warnings - because there's really no other (currently available) way that I found. – ivan_pozdeev Oct 24 '15 at 00:28
  • In UNIX, library incompatibilities can happen, too - with debug/release builds, static builds, different versions and things like `valgrind`. – ivan_pozdeev Oct 24 '15 at 00:36
  • @eryksun I don't see it documented at [`ctypes.py_object`](https://docs.python.org/2/library/ctypes.html?highlight=py_object#ctypes.py_object) what calling `py_object` on a Python object would produce - or that it's specifically guaranteed to work cross-platform when calling the C API. Sure, it works, but I'm not yet convinced this is the "one true way". – ivan_pozdeev Oct 24 '15 at 00:45
  • If you pass to a function without setting `argtypes` to a pointer type, then ctypes truncates the value to a 32-bit C int. You could test this without forcing me to show you the code, but here's a link to the [`ConvParam`](https://hg.python.org/cpython/file/v2.7.10/Modules/_ctypes/callproc.c#l563) source. Note that a converted integer is assigned to `value.i`, the `int` field in `union result`. In the ctypes docs, section 15.17.1.3 states "Python integers and Python longs are passed as the platforms default C int type, their value is masked to fit into the C type". – Eryk Sun Oct 24 '15 at 01:31
  • @eryksun Now this is a different talk! Added `argtypes`, thanks. Assigning `long` to an `int` field when there's a `long` one smells like a bug - and indeed, [you were not the first one to notice](http://bugs.python.org/issue24747). – ivan_pozdeev Oct 24 '15 at 02:04
  • Here's an [old answer](http://stackoverflow.com/a/16649521) from 2013, but I knew this years before that because it's well-documented. In that bug report I was concerned about the inconsistency for values beyond the range `LONG_MIN` to `ULONG_MAX`. Silently wrapping around when `c_int` is declared versus raising an exception in the undeclared case is an ugly wart. I would prefer that `ConvParam` switched to using `PyLong_AsUnsignedLongMask` or `PyInt_AsUnsignedLongMask` (2.x). – Eryk Sun Oct 24 '15 at 02:31
  • What's your aversion to setting `api.PyFile_AsFile.argtypes=(ctypes.py_object,)` and calling as `api.PyFile_AsFile(f)`? It's simpler, and also the intended usage. – Eryk Sun Oct 24 '15 at 02:35
  • Mine is explicit. I tell exactly that I want a pointer, get and pass one rather than hoping that some hidden mechanism does this for me. Your suggestion is noted though, as you can see. – ivan_pozdeev Oct 24 '15 at 02:39
  • But there's no hoping, here. `py_object` was added exactly for this case. Do you want another source link? In this case the `getfunc` and `setfunc` are trivial, but if you need concrete reassurance... – Eryk Sun Oct 24 '15 at 02:42
  • Also, per my comment on the question, I recommend creating an opaque `FILE` struct and setting `api.PyFile_AsFile.restype = ctypes.POINTER(FILE)`. This gets around the int conversion that ctypes uses for simple `c_void_p`, and it's more typesafe in `argtypes` (which you need anyway since it's a pointer argument). – Eryk Sun Oct 24 '15 at 02:43
  • I wouldn't call it "well documented enough". It's in a tutorial, rather than reference (15.17.2.3.), section - that's why it slipped by me. – ivan_pozdeev Oct 24 '15 at 02:43
  • And the note at [`ctypes.c_int`](https://docs.python.org/2/library/ctypes.html#ctypes.c_int) is meant for explicit conversion - since it's about using `ctypes.c_int` in Python code, not converting anything to `int` anywhere in the Python's code base. – ivan_pozdeev Oct 24 '15 at 03:03
  • I see your point that the C int conversion is only documented in 15.17.1.3 and 15.17.1.8 in the tutorial section. If you create an issue to have this documented in the reference section as well, someone (maybe you) may be motivated enough to update the docs. That has a much better chance than changing the way ctypes has functioned for over a decade. – Eryk Sun Oct 24 '15 at 03:16
  • @eryksun @[c54426759](http://stackoverflow.com/questions/33310675/pass-file-into-function-from-python-ctypes/33311066#comment54426759_33311066): I don't want a source link, I want a reference link. Because if something is not documented, it doesn't exist: it's not guaranteed to 1) work everywhere (= in every conforming environment); 2) continue to work for an extended period of time. When I write code, I build it to last (until the computer crumbles into dust ;) ). – ivan_pozdeev Oct 24 '15 at 04:03
  • `py_object` is documented in the reference section. It's like all other simple types that automatically convert between Python objects and C data. In this case the C data is a `PyObject *`. There may be a question about reference counting, of which the docs say nothing. `c_char_p` and `py_object` have to keep a reference to the source object, to keep it alive. This can be inspected via the private `_objects` attribute. – Eryk Sun Oct 24 '15 at 04:50
  • @ivan_pozdeev - As a fairly experienced C programmer, this is the first I've heard the notion that using a `FILE *` as part of a public API is a bad idea. Not saying you're wrong--I'm rarely writing libraries meant for public use. But are you really saying the use of a file number is superior? `FILE *` is part of the C standard. File descriptors that come from `open`, e.g., are not. So you're saying while `stdio.h` is far more portable, it's bad to use for public APIs? Have you ever seen this cause a problem in practice? Read a blog post on it? Or is this purely speculative? – Brian McFarland Oct 26 '15 at 15:28
  • Not disagreeing, just would like to see more references on the idea. Also wanted to note: I can change the library too. It used to write to a global `FILE *`, declared `extern` by the library, then defined by the application using it. I changed the functions to all take a `FILE *` as an argument as an improvement: it provides some flexibility while requiring very minimal modification to the existing programs. I considered returning a malloc string from the library instead, but then I'd need to add 4 lines of code to every usage to get & store a pointer value, check for null, and free it. – Brian McFarland Oct 26 '15 at 15:45
  • So it's an old, large code base that I inherited. Did not design it, but am free to change it as needed if there are other suggestions. I feel at that point, maybe I should open a new question minus the Python part. – Brian McFarland Oct 26 '15 at 15:48
  • I know for a fact that MSVCRT manages descriptors privately (https://github.com/changloong/msvcrt/blob/master/io/open.c#L201). Others are speculation projected from what I know about mixing different instances of the same library. This is mainly a problem in Windows, but should be no different in other OSes if there ever are multiple non-binary-compatible instances. See e.g. http://stackoverflow.com/questions/11658915/mixing-debug-and-release-library-binary-bad-practice, https://www.softwariness.com/articles/visual-cpp-runtime-libraries/ for backing. – ivan_pozdeev Oct 26 '15 at 17:27
  • http://valgrind.org/docs/manual/manual-core.html#manual-core.whatdoes says that it instruments all the loaded modules in the same manner, so this particular one shouldn't be a source of discrepancies. – ivan_pozdeev Oct 26 '15 at 17:29
  • Of course, it's for you to decide whether "multiple non-binary-compatible instances" will ever be the case for you. After all, if you have the library's source, you can always compile it to use the same dynamic module as the local Python. – ivan_pozdeev Oct 26 '15 at 17:33