
I have a question regarding Python bindings.

I have a command-line tool which exposes some functionality, and the code has been refactored to provide the same functionality through a shared library. I wanted to know what real advantage I get from "writing a Python binding for the shared library" vs "calling the command line directly".

One obvious advantage, I think, will be performance: the shared library is loaded into the same process, so the functionality can be called in-process, avoiding the cost of spawning a new process for every command-line invocation.

Are there any other advantages to writing a Python binding in such a case?

Thanks.

  • BTW: you can use ctypes and talk to the shared library without any binding http://docs.python.org/2/library/ctypes.html – Muayyad Alsadi May 28 '13 at 07:31
  • What is your goal? Do you want to access your library in IPython for example? What kind of output does a command line call produce? It would feel super awkward to me to have Python assemble command line strings, call the OS to evaluate them and pipe the output back. Of course doable, but no less work than writing a Python wrapper for the library. Clearly choice number 1 for me. – Stefan May 28 '13 at 07:42
  • Thanks Stefan and Muayyad. The goal is to make the functionality currently exposed in both command line as well as C shared library available to python code. – Ramakrishnan G May 28 '13 at 08:08

1 Answer


I can hardly imagine a case where one would prefer wrapping a library's command line interface over wrapping the library itself. (Unless there is a library that comes with a neat command line interface while being a total mess internally; but the OP indicates that the same functionality available via the command line is easily accessible in terms of library function calls).

The biggest advantage of writing a Python binding is a clearly defined data interface between the library and Python. Ideally, the library can operate directly on memory managed by Python, without any data copying involved.
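For illustration, here is a minimal sketch of such a binding with ctypes. The library name (libmylib.so) and the function process_buffer are made up; assume it has the C signature int process_buffer(double *data, size_t n) and modifies the buffer in place:

    import ctypes

    # describe the (hypothetical) C function to ctypes
    lib = ctypes.CDLL("./libmylib.so")
    lib.process_buffer.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
    lib.process_buffer.restype = ctypes.c_int

    n = 1000
    buf = (ctypes.c_double * n)()        # memory owned by the Python process
    status = lib.process_buffer(buf, n)  # the library writes into that buffer in place
    if status != 0:
        raise RuntimeError("process_buffer failed with code %d" % status)
    results = [buf[i] for i in range(n)] # results come back without any file or parsing step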

To illustrate this, let's assume the library function does something more complicated than printing the current time: it takes a significant amount of data as input, performs some operation, and returns a significant amount of data as output. If the input data is expected in an input file, Python would need to generate this file first. It must make sure that the OS has finished writing the file before invoking the tool on the command line (I have seen several C libraries where sleep(1) calls were used as a band-aid for this issue...). And Python must get the output back in some way.
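For comparison, here is roughly what the file-based route looks like from Python (the executable name mytool and the file names are made up):

    import subprocess

    data = b"..."                         # whatever binary input the tool expects
    with open("input.dat", "wb") as f:    # close() at the end of the block flushes,
        f.write(data)                     # so the child process sees the full content
    subprocess.check_call(["./mytool", "input.dat", "output.dat"])
    with open("output.dat", "rb") as f:
        result = f.read()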

If the command line interface does not rely on files but obtains all arguments on the command line and prints the output on stdout, Python probably needs to convert between binary data and string format, not always with the expected results. It also needs to pipe stdout back and parse it. Not a problem, but getting all this right is a lot of work.
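A sketch of that conversion and parsing work, again with a made-up tool that takes its arguments on the command line and prints one number per line on stdout:

    import subprocess

    args = ["./mytool", "--count", str(1000), "--scale", str(2.5)]   # numbers -> strings
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    # strings -> numbers again, hoping the output format never changes
    values = [float(line) for line in out.decode().splitlines() if line.strip()]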

What about error handling? Well, the command line interface will probably handle errors by printing error messages on stderr. So Python needs to capture, parse and process these as well. OTOH, the corresponding library function will almost certainly make a success flag accessible to the calling program. This is much more directly usable for Python.
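Roughly, the two error handling styles look like this (mytool, libmylib.so and process_buffer are the same made-up names as above):

    import ctypes
    import subprocess

    # Command-line route: success has to be inferred from the exit status and
    # whatever the tool happens to print on stderr.
    proc = subprocess.Popen(["./mytool", "input.dat", "output.dat"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("mytool failed: %s" % err.decode().strip())

    # Library route: ctypes can turn the C status code into an exception automatically.
    lib = ctypes.CDLL("./libmylib.so")

    def check_status(status, func, arguments):
        if status != 0:
            raise RuntimeError("%s failed with code %d" % (func.__name__, status))
        return status

    lib.process_buffer.restype = ctypes.c_int
    lib.process_buffer.errcheck = check_status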

All of this obviously affects performance, which you already mentioned.

As another point, if you are developing the library yourself, you will probably find after some time that the Python workflow has made the whole command line interface obsolete, so you can drop supporting it altogether and save yourself a lot of time.

So I think there is a clear case to be made for the Python bindings. To me, one of the biggest strengths of Python is the ease with which such wrappers can be created and maintained. Unfortunately, there are about 7 or 8 equally easy ways to do this. To get started, I recommend ctypes, since it does not require a compiler and will work with PyPy. For best performance use the native C-Python API, which I also found very easy to learn.
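To get a first feel for ctypes without compiling anything, you can call straight into the C standard library (the libc file name below is the Linux one; it differs on other platforms):

    import ctypes

    libc = ctypes.CDLL("libc.so.6")            # Linux; see ctypes.util.find_library elsewhere
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    print(libc.strlen(b"hello world"))         # -> 11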

Stefan
  • 1
    +1 for the general message. There are exceptions e.g., `curl` might be easier to use than the corresponding `pycurl` library. I don't understand your `sleep(1)` example: `file.flush()` or `file.close()` make the content available to other processes (it might not be written physically to the disk yet, but other processes still will be able to see the updated content). Anyway you can pipe the content directly to the subprocess (if it supports it) without writing to an intermediate file. – jfs May 28 '13 at 12:59
  • @J.F.Sebastian The `sleep(1)` example actually comes from C code, where I have seen this way of dealing with concurrency issues more than once. I clarified this in my post. I agree that there are cases where one is forced to resort to the system call, most of all if one has no control over the system and the required functionality is available *only* in executable form. – Stefan May 28 '13 at 14:01