
My question is how to parse tab-delimited output from a C function into a pandas DataFrame via ctypes:

I am writing a Python 3.x wrapper around a C library using ctypes. The C library currently runs database queries. The C function I am accessing, return_query(), returns tab-delimited rows from a query, given the path to a file, an index, and a query string:

int return_query(structname **output, const char *input_file,
                 const char *index, const char *query_string);

As you can see, I'm using output as the location to store all records from the query, where structname is the struct for a row.

I also have a function which prints to STDOUT:

int print_query(const char *input_file,
                 const char *index, const char *query_string);

My goal is to access these functions via ctypes, and pass the tab-delimited row outputs into a pandas DataFrame.

My problem is this:

(1) I could try to parse the STDOUT of print_query(); however, these queries could produce large tab-delimited results. I worry this solution isn't efficient and might not scale to tens of thousands of rows. Other questions have roughly covered how to capture STDOUT from C functions in Python via ctypes:

Capturing print output from shared library called from python with ctypes module
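For reference, the capture approach from option (1) can be sketched roughly as follows. This is a minimal sketch, assuming a POSIX system where ctypes.CDLL(None) exposes libc; capture_c_stdout is a hypothetical helper name, and libc's puts stands in for print_query, which is not available here. File descriptor 1 is temporarily pointed at a temporary file, so the output size is not limited by a pipe buffer.

```python
import ctypes
import os
import sys
import tempfile

# Assumption: POSIX. CDLL(None) gives access to symbols already loaded
# into the process, including libc's puts and fflush.
libc = ctypes.CDLL(None)
libc.puts.argtypes = [ctypes.c_char_p]


def capture_c_stdout(func, *args):
    """Call func(*args) and return the bytes it wrote to file descriptor 1."""
    sys.stdout.flush()      # flush Python-level buffered output first
    libc.fflush(None)       # flush all C stdio output streams
    saved_fd = os.dup(1)    # remember the real stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)   # fd 1 now points at the temp file
        try:
            func(*args)
            libc.fflush(None)      # push C-buffered output into the file
        finally:
            os.dup2(saved_fd, 1)   # restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read()


# puts is only a stand-in for print_query(input_file, index, query_string).
out = capture_c_stdout(libc.puts, b"a\tb\tc")
```

The captured bytes could then be handed to pandas, for example with pd.read_csv(io.BytesIO(out), sep="\t", header=None), assuming the rows carry no header line.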

(2) Could I access output somehow, and pass this to a pandas DataFrame? I'm currently not sure how this would work, e.g.

import ctypes

lib = ctypes.CDLL("../libshared.so")  # reference to the shared library (*.so)

lib.return_query.restype = ctypes.c_char
lib.return_query.argtypes = (???, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p)

What should the first argument be, and how would I pass it into something which could be a pandas DataFrame?
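One way to picture the first argument: a `structname **` parameter usually maps to a pointer to a pointer to a ctypes.Structure, passed with ctypes.byref. The sketch below is only an illustration under loud assumptions: the real fields of structname are not shown, so Row and its single line field are hypothetical, the int return value is assumed to be the row count, and fake_return_query is a Python callback standing in for the C function so the example runs without the library.

```python
import ctypes


class Row(ctypes.Structure):
    # Hypothetical layout: the real fields of `structname` are unknown.
    _fields_ = [("line", ctypes.c_char_p)]


RowPtr = ctypes.POINTER(Row)

# Same shape as the declaration of return_query in the question.
QueryFunc = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(RowPtr),  # structname **output
    ctypes.c_char_p,         # const char *input_file
    ctypes.c_char_p,         # const char *index
    ctypes.c_char_p,         # const char *query_string
)

_DATA = [b"1\tfoo", b"2\tbar"]  # sample rows, kept alive at module level
_keepalive = []                 # keeps the fake result array alive


@QueryFunc
def fake_return_query(output, input_file, index, query_string):
    # Stand-in for the C function: store an array of rows in *output
    # and return the number of rows (an assumed convention).
    rows = (Row * len(_DATA))()
    for i, data in enumerate(_DATA):
        rows[i].line = data
    _keepalive.append(rows)
    output[0] = ctypes.cast(rows, RowPtr)
    return len(_DATA)


out = RowPtr()  # NULL pointer that the callee fills in
n = fake_return_query(ctypes.byref(out), b"data.db", b"idx", b"query")
lines = [out[i].line.decode() for i in range(n)]
```

With the real library, the equivalent declarations would be lib.return_query.restype = ctypes.c_int and lib.return_query.argtypes = (ctypes.POINTER(RowPtr), ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p), with Row matching the actual struct definition. Who frees the memory behind output depends on the library and is not covered here.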

(3) Perhaps it would be better to re-write the C functions which return tab-delimited rows into something more accessible via ctypes?

ShanZhengYang

1 Answer


I was going to post this as a comment, but Stack Overflow won't let me.

1- A pandas object is passed to C functions as a PyObject *, so lib.return_query.argtypes = (ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p)

2- If you are returning tab-delimited rows, that sounds more like ctypes.c_char_p, not lib.return_query.restype = ctypes.c_char. And your function int return_query should be char *return_query

These are comments and observations, not a full answer.
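If the C function were changed to return a char * as point 2 suggests, the Python side might look roughly like the sketch below. It assumes a POSIX system and uses libc's strdup as a stand-in for such a function, since the real library is not available; to_dataframe is a hypothetical helper, and the sample text with a header row is made up. The restype is set to c_void_p rather than c_char_p so that the original pointer is kept and can be freed after copying.

```python
import csv
import ctypes
import io

# Assumption: POSIX. strdup returns a malloc'd char *, like a C function
# that builds its tab-delimited result on the heap.
libc = ctypes.CDLL(None)
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p   # keep the raw pointer so it can be freed
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

ptr = libc.strdup(b"id\tname\n1\tfoo\n2\tbar\n")  # made-up sample rows
try:
    text = ctypes.string_at(ptr).decode()  # copy the NUL-terminated string
finally:
    libc.free(ptr)                         # release the C-side buffer

# Standard-library parse of the tab-delimited text.
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))


def to_dataframe(tsv_text):
    # Imported lazily so the rest of the sketch runs without pandas.
    import pandas as pd
    return pd.read_csv(io.StringIO(tsv_text), sep="\t")
```

Calling to_dataframe(text) treats the first line as column names; passing header=None to read_csv would be the choice if the rows carry no header.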

  • Thanks for the comments. Regarding "lib.return_query.argtypes = (ctypes.c_void_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p)", my question is: what does the resulting function expect for `ctypes.c_void_p`? – ShanZhengYang Jun 10 '18 at 15:41
  • You said you are passing a pandas DataFrame to your C code, so I think Python represents it as a PyObject *, which in ctypes becomes ctypes.c_void_p. Inside your function return_query, check the type of the `PyObject *` by logging the `tp_name` field (a `char *`) of its PyTypeObject. I was looking at the macro PyLong_Check(op), which checks whether op is a Python long; in your case you want to make sure it is a pandas DataFrame. See https://github.com/python/cpython/blob/master/Include/longobject.h – Jorge Sierra Carbonell Jun 10 '18 at 16:19
  • To clarify, the C code returns tab-delimited strings, which I would like to feed into a pandas DataFrame – ShanZhengYang Jun 10 '18 at 18:30
  • If you are returning a tab-delimited string, that's char *return_query, not int return_query. You probably want to return a Python string, so you may want to convert it with PyString_FromString(my_tab_delimited_string) (PyUnicode_FromString in Python 3) and return that. – Jorge Sierra Carbonell Jun 10 '18 at 20:59