My question is how to parse tab-delimited output from a C function into a pandas DataFrame via ctypes:
I am writing a Python 3.x wrapper around a C library using ctypes. The C library runs database queries. The C function I am accessing, return_query(), returns tab-delimited rows from a query, given the path to a file, an index, and a query string:
int return_query(structname **output, const char *input_file,
const char *index, const char *query_string);
As you can see, output is the location where all records from the query are stored, and structname is the struct that represents a row.
I also have a function which prints to STDOUT:
int print_query(const char *input_file,
const char *index, const char *query_string);
My goal is to access these functions via ctypes, and pass the tab-delimited row outputs into a pandas DataFrame.
My problem is this:
(1) I could try to parse the STDOUT of print_query(); however, these queries could produce large tab-delimited results. I worry this solution isn't efficient, as it might not scale to tens of thousands of rows. Other questions have roughly covered how to capture STDOUT from C functions in Python via ctypes:
Capturing print output from shared library called from python with ctypes module
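For option (1), a minimal sketch of the capture-and-parse idea, using only the standard library. It temporarily points file descriptor 1 at a temporary file, runs a callable, and parses what was written as tab-delimited rows. Since the real library is not available here, fake_print_query() is a hypothetical stand-in that writes two made-up rows directly to descriptor 1; with the real print_query() you would pass a lambda that calls it through ctypes instead.

```python
import csv
import io
import os
import sys
import tempfile


def capture_fd1(func):
    """Run func() while file descriptor 1 points at a temp file; return the bytes written."""
    sys.stdout.flush()              # push out anything Python has buffered first
    saved_fd = os.dup(1)            # remember the real stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)    # fd 1 now writes into the temp file
        try:
            func()
        finally:
            os.dup2(saved_fd, 1)    # restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read()


def fake_print_query():
    # Hypothetical stand-in for print_query(): writes tab-delimited rows to fd 1.
    os.write(1, b"1\talpha\t0.5\n2\tbeta\t1.5\n")


raw = capture_fd1(fake_print_query)
rows = list(csv.reader(io.StringIO(raw.decode()), delimiter="\t"))


def to_dataframe(raw_bytes):
    """Optional: hand the captured bytes to pandas (requires pandas installed)."""
    import pandas as pd
    return pd.read_csv(io.BytesIO(raw_bytes), sep="\t", header=None)
```

With a real C function, the C library's own stdio buffer would probably also need flushing (for example by calling fflush through libc) before descriptor 1 is restored, otherwise some output may arrive late. The temp file also means every row is written and then re-read as text, which is the overhead the concern about large results refers to.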
(2) Could I access output somehow, and pass it to a pandas DataFrame? I'm currently not sure how this would work, e.g.
import ctypes
lib = ctypes.CDLL("../libshared.so")  # load the shared library (*.so)
lib.return_query.restype = ctypes.c_int  # return_query() is declared to return int
lib.return_query.argtypes = (???, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p)
What should the first argument be, and how would I pass it into something which could be a pandas DataFrame?
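For option (2), one possible shape for the first argument is a pointer to a pointer to a ctypes.Structure that mirrors structname. The sketch below is only an illustration under several assumptions: the Row fields (id, name, score) are invented because the real struct layout is not shown, the int return value is assumed to be the number of rows, and fake_return_query() is a pure-Python stand-in so the example runs without the shared library.

```python
import ctypes


class Row(ctypes.Structure):
    # Hypothetical layout: replace with the real fields of `structname`,
    # in the same order and with matching C types.
    _fields_ = [
        ("id", ctypes.c_int),
        ("name", ctypes.c_char_p),
        ("score", ctypes.c_double),
    ]


RowPtr = ctypes.POINTER(Row)

# Against the real library the declarations might look like this:
#   lib = ctypes.CDLL("../libshared.so")
#   lib.return_query.restype = ctypes.c_int
#   lib.return_query.argtypes = (ctypes.POINTER(RowPtr), ctypes.c_char_p,
#                                ctypes.c_char_p, ctypes.c_char_p)

_buffers = []  # keeps the stand-in's row array alive


def fake_return_query(output, input_file, index, query_string):
    """Pure-Python stand-in for return_query() so the sketch runs anywhere."""
    rows = (Row * 2)(Row(1, b"alpha", 0.5), Row(2, b"beta", 1.5))
    _buffers.append(rows)
    output[0] = ctypes.cast(rows, RowPtr)  # same effect as `*output = rows;` in C
    return len(rows)  # assumption: the int return value is the row count


def fetch_records(query_func, input_file, index, query_string):
    """Call the query function and copy each struct into a plain dict."""
    out = RowPtr()  # NULL pointer that the callee overwrites
    n = query_func(ctypes.pointer(out), input_file, index, query_string)
    if n < 0:
        raise RuntimeError(f"query failed with code {n}")
    return [
        {name: getattr(out[i], name) for name, _ctype in Row._fields_}
        for i in range(n)
    ]


def to_dataframe(records):
    """Build a DataFrame from the list of dicts (requires pandas installed)."""
    import pandas as pd
    return pd.DataFrame.from_records(records)


records = fetch_records(fake_return_query, b"data.db", b"idx", b"query")
```

With the real function, ctypes.byref(out) can be passed in place of ctypes.pointer(out). Whoever allocates the row array in C also has to provide a way to free it once the values have been copied into Python objects, and c_char_p fields come back as bytes, so they may need decoding before or after building the DataFrame.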
(3) Perhaps it would be better to re-write the C functions which return tab-delimited rows into something more accessible via ctypes?