How can we speed up execution with ctypes?

Question

I want to find elements in a string array. I tried two methods: a pure Python function, and a compiled C++ function called from the Python script through ctypes.

However, I found that the C++ version does not speed up the process. For example, given a very large array where each element goes through a series of operations such as replace, delete, and compare, I would expect the compiled C++ function to be faster, but it is not. How can I speed up the Python execution time?

Thanks for your replies.

C++ code:

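#include <algorithm>
#include <cctype>
#include <cstring>
#include <string>

using namespace std;

// Case-insensitive search for str2 among `size` fixed-width (30-byte) strings.
// Returns a heap-allocated copy of the original matching element, or "notfind".
// Note: the caller never frees the returned buffer, so every call leaks it.
extern "C" char* func(const char str1[][30], const char* str2, int size) {
    string s1, s2;
    char* buf = new char[30];
    s2 = str2;
    transform(s2.begin(), s2.end(), s2.begin(), ::tolower);
    // Track the match explicitly; counting iterations misreports a match
    // in the last element as "notfind".
    bool found = false;
    for (int i = 0; i < size; i++) {
        s1 = str1[i];
        transform(s1.begin(), s1.end(), s1.begin(), ::tolower);
        if (s1 == s2) {
            strcpy(buf, str1[i]);  // return the original spelling
            found = true;
            break;
        }
    }
    if (!found) {
        strcpy(buf, "notfind");
    }
    return buf;
}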

Python code:

import ctypes as ct
import numpy as np
import datetime

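def findvalue(strList,s):
    # Return the first element equal to s ignoring case, else 'not find'.
    # Counting iterations would misreport a match in the last element.
    result='not find'
    for i in strList:
        if s==i.lower():
            result=i  # original spelling, as the C++ version returns
            break
    return result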

strList = ['AFG','bB','ccc','AFG','bB','ccc',
           'AFG','bB','ccc','AFG','bB','ccc',
           'AFG','bB','ccc','AFG','bB','ccc',
           'AFG','bB','ccc','AFG','bB','ccc',
           'AFG','bB','ccc','AFG','bB','ccc',
           'AFG','bB','ccc','AFG','bB','kapa']
s2='kapa'
#####Method1:use ctypes
start=datetime.datetime.now()
dll = ct.CDLL('DLL1.dll')
dll.func.restype = ct.POINTER(ct.c_char_p)  
size=len(strList)
strList=np.array(strList)
# The row width must match the C++ declaration `const char str1[][30]`.
listtemp=[ct.create_string_buffer(i.encode(), 30) for i in strList]
List = ((ct.c_char * 30) *size)(*listtemp)
cdata2 = ct.c_char_p(s2.encode())
p = dll.func(List, cdata2,size)
s = ct.cast(p, ct.c_char_p)
s = str(s.value, encoding="utf-8")
end=datetime.datetime.now()
sec=(end-start).total_seconds()
print("the time is: " +"{:.20f}".format(sec)+ " s")
#####Method2:use python function
start=datetime.datetime.now()
s2=s2.lower()
findvalue(strList,s2)
end=datetime.datetime.now()
sec=(end-start).total_seconds()
print("the time is: " +"{:.20f}".format(sec)+ " s")

How much data do you have? The operations you have are not individually time-consuming. The conversion to lowercase and the comparison are already going to be done in C code. You could pre-convert your "strList" to lowercase to avoid repeating that in every loop. — Tim Roberts, Dec 30 '21 at 07:54
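The pre-conversion suggested above can be sketched as follows: lowercase the list once and build a lookup table, so that each query is a single dictionary access rather than a scan. The helper names `build_index` and `find_value` are hypothetical, and the sketch assumes the list is searched more than once; for a single search the one-time preprocessing will not pay off.

```python
def build_index(str_list):
    """Map each lowercased string to the first original spelling seen."""
    index = {}
    for item in str_list:
        # setdefault keeps the first occurrence, like the loop's `break`
        index.setdefault(item.lower(), item)
    return index

def find_value(index, s, default="not find"):
    """Case-insensitive lookup that returns the original element."""
    return index.get(s.lower(), default)

# Same 36 strings as in the question
str_list = ["AFG", "bB", "ccc"] * 11 + ["AFG", "bB", "kapa"]
index = build_index(str_list)      # done once, outside any timed region
print(find_value(index, "KAPA"))   # kapa
print(find_value(index, "bb"))     # bB
print(find_value(index, "zzz"))    # not find
```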
Don't use the same name for the list and the array. An array with string dtype stores the strings in a different way from a list of strings. Don't confuse them. — hpaulj, Dec 30 '21 at 08:56
How did you compile the C code? Did you enable optimizations? I think this is important to be able to reproduce the problem. Besides this, is the size of the string bounded? — Jérôme Richard, Dec 30 '21 at 13:21
converting a Python list of Python strings to a C-compatible format is expensive...all that `create_string_buffer` and encoding. Also FYI, `.restype` should be just `ct.c_char_p`. You've declared the equivalent of `char**` then waste time casting back to `char*` and converting to Unicode string. — Mark Tolonen, Dec 30 '21 at 17:40
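To illustrate the `.restype` point without the question's DLL, here is a minimal sketch that calls the C library's `strchr` instead. That substitution is an assumption made only to keep the example self-contained, and loading the C runtime through `CDLL(None)` or `msvcrt` is platform dependent. With `restype = ct.c_char_p`, ctypes converts the returned `char*` to `bytes` directly, so no `cast` is needed.

```python
import ctypes as ct
import sys

# Stand-in for DLL1.dll: the C runtime (platform-dependent way to load it).
libc = ct.CDLL("msvcrt") if sys.platform == "win32" else ct.CDLL(None)

# Declare the signature once: char *strchr(const char *s, int c);
libc.strchr.argtypes = [ct.c_char_p, ct.c_int]
libc.strchr.restype = ct.c_char_p   # char*, not POINTER(c_char_p) (char**)

tail = libc.strchr(b"not find", ord("f"))
print(tail)            # b'find'
print(tail.decode())   # find
```

Because ctypes copies the string into a new `bytes` object, the original pointer is no longer available afterwards. A buffer allocated with `new` inside the C++ function, as in the question, therefore cannot be freed from Python with this return type.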
@Mark Tolonen this is just an example that finds a matching string (not `int` or `float`) in an array by lowercase comparison and then returns the string from the original array. Yes, `create_string_buffer` is very expensive. In another program, the list in Python is read from a .csv file and is very large; with the two methods above, the running times differ by a factor of eight. If I want to use ctypes to accelerate Python scripts, can you give me some suggestions? — XF JI, Dec 31 '21 at 01:19
You didn't specify *argtypes* for your function (https://stackoverflow.com/questions/58610333/c-function-called-from-python-via-ctypes-returns-incorrect-value/58611011#58611011). But that's not related to speed. There's an overhead required by *CTypes*. For its usage to be worthwhile, the actual computational difference must be greater than that overhead. Adding 2 *int*s wouldn't make sense to be done in *CTypes* (even if the addition alone is faster). What are the times you get for your code? — CristiFati, Dec 31 '21 at 13:16
You should not include the time to load the DLL in the measured time. The DLL is expected to be loaded once in a Python script. The time to convert CPython objects is pretty big. The best would be to directly work on CPython objects. For such small strings there is no way the conversion time can be negligible compared to the (very small) computation time. Indeed, the Python code takes 4.5 us on my machine on 36 strings. This means about 125 ns/string. The allocation+deallocation of a CPython object takes about 10-50 ns. — Jérôme Richard, Dec 31 '21 at 19:26
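A sketch of a measurement that follows this advice, using the standard `timeit` module: setup work (in the real case, loading the DLL) stays outside the timed statement, and the call is repeated many times so that the per-call cost is not lost in timer resolution. `find_value` is a hypothetical stand-in for whichever implementation is being measured.

```python
import timeit

def find_value(str_list, s):
    """Pure-Python case-insensitive search (stand-in for the code under test)."""
    for item in str_list:
        if item.lower() == s:
            return item
    return "not find"

# Setup happens once and is not timed (the DLL load would go here too).
str_list = ["AFG", "bB", "ccc"] * 11 + ["AFG", "bB", "kapa"]
target = "kapa"

calls = 10_000
# repeat() returns one total per run; the minimum is the least noisy estimate.
best = min(timeit.repeat(lambda: find_value(str_list, target),
                         number=calls, repeat=5))
per_call_us = best / calls * 1e6
per_string_ns = per_call_us * 1000 / len(str_list)
print(f"{per_call_us:.2f} us per call, {per_string_ns:.0f} ns per string")
```

The absolute numbers depend on the machine and the Python version, so they are only meaningful when both implementations are measured the same way.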
If you want the ctypes code to be fast, then you need to pack the strings in 1 object, or to work on bigger strings, or to directly work on CPython objects. There are no other alternatives. Still, the overhead of calling a C++ function from the CPython interpreter with few arguments is certainly at least 1 us. Besides this, the C++ code is not very efficient and can be improved, but do not expect a big speed-up even if it were (typically due to the previous overhead and the fact that strings are small). — Jérôme Richard, Dec 31 '21 at 19:33
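One way to "pack the strings in 1 object", sketched under the assumption that the C++ side keeps the question's fixed 30-byte row layout (`const char str1[][30]`): build a single `bytes` object of padded rows and create the ctypes array from it in one step, instead of calling `create_string_buffer` once per string. The names `WIDTH` and `pack_rows` are hypothetical.

```python
import ctypes as ct

WIDTH = 30  # assumed row width; must match the C++ declaration

def pack_rows(str_list, width=WIDTH):
    """Encode all strings into one NUL-padded block of fixed-width rows."""
    encoded = [s.encode() for s in str_list]
    if any(len(b) >= width for b in encoded):
        raise ValueError("string too long for row width (need room for NUL)")
    packed = b"".join(b.ljust(width, b"\0") for b in encoded)
    RowArray = (ct.c_char * width) * len(encoded)
    # One copy of the whole block instead of one ctypes object per string.
    return RowArray.from_buffer_copy(packed)

str_list = ["AFG", "bB", "ccc"] * 11 + ["AFG", "bB", "kapa"]
rows = pack_rows(str_list)   # do this once and reuse it for every call
print(ct.sizeof(rows))       # 1080 (36 rows * 30 bytes)
print(rows[35].value)        # b'kapa'
```

`rows` could then be passed as the first argument of `func`. If the same list is searched repeatedly, the packing cost is paid only once.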
