
I've created a linked list in Cython and so far it seems to perform well, but for my specific application any performance gain is welcome.

The linked list is used to construct a binary BVH tree with the following algorithm (pseudocode):

bvh_root = bvh_node()
bvh_root.subnodes = linked_list(all_leaf_nodes)

list = linked_list()
list.push_front(bvh_root)

while not list.length == 0:
    current_node = list.pop_back()
    if current_node.subnodes.length <= 2:
        continue
    center_a, axis = current_node.calculate_longest_axis_center()

    node_a = bvh_node()
    node_b = bvh_node()

    while not current_node.subnodes.length == 0:
        subnode = current_node.subnodes.pop_back()
        center_b = subnode.calculate_axis_center(axis)
        if center_b < center_a:
            node_a.subnodes.push_front(subnode)
        else:
            node_b.subnodes.push_front(subnode)

    if node_a.subnodes.length > 0:
        current_node.subnodes.push_front(node_a)
    if node_b.subnodes.length > 0:
        current_node.subnodes.push_front(node_b)

    list.push_front(node_a)
    list.push_front(node_b)

I was reading through common bottlenecks in C code, and one of them caught my attention: malloc() can be a rather slow function, since it may have to interact with the kernel and the OS. It turns out I call malloc() each time I insert an element into the list, and free() each time I pop an element.
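To check whether that cost actually matters here, a quick micro-benchmark of a malloc/free pair (roughly what one push plus one pop costs) could look like this in plain C. The numbers will vary by platform and allocator, so this is only a sanity check, not a definitive measurement:

```c
#include <stdlib.h>
#include <time.h>

typedef struct ListNode {
    struct ListNode *next;
    void *data;
} ListNode;

/* volatile sink so the compiler cannot optimize the whole loop away */
static void *volatile sink;

/* Time n malloc/free pairs; returns elapsed CPU seconds. */
double time_malloc_free(int n) {
    clock_t t0 = clock();
    for (int i = 0; i < n; i++) {
        ListNode *node = malloc(sizeof(ListNode));
        node->next = NULL;
        node->data = NULL;
        sink = node;
        free(node);
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_malloc_free(1000000)` and printing the result gives a baseline to compare any node-reuse variant against.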

I'm only showing the two functions that matter most, since the full code is very long:

cdef void lkdlist_push(LinkedList* lst, void* data) nogil:
    cdef ListNode* node = <ListNode*> malloc(sizeof(ListNode))
    if node == NULL:  # allocation failed; drop the push
        return
    node.next = NULL
    node.data = data
    if lst.tail == NULL:
        # empty list: node becomes both head and tail
        lst.head = node
        lst.tail = node
    else:
        # append after the current tail
        lst.tail.next = node
        lst.tail = node
    lst.length += 1

cdef void* lkdlist_pop(LinkedList* lst) nogil:
    cdef ListNode* node
    cdef void* data
    if lst.length <= 0:
        return NULL
    elif lst.length == 1:
        # removing the last node empties the list
        node = lst.head
        lst.head = NULL
        lst.tail = NULL
    else:
        # detach the head node
        node = lst.head
        lst.head = node.next
    data = node.data
    free(node)
    lst.length -= 1
    return data

I was wondering: if, instead of freeing a node, I stored it somewhere so I could reuse it the next time I need to push an element, would that speed up the linked list?

Or would all the logistics needed to store and recover unused nodes end up slower than malloc()?
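To make the idea concrete, this is roughly what I have in mind, sketched in plain C rather than Cython (the `free_list` field and the function names are just illustrative, not part of my current API):

```c
#include <stdlib.h>

typedef struct ListNode {
    struct ListNode *next;
    void *data;
} ListNode;

typedef struct LinkedList {
    ListNode *head;
    ListNode *tail;
    ListNode *free_list;  /* singly linked stack of recycled nodes */
    int length;
} LinkedList;

/* Take a node from the free list if one is available, else malloc. */
static ListNode *node_acquire(LinkedList *lst) {
    ListNode *node = lst->free_list;
    if (node != NULL)
        lst->free_list = node->next;
    else
        node = malloc(sizeof(ListNode));
    return node;
}

/* Instead of free(), push the node onto the free list for reuse. */
static void node_release(LinkedList *lst, ListNode *node) {
    node->next = lst->free_list;
    lst->free_list = node;
}
```

With this, push/pop would call `node_acquire`/`node_release`, malloc() only runs when the free list is empty, and the real free() calls happen once, when the list is destroyed.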

Jeacom
  • My understanding is that `malloc` doesn't always hit the kernel. Usually, when it does make a system call for memory, the kernel gives your program more than it asked for, so future `malloc` calls can draw on that for a while. Without a [mcve] showing the function use and some benchmarks, I'm not sure that this question is answerable beyond "it depends". Can you provide more details? If you're doing a huge refactor to get a performance gain without profiling, you might be disappointed if it doesn't offer a huge benefit. How often are you creating/destroying these nodes and how many? BTW, welcome to SO! – ggorlen Aug 07 '19 at 16:52
  • I am creating two nodes almost immediately after destroying one. I can't show a minimal reproducible example here because even the linked list API alone has too many lines of code; I doubt any user would read it. – Jeacom Aug 07 '19 at 16:59
  • I am using this list to keep track of the nodes while building a binary tree. As I pop nodes from one side, I push two nodes on the other. The list starts with the root node: I pop it, create two child nodes and push them onto the list, so I can recursively pop them and create four more child nodes (two each), and so on, until the tree is done. – Jeacom Aug 07 '19 at 17:05
  • OK, that helps. How big is the tree you're building? Are we talking 25 nodes, 250, 250k? – ggorlen Aug 07 '19 at 17:07
  • It will vary depending on the input data: anywhere from 100 to 300,000 leaf nodes. – Jeacom Aug 07 '19 at 17:08
  • All right, thanks. In the interests of cleaning up comments, I'd recommend putting all that info in your question and hopefully it'll be enough to make the question concrete and answerable. Any additional information you can muster is helpful! Also, check out [x-y problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) because you may be assuming malloc is the way to optimize when there are much easier and better optimizations available based on the code you haven't talked much about. The actual pop/push functions shown here are pretty standard and don't offer much. – ggorlen Aug 07 '19 at 17:10
  • `I was wondering if instead of freeing a node, I just stored it somewhere to use it when I need...` This is called a `free list`, and it is quite common. If you have an upper limit on the number of nodes you'll ever need, it is also quite common to allocate the memory in one big chunk, and carve out the individual nodes as you need them. But: **premature optimisation is the root of all evil** – wildplasser Aug 07 '19 at 17:33
  • Some time ago I investigated whether the standard allocator can be replaced by pymalloc (i.e. malloc -> PyMem_Malloc, free -> PyMem_Free) for std::map/std::unordered_map (https://stackoverflow.com/a/56551790/5769463). The speed-up was between 2 and 4 times. However, PyMem_Malloc needs the GIL, and your functions seem to be nogil (is that really needed?). But even if you roll your own free-list implementation, you may need to acquire the GIL (or something similar) in order to use it from different threads. – ead Aug 07 '19 at 18:37
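For context, the one-big-chunk approach wildplasser describes in the comments could be sketched in C like this (hypothetical names; it assumes an upper bound on live nodes is known up front):

```c
#include <stdlib.h>

typedef struct ListNode {
    struct ListNode *next;
    void *data;
} ListNode;

typedef struct NodePool {
    ListNode *block;     /* one big allocation for all nodes */
    size_t next_unused;  /* index of the next never-used node */
    size_t capacity;
    ListNode *recycled;  /* nodes that were returned to the pool */
} NodePool;

/* Returns 1 on success, 0 if the big allocation failed. */
static int pool_init(NodePool *pool, size_t capacity) {
    pool->block = malloc(capacity * sizeof(ListNode));
    pool->next_unused = 0;
    pool->capacity = capacity;
    pool->recycled = NULL;
    return pool->block != NULL;
}

/* Carve a node out of the block, preferring recycled nodes. */
static ListNode *pool_acquire(NodePool *pool) {
    if (pool->recycled != NULL) {
        ListNode *n = pool->recycled;
        pool->recycled = n->next;
        return n;
    }
    if (pool->next_unused < pool->capacity)
        return &pool->block[pool->next_unused++];
    return NULL;  /* pool exhausted */
}

static void pool_release(NodePool *pool, ListNode *node) {
    node->next = pool->recycled;
    pool->recycled = node;
}

static void pool_destroy(NodePool *pool) {
    free(pool->block);  /* one free() pays for every node */
}
```

Compared with a plain free list, this removes per-node malloc() entirely, at the cost of fixing the capacity in advance.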

0 Answers