0

In the code bellow, I'm searching for ways to optimize the speed of insertions when dealing with Millions of inserts at the time. the code runs fine but is very slow when performing large amounts of inserts. I already tried some ideas but is always slow. I guess the solution is to use Multi-threading to perform the inserts, and use a global variable "truct Node* nodeObj". I have very few/no experience with Multi-threading and Synchronization in C, C++, If you could give me an example based on the code bellow I would be very appreciated. Any additional Ideas are welcome.

The rules are: 1- In the end (after insert all numbers) the linked list must be ordered, meaning that each Node->data starts at lowest number up to the Largest number. 2 - The caller is using a for loop, this for loop cannot be started inside the Insert function. 3 - Any of the code, caller or insert function can be optimized as long as the insert function does not automatically add inserts by it self, that's the caller s job.

#include<stdio.h>
#include<stdlib.h>
#include<conio.h>

struct Node {
    int data;
    struct Node* nextAddr;
};

struct Node* Insert(Node* p, int data);
void Print(struct Node* p);
void RevPrint(struct Node* p);

int main() {

    struct Node* nodeObj = NULL;
    printf("---------------------------------\n"
        "Insert()\n---------------------------------\n");
    for (int i = 1; i < 1000000; i++) {
        nodeObj = Insert(nodeObj, i);
        printf(" %d", i);
    }
    printf("\n---------------------------------\n"
        "Print()\n---------------------------------\n");
    Print(nodeObj);
    printf("---------------------------------\n"
        "RevPrint()\n---------------------------------");
    RevPrint(nodeObj);

    printf("\nPress any key to continue...");
    _getch();
}

struct Node* Insert(Node* _pToNode, int _nbr)
{
    Node *newValue = (struct Node*)malloc(sizeof(struct Node));
    newValue->data = _nbr;
    newValue->nextAddr = NULL;

    if (_pToNode == NULL) _pToNode = newValue;
    else {
        Node* pLocal = _pToNode;
        while (pLocal->nextAddr != NULL) {
            pLocal = pLocal->nextAddr;
        }
        pLocal->nextAddr = newValue;
    }

    return _pToNode;
}

void Print(struct Node* p) {

    if (p == NULL) {
        printf("\n");
        return;
    }
    printf(" %d", p->data);
    Print(p->nextAddr);
}

void RevPrint(struct Node* p) {

    if (p == NULL) {
        printf("\n");
        return;
    }
    RevPrint(p->nextAddr);
    printf(" %d", p->data);
}

Thank you.

NKF
  • 63
  • 1
  • 9
  • Parallel execution is feasible for semantically independent operations. Since each insertion depends on every foregoing one, I don't think threads can give you an edge here. Besides, your question is too broad. There are many ways to make something multi-threaded plus the variety of libraries to use: pthread? C++ ? Linux threads? Or something else? And last but not least, C++ or C. They are different. – cadaniluk Mar 30 '18 at 21:58
  • One way you could speed it up is to allocate memory for a pool of `struct Node`s, an array of them. When you have used up all those in the array, allocate another chunk of them. Less stress on the system. – Weather Vane Mar 30 '18 at 22:02
  • @WeatherVane Wouldn't that defeat the purpose of using a linked list in the first place, i.e., having a non-contiguous sequential container? – cadaniluk Mar 30 '18 at 22:03
  • Can `Insert` take another argument? (e.g. list tail)? – Craig Estey Mar 30 '18 at 22:05
  • "Can Insert take another argument? (e.g. list tail)? – Craig Estey " -> Yes, what are you thinking? Can you show me how? – NKF Mar 30 '18 at 22:07
  • @Downvoter: you don't use them as an array, but take the next element and use it as if you just allocated it. The usage is no different - but the *source* is elements of an array allocated in a block. It's quite easy to implement. A function like `*new_rec()` returns a pointer to the next element already in memory, unless there are no more, when a reallocation of another chunk is made. – Weather Vane Mar 30 '18 at 22:11
  • @WeatherVane I sure do understand how it works. If you implement the linked list by using an array, though, you lose all the linked list's properties such as smaller allocation size to prevent fragmenting or unbounded size (till you're out of memory). And how are you going to keep track of the array chunks it takes to store the elements? In a linked list? I don't mean to make a big thing out of this, it just strikes me as odd to implement it this way. – cadaniluk Mar 30 '18 at 22:17
  • 1
    @Downvoter no you don't understand. The array has nothing to do with the linked list. It is a pool of records, available without the expense of calling `malloc` for every one. And to take the idea further, they can be aranged as separate linked list, so you can pass an used record back to the pool. – Weather Vane Mar 30 '18 at 22:21
  • If list should be ordered only after 1 million inserts (and not in between) then it should be faster to just add items to the end of the list, and then sort the list. – Dialecticus Mar 30 '18 at 22:35
  • i dont see any c++ in your code, it looks like plain old c. You should retag it as such. In C++ one would use another approach: std containers and above all, not use malloc. – AndersK Mar 30 '18 at 22:36
  • @Downnvoter, arrays long term issues due memory fragmentation, big no no... Also not sure how allocating pools in arrays would help, perhaps you could show me in code – NKF Mar 30 '18 at 23:06
  • @AndersK, you can post a pure C++ with the new() operator, I don-t care, but will it run faster? – NKF Mar 30 '18 at 23:06
  • Which language, C or C++? They are distinct languages. The C++ language as `std::list` and you don't need to prefix variable declarations or return types with `struct`. – Thomas Matthews Mar 30 '18 at 23:36
  • The code is appending nodes to the end of a list, not inserting nodes into a list in some type of sorted order. Craig Estey's answer seems close. – rcgldr Mar 31 '18 at 01:26

3 Answers3

2

Linked lists are unwelcome on modern machines. The L1 and L2 caches love vectors and arrays. They utterly despise linked lists. Use a std::vector, not a linked list, (and certainly not a hand-rolled linked list). Try to .reserve() enough memory in the vector to accommodate all the new stuff. Just push everything onto the back of the vector, and then sort the vector using parallel execution. C++17 can do that painlessly. std::stable_sort parallels quite nicely.

Jive Dadson
  • 16,680
  • 9
  • 52
  • 65
  • Hi @Jive, Thank you for the suggestion, but the "thing here" is linked list. Since that you have compiler background you could show me a version with embedded assembly? – NKF Mar 31 '18 at 11:53
1

Caveat: This works only for tail insertion/append:

int
main()
{
    struct Node* nodeObj = NULL;
    struct Node* nodeTail = NULL;

    // your other stuff ...

    for (int i = 1; i < 1000000; i++) {
        nodeObj = Insert(nodeObj, &nodeTail, i);
        printf(" %d", i);
    }

    // your other stuff ...
}

struct Node* Insert(Node* _pToNode, Node **tail,int _nbr)
{
    Node *newValue = (struct Node*)malloc(sizeof(struct Node));
    newValue->data = _nbr;
    newValue->nextAddr = NULL;

    if (_pToNode == NULL) {
        _pToNode = newValue;
    }
    else {
        (*tail)->nextAddr = newValue;
    }

    *tail = newValue;

    return _pToNode;
}

You could clean this up with a "list" struct that has both the head and tail in it (vs. the separate args)


UPDATE:

Very cool/smart, but unfortunately is still slow...

malloc et. al. can be slow for a large number of small allocations. One way to speed things up is to use a subpool of allocations [as WeatherVane suggested].

As I mentioned, adding a "list" struct can make things tidier and I used it in two places. Once you commit to that, you can add other things besides head/tail, such as count. Also, it makes it easier to convert from a singly-linked list to a doubly-linked in the future, if you so choose.

Side note: With a doubly-linked list, insertions are [slightly] more complex, but insertions in the middle of the list are much faster because you don't have to traverse the list to find the previous pointer (e.g. the node would have a prev pointer in it). Also, RevPrintList would become as simple as PrintList.

Note that [on my system], the reverse print ran out of stack space [and segfaulted], so I recoded the print functions to not be recursive.

Here's a cleaned up version that should do the insertions faster because of the reduced number of individual malloc calls.

Side note: I didn't add the requisite checks for malloc, etc. returning null.

#include <stdio.h>
#include <stdlib.h>
//#include <conio.h>

typedef struct Node_ {
    int data;
    struct Node_* next;
} Node;

typedef struct List_ {
    int count;
    Node* head;
    Node* tail;
} List;

Node* NewNode(void);
Node* Insert(List* p, int data);
void Print(Node* p);
void PrintList(List *list);
void RevPrint(Node* p);
void RevPrintList(List *list);

List freelist;

int
main()
{
    List nodelist = { 0, NULL, NULL };

    printf("---------------------------------\n"
        "Insert()\n---------------------------------\n");

    for (int i = 1; i < 1000000; i++) {
        Insert(&nodelist, i);
#if 0
        printf(" %d", i);
#endif
    }

    printf("\n---------------------------------\n"
        "Print()\n---------------------------------\n");
#if 0
    Print(nodelist.head);
#else
    PrintList(&nodelist);
#endif

    printf("---------------------------------\n"
        "RevPrint()\n---------------------------------");
#if 0
    RevPrint(nodelist.head);
#else
    RevPrintList(&nodelist);
#endif

    printf("\nPress any key to continue...");
#if 0
    _getch();
#else
    getchar();
#endif
}

Node*
NewNode(void)
{
    Node *node;

    // NOTE: adjust the count setup (e.g. 1000) to what ever value you want
    if (freelist.count <= 0) {
        freelist.count = 1000;
        freelist.head = calloc(freelist.count,sizeof(Node));
    }

    node = freelist.head++;
    freelist.count -= 1;

    return node;
}

Node*
Insert(List* list,int _nbr)
{
    Node *node = NewNode();

    node->data = _nbr;
    node->next = NULL;

    if (list->head == NULL) {
        list->head = node;
    }
    else {
        list->tail->next = node;
    }

    list->tail = node;
    list->count += 1;

    return node;
}

void
Print(Node* p)
{

    if (p == NULL) {
        printf("\n");
        return;
    }

    printf(" %d", p->data);
    Print(p->next);
}

void
PrintList(List* list)
{
    Node *node;

    for (node = list->head;  node != NULL;  node = node->next)
        printf(" %d", node->data);

    printf("\n");
}

void
RevPrint(Node* p)
{

    if (p == NULL) {
        printf("\n");
        return;
    }

    RevPrint(p->next);
    printf(" %d", p->data);
}

void
RevPrintList(List *list)
{
    Node **rlist = malloc(sizeof(Node**) * list->count);
    Node *node;
    int ridx;

    ridx = list->count - 1;
    for (node = list->head;  node != NULL;  node = node->next, --ridx)
        rlist[ridx] = node;

    for (ridx = 0;  ridx < list->count;  ++ridx) {
        node = rlist[ridx];
        printf(" %d",node->data);
    }
    printf("\n");

    free(rlist);
}

UPDATE #2:

you could make freeList a list of lists (using a different "node" struct which would have a pointer to list instead of a number), so that memory could be released when the program is done.

Here is a modified version that does that:

#include <stdio.h>
#include <stdlib.h>
//#include <conio.h>

typedef struct Node_ {
    int data;
    struct Node_* next;
} Node;

typedef struct List_ {
    int count;
    Node* head;
    Node* tail;
} List;

typedef struct Freelist_ {
    int count;
    Node* head;
    Node* tail;
    Node* avail;
} FreeList;

Node* NewNode(void);
Node* Insert(List* p, int data);
void Print(Node* p);
void PrintList(List *list);
void RevPrint(Node* p);
void RevPrintList(List *list);
void FreeAll(void);

FreeList freelist = { 0 };

int
main()
{
    List nodelist = { 0, NULL, NULL };

    printf("---------------------------------\n"
        "Insert()\n---------------------------------\n");

    for (int i = 1; i < 1000000; i++) {
        Insert(&nodelist, i);
        // this printf will radically slow things down
#if 0
        printf(" %d", i);
#endif
    }

    printf("\n---------------------------------\n"
        "Print()\n---------------------------------\n");
#if 0
    Print(nodelist.head);
#else
    PrintList(&nodelist);
#endif

    printf("---------------------------------\n"
        "RevPrint()\n---------------------------------");
#if 0
    RevPrint(nodelist.head);
#else
    RevPrintList(&nodelist);
#endif

    // release all nodes back to the malloc free pool
    FreeAll();

    printf("\nPress any key to continue...");
#if 0
    _getch();
#else
    getchar();
#endif
}

Node*
NewNode(void)
{
    Node *node;

    // NOTE: adjust the count setup (e.g. 1000) to what ever value you want
    if (freelist.count <= 0) {
        freelist.count = 1000;
        node = calloc(freelist.count,sizeof(Node));

        // maintain linked list of nodes that are at the _start_ of a
        // malloc area/arena
        if (freelist.head == NULL)
            freelist.head = node;
        else
            freelist.tail->next = node;
        freelist.tail = node;

        // burn the first node as a placeholder
        freelist.avail = node + 1;
        freelist.count -= 1;
    }

    node = freelist.avail++;
    freelist.count -= 1;

    return node;
}

void
FreeAll(void)
{
    Node* node;
    Node* next;

    for (node = freelist.head;  node != NULL;  node = next) {
        next = node->next;
        free(node);
    }
}

Node*
Insert(List* list,int _nbr)
{
    Node *node = NewNode();

    node->data = _nbr;
    node->next = NULL;

    if (list->head == NULL) {
        list->head = node;
    }
    else {
        list->tail->next = node;
    }

    list->tail = node;
    list->count += 1;

    return node;
}

void
Print(Node* p)
{

    if (p == NULL) {
        printf("\n");
        return;
    }

    printf(" %d", p->data);
    Print(p->next);
}

void
PrintList(List* list)
{
    Node *node;

    for (node = list->head;  node != NULL;  node = node->next)
        printf(" %d", node->data);

    printf("\n");
}

void
RevPrint(Node* p)
{

    if (p == NULL) {
        printf("\n");
        return;
    }

    RevPrint(p->next);
    printf(" %d", p->data);
}

void
RevPrintList(List *list)
{
    Node **rlist = malloc(sizeof(Node**) * (list->count + 1));
    Node *node;
    int ridx;

    ridx = list->count - 1;
    for (node = list->head;  node != NULL;  node = node->next, --ridx)
        rlist[ridx] = node;

    for (ridx = 0;  ridx < list->count;  ++ridx) {
        node = rlist[ridx];
        printf(" %d",node->data);
    }
    printf("\n");

    free(rlist);
}

UPDATE #3:

Here's a version that adds reuse of nodes by adding a reuse member to FreeList and a FreeOne function that can be called when a node is deleted.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//#include <conio.h>

typedef struct Node_ {
    int data;
    struct Node_ *next;
} Node;

typedef struct List_ {
    int count;
    Node *head;
    Node *tail;
} List;

typedef struct Freelist_ {
    int count;
    Node *head;
    Node *tail;
    Node *avail;
    Node *reuse;
} FreeList;

Node *NewNode(void);
Node *Insert(List *p,int data);
void Print(Node *p);
void PrintList(List *list);
void RevPrint(Node *p);
void RevPrintList(List *list);
void FreeOne(Node *p);
void FreeAll(void);

FreeList freelist = { 0 };

int
main()
{
    List nodelist = { 0, NULL, NULL };

    printf("---------------------------------\n" "Insert()\n---------------------------------\n");

    for (int i = 1; i < 1000000; i++) {
        Insert(&nodelist,i);
        // this printf will radically slow things down
#if 0
        printf(" %d",i);
#endif
    }

    printf("\n---------------------------------\n" "Print()\n---------------------------------\n");
#if 0
    Print(nodelist.head);
#else
    PrintList(&nodelist);
#endif

    printf("---------------------------------\n" "RevPrint()\n---------------------------------");
#if 0
    RevPrint(nodelist.head);
#else
    RevPrintList(&nodelist);
#endif

    // release all nodes back to the malloc free pool
    FreeAll();

    printf("\nPress any key to continue...");
#if 0
    _getch();
#else
    getchar();
#endif
}

Node *
NewNode(void)
{
    FreeList *list;
    Node *node;

    list = &freelist;

    do {
        // try to reuse a node that has been released by FreeOne
        node = list->reuse;
        if (node != NULL) {
            list->reuse = node->next;
            node->next = NULL;
            break;
        }

        // NOTE: adjust the count setup (e.g. 1000) to what ever value you want
        if (list->count <= 0) {
            list->count = 1000;
            node = calloc(list->count,sizeof(Node));

            // maintain linked list of nodes that are at the _start_ of a
            // malloc area/arena
            if (list->head == NULL)
                list->head = node;
            else
                list->tail->next = node;
            list->tail = node;

            // burn the first node as a placeholder
            list->avail = node + 1;
            list->count -= 1;
        }

        // grab one from the current allocation array
        node = list->avail++;
        list->count -= 1;
    } while (0);

    return node;
}

void
FreeOne(Node *node)
{
    FreeList *list;

    list = &freelist;

    // push this node onto the front of the reuse list (i.e. it's fast)
    node->next = list->reuse;
    list->reuse = node;
}

void
FreeAll(void)
{
    Node *node;
    Node *next;

    for (node = freelist.head; node != NULL; node = next) {
        next = node->next;
        free(node);
    }

    memset(&freelist,0,sizeof(FreeList));
}

Node *
Insert(List *list,int _nbr)
{
    Node *node = NewNode();

    node->data = _nbr;
    node->next = NULL;

    if (list->head == NULL) {
        list->head = node;
    }
    else {
        list->tail->next = node;
    }

    list->tail = node;
    list->count += 1;

    return node;
}

void
Print(Node *p)
{

    if (p == NULL) {
        printf("\n");
        return;
    }

    printf(" %d",p->data);
    Print(p->next);
}

void
PrintList(List *list)
{
    Node *node;

    for (node = list->head; node != NULL; node = node->next)
        printf(" %d",node->data);

    printf("\n");
}

void
RevPrint(Node *p)
{

    if (p == NULL) {
        printf("\n");
        return;
    }

    RevPrint(p->next);
    printf(" %d",p->data);
}

void
RevPrintList(List *list)
{
    Node **rlist = malloc(sizeof(Node **) * (list->count + 1));
    Node *node;
    int ridx;

    ridx = list->count - 1;
    for (node = list->head; node != NULL; node = node->next, --ridx)
        rlist[ridx] = node;

    for (ridx = 0; ridx < list->count; ++ridx) {
        node = rlist[ridx];
        printf(" %d",node->data);
    }
    printf("\n");

    free(rlist);
}
Craig Estey
  • 30,627
  • 4
  • 24
  • 48
  • Freelist is not initialized. In the case of some (most, all?) Microsoft compilers, global variables are not initialized to zero. – rcgldr Mar 31 '18 at 01:16
  • @rcgldr You [they!] gotta be kidding me ... On just about every other sane OS, uninitialized globals end up in the BSS [or equivalent] segment are inited to zero [by the OS/loader], sometimes by a many-to-one mapping to the zero page [that gets an on demand COW]. However, would `List freelist = { 0 };` be sufficient? – Craig Estey Mar 31 '18 at 01:29
  • 1
    @user6495763 - are you doing a "release" build, and assuming so, how long is the insert part taking? All of those calls to printf will be slow. – rcgldr Mar 31 '18 at 01:39
  • @rcgldr Yikes! So, if you init one element of the struct to zero, do the remaining ones get 0x00 or 0xCC? As much as I'd like to bash WinX, I'm sure any recent version has the same demand copy on write as linux/*BSD. Otherwise, they'd probably get hammered by [the negative press of] a bad benchmark. They're always claiming they have equal or better performance ;) – Craig Estey Mar 31 '18 at 01:40
  • @CraigEstey - take a look at [partial initialization of structure](https://stackoverflow.com/questions/10828294/c-and-c-partial-initialization-of-automatic-structure). As mentioned there, using static will set all members or elements to 0 or NULL, without the static, it's undefined. – rcgldr Mar 31 '18 at 01:44
  • @rcgldr I thought about doing that, using the `tail` pointer in `freelist`, but didn't for time and complexity reasons. I'll edit the post to include that for completeness. – Craig Estey Mar 31 '18 at 01:57
  • @Craig Estey, we are getting somewhere, regarding print func stack overflow no worries... I'm using a stopwatch to measure performance.... malloc vs new() I was getting 1 second of worst performance when I used new() each 100.000.000 new inserts. – NKF Mar 31 '18 at 11:05
  • @Craig Estey, i see that you switch to calloc in this version, and with calloc() I get a performance boost of 1 second each 100.000.000 (not setting the memory up before using it may be a problem, I'll check on that). Regarding to doubly-linked list, yes you re right the final idea is to use them exactly as you described. Can you make a version of this code using a subpool of allocations as suggested before? (small alocation to not run into memory fragmentation problems with arrays) – NKF Mar 31 '18 at 11:05
  • Forgot to say that all of this is being tested under release builds... :) – NKF Mar 31 '18 at 11:50
  • Ok, had to do a few adjustments regarding to buffering, but in the end @Craig Estey suggestions work very well, thank you Craig. – NKF Mar 31 '18 at 17:08
  • Craig Estey, you're a genius... thank you very much :) – NKF Apr 01 '18 at 14:58
0

I assume that the order of the elements that you mention is ordered by comparison. I.e. A < B => A is before B in the container.

If that assumption holds the best you can hope for is a log(N) insertion of each element. So if you want to insert an element among those millions of elements you should expect to do about 20+ comparisons. If the millions you want to insert have an order you can do better.

Now, a pointer based structure is never the answer to a fast data structure. std::vector should be your go to.

To read this fast-insertion structure with a for statement you'd need a very peculiar iterator or you need to rearrange the inserted stuff in to another vector.

In conclusion, you want to insert stuff into a simple binary heap.

Multithreading would not make this faster unless you can devise a merge like scheme to your inserts.

Captain Giraffe
  • 14,407
  • 6
  • 39
  • 67