-1

The goal of the following code segment is to take a sorted array of strings, and count how many of each word there is.

That information is then put into a struct called reduceNode that holds a string and a count for the given string.

The reduceNode structs are put into another array.

Once all of the words and their counts are found and put into the intermediate array, they are inserted into a global array of reduceNode structs.

This method is called by threads, which is why I am storing the results into a global array.

Anytime I run this part of the program, I am getting a segfault.

I assume I am accessing an array out of bounds, but I am having trouble narrowing down where I am doing so.

void* reduce(void* num) //Reduce function 
{
    int index = *(int*)num;
    int curSize = 0; //Size of the current linked list
    struct HashNode *head = HashTable[index]; //Get the head of the linked list from the hashtable
    struct HashNode *linkedList = head; //Pointer to the head to traverse the linked list
    while(linkedList != NULL) //Gets the size of the current linked list 
    {
        curSize++;
        linkedList = linkedList->next;
    }
    linkedList = head;
    int linkedListTraverse = 0; //Array index for each linked list node
    int numSort[curSize];
    char* wordSort[curSize];
    while(linkedList != NULL)
    {
        if(app == 1)
            numSort[linkedListTraverse] = linkedList->num; //Copy the data from the linked list into an array 
        else
        {
            wordSort[linkedListTraverse] = (char*) malloc(sizeof(linkedList->string));
            strcpy(wordSort[linkedListTraverse],linkedList->string); //Copy the data from the linked list into an array 
        }
        linkedList = linkedList->next;
        linkedListTraverse++;
    }
    if(app == 1)
    {
        qsort(numSort, curSize, sizeof(int), numCmpFunc); //Sort the current node
        int i, j = 0;
        reduceNode* numSortArray[curSize];
        reduceNode* curNum;
        for(i = 0; i < curSize; i++)
        {
            curNum = (reduceNode*) malloc(sizeof(reduceNode));
            curNum->num = numSort[i];
            numSortArray[i] = curNum;
        }
        i = 0;
        while(sortedArray[i] != NULL)
        {
            i++;
        }
        for(j = 0; j < curSize; j++, i++)
        {  
            sortedArray[i] = numSortArray[j];
        }
        return (void*) 0;
    }
    else
    {
        int i = 0;
        while(i < curSize) //Convert all of the words to lowercase
        {
            char* str = wordSort[i];
            char *p;
            for (p = str; *p != '\0'; p++)
                *p = (char)tolower(*p);
            i++;
        }
        qsort(wordSort, curSize, sizeof(char*), stringCmpFunc); //Sort the current node 
    }
    int curWordIndex = 0; //Exclusively for wordcount
    int checkWordIndex = 1;
    int curArrayIndex = 0;
    reduceNode *curWord;
    reduceNode* wordCountArray[curSize];
    while(curWordIndex < curSize)
    {
        curWord = malloc(sizeof(reduceNode));
        curWord->word = wordSort[curWordIndex]; //Set the word
        curWord->count = 1; //Start the count out at 1
        while(strcmp(wordSort[curWordIndex], wordSort[checkWordIndex]) == 0) //While the two words are equal
        {
            checkWordIndex++; //Advance the leading index check
            curWord->count++;
            if(checkWordIndex >= curSize) //If the leading index goes beyond the array bounds
            {
                break;
            }
        }
        if(checkWordIndex <= curSize)
        {
            curWordIndex = checkWordIndex;
            checkWordIndex = curWordIndex + 1;
        }
        else if(checkWordIndex >= curSize) //If the leading index goes beyond the array bounds
        {
            if(strcmp(curWord->word, wordSort[curWordIndex]) != 0)
            {
                curWord->word = wordSort[curWordIndex]; //Set the word
                curWord->count = 1; //Start the count out at 1
                curArrayIndex++;
                wordCountArray[curArrayIndex] = curWord;
            }
            else
            {
                wordCountArray[curArrayIndex] = curWord;
                curArrayIndex++;
            }
            break;
        }
        wordCountArray[curArrayIndex] = curWord;
        curWord = NULL;
        curArrayIndex++;
    }
    int i,j  = 0;
    while(sortedArray[i] != NULL)
    {       
        i++;
    }
    for(j = 0; j < curSize; j++, i++)
    {  
        sortedArray[i] = wordCountArray[j];
    }
    return (void*) 0;

}

reduceNode is defined as

typedef struct reduceNode
{
    int count;
    char *word;
    int num;
} reduceNode;

sortedArray is declared globally as

reduceNode **sortedArray;

and later initialized as

sortedArray = (reduceNode **)calloc(1,sizeof(reduceNode*)*inputCount);

Input count is the number of words that are read into the program

An example input would be an array: [alpha, alpha, bravo, charlie, charlie, charlie, delta]. The expected result would be [alpha 2, bravo 1, charlie 3, delta 1].

Colin Null
  • 111
  • 3
  • 8
  • 5
    Can you please create a [MCVE](https://stackoverflow.com/help/mcve)? You may also be interested in https://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems – hellow Oct 09 '18 at 06:30
  • Some remarks: `reduceNode* wordCountArray[curSize]:` is not ISO-C, but a gcc extension. `curWord = (reduceNode*) malloc(sizeof(reduceNode));` no casting needed here. I suspect, that you access out of the bounds of `wordCountArray` and produce the segfault. Why do you return a null pointer? Where is the function signature?! – hellow Oct 09 '18 at 06:34
  • Is this actually a thread *proc* ?? as in, a `void * thread_proc(void*)` for a pthread pool ? – WhozCraig Oct 09 '18 at 06:39
  • @hellow I removed that cast, thank you for the information. I returned a null since this process is called by threads. – Colin Null Oct 09 '18 at 06:42
  • 1
    @hellow C99 has VLAs – Support Ukraine Oct 09 '18 at 06:43
  • @WhozCraig Yes, this process is only called by threads. – Colin Null Oct 09 '18 at 06:44
  • @4386427 indeed, thanks for that! :) – hellow Oct 09 '18 at 06:45
  • 1
    Please post a complete program with structure, object and function definitions. From this fragment, we can only guess how `wordCountArray`, sortedArray`, `wordSort` and `reduceNode` are defined and it matters to analyse the code properly. – chqrlie Oct 09 '18 at 06:51
  • Is it called by *concurrent* threads? Because if so there is no concurrency control on `sortedArray[i]` at the end of this thing *at all*. – WhozCraig Oct 09 '18 at 06:57
  • @chqrlie I modified my post to include the entire function, as well as how sortedArray is defined. Sorry for the lack of detail. – Colin Null Oct 09 '18 at 06:59
  • @WhozCraig The threads are executed one after the other, not in parallel. – Colin Null Oct 09 '18 at 07:02
  • 1
    Still waiting for a [mcve]. In the mean time, dare I ask what `HashNode` looks like? It, like the mcve, belongs in your question. – WhozCraig Oct 09 '18 at 07:20
  • @WhozCraig I am not trying to argue with you, but I believe the issue within my code lies within what I have posted here. I do not want to post the entirety of my code since there is a lot of it, and the majority of it is functioning as expected. The threading and use of the hashnodes works for when I have to sort a list of numbers. The issue appears when I try to do my word counting part of this method, I am assuming due to an index out of bounds. I am unsure of what I am missing from my MVCE. I have the faulty code segment, the issue I am having. Is the MVCE supposed to be executable code? – Colin Null Oct 09 '18 at 07:34
  • @WhozCraig I am sorry if I come across as argumentive or ungrateful for you trying to offer help, that is not my intention. I am just genuinely unsure of what I am missing in my question. – Colin Null Oct 09 '18 at 07:35
  • 1
    An MCVE, by definition is something that we, armed with a *blank* source file and a compiler toolchain, can copy/paste/compile, and *run*, providing it whatever input (which you also provide) is required to produce the same problem you're having. Without that, we can pour over code for hours, and what you get is answers that are half guesses and speculation, and a stack fo comments pages long. It does NOT mean post your whole program. It mean trim it down to as small a size as possible, while still producing the problem. The smaller the better, so long as it is *complete* and reproduces. – WhozCraig Oct 09 '18 at 07:40
  • Ex: How do we know `HashNode::string` isn't a pointer vs. a fixed length native array ? If it were the former, certainly `malloc(sizeof(linkedList->string))` would be *wrong*. But we don't know; `HashNode` isn't in your question. it *would be* if this were an mcve we could build and run to your problem Comments, one at a time, back and forth, aren't necessary if we have a *real* mcve that produces the issue. – WhozCraig Oct 09 '18 at 07:46
  • Thank you, I understand much better what a proper MVCE is now. I will modify my post to reflect your suggestions and make something that can be ran independently with my provided input. – Colin Null Oct 09 '18 at 07:50
  • @ColinNull: you may well find a solution to your problem by doing just that: trimming down the program to include only the minimum code that produces the problem. Very often the problem becomes obvious and is easily fixed. It is also common to find out that the problem is in code that was assumed to *work* but had undefined behavior causing bugs in unrelated parts of the program. – chqrlie Oct 09 '18 at 19:09

1 Answers1

1

1. You checkWordIndex reaches exactly curSize and strcmp(wordSort[curWordIndex], wordSort[checkWordIndex] will go out of bounds. I recomment printing the indicies for debugging.

if(checkWordIndex < curSize)
{
    curWordIndex = checkWordIndex;
    checkWordIndex = curWordIndex + 1;
}

this code will still lead to checkWordIndex == curSize

2. You allocate new memory, remember to free it.

3. For thread safety lookup mutex in C.

I recommend using only one indicie and iterating like

while(index < cursize-1)
{
    ...
    ++index;
}

your fist indicie is index and your second is index+1.

M.Winkens
  • 140
  • 2
  • 11
  • I modified that loop to be while(checkWordIndex <= curSize) and I am still experiencing a segfault. The suggestion to use only one indice is also a good one and I don't know why I did not think of it. – Colin Null Oct 09 '18 at 07:08
  • check before the strcmp line, if the checkWordIndex >= curSize. I guess it still goes out of bounds with your updated code – M.Winkens Oct 09 '18 at 07:13