0

This program takes multiple file names as command-line arguments and finds out the number of words in each file and how many times each word appears (i.e., the frequency) in all files. Specifically, the program will first determine the number of files to be processed. Then, the program will create multiple threads (one for each file). Each thread will count the number of words for the given file. In addition, each thread will access a global linked-list and update the number of times each word appears in all files.

However, I cannot print the word in each node. When I tried:

printf("%s appears %d times\n", node->word, node->count);

I got a segmentation fault.

Thread 1: number of words in File:input_file_1.txt is 6
Thread 2: number of words in File:input_file_2.txt is 14
All 2 files have been counted and the total of 20 words found !
Segmentation fault: 11

There's something wrong when I push nodes in linked list or print the linked list but I couldn't figure it out. Here's my code:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <pthread.h>

struct thread_info {
   pthread_t thread_id; /* ID returned by pthread_create() */
   int thread_num; /* Application-defined thread # */
   char * filename; /* From command-line argument filename */
};

// A linked list node
struct Node{
   char * word;
   int count;
   struct Node * next;
};

struct Node * head = NULL;
static int totalWordCount = 0;

void push(struct Node **head, char * new_data){
    struct Node * new_node = (struct Node * ) malloc(sizeof(struct Node));
    struct Node *last = *head;  

    new_node->word = new_data;
    new_node->count = 1;
    new_node->next = NULL; 

    if (*head == NULL) { 
       *head = new_node; 
       return; 
    }   

    while (last->next != NULL) 
        last = last->next; 

    last->next = new_node; 
    return;     
}

bool search(struct Node **head, char * x){
   struct Node * current = *head; // Initialize current
   while (current != NULL){
      if (strcmp(current, x) == 0){
         current->count++;
         return true;
      }
      current = current->next;
   }
   return false;
}


// This function prints contents of linked list starting from head

void printList(struct Node *head){
   struct Node *node = head;
   while (node != NULL){
        printf("%s appears %d times\n", node->word, node->count);
        node = node->next;
    }
}

void * processFile(void * vargp){
   int numberofwords = 0;
   // Store the value argument passed to this thread
   struct thread_info * tinfo = vargp;
   FILE * fp;
   fp = fopen(tinfo->filename, "r"); // read mode
   if (fp == NULL){
      perror("Error while opening the file.\n");
      exit(EXIT_FAILURE);
   }
   char word[100]; 
   while (fscanf(fp, "%s", word) != EOF) {
       if (search(&head,word)){
        } else{
            push(&head, word);
        }
       numberofwords+=1;
   }
   printf("Thread %d: number of words in File:%s is %d\n", tinfo->thread_num, tinfo->filename, numberofwords);
   totalWordCount += numberofwords;
   fclose(fp);
}

int main(int argc, char const * argv[]){
   pthread_t thread_id;
   char ch, file_name[25];
   int numberoffile = argc-1;
   for (size_t i = 0; i < numberoffile; i++){
      struct thread_info tinfo;
      tinfo.thread_num = i + 1;
      tinfo.filename = argv[i + 1];
      pthread_create( & tinfo.thread_id, NULL, processFile, & tinfo);
      pthread_join(tinfo.thread_id, NULL);
   }
   printf("All %d files have been counted and the total of %d words found !\n", argc - 1, totalWordCount);
   printList(head);
   //printf("%s appears %d times\n", head->word, head->count);
   return 0;
}

Thank you so much!

Estaa
  • 11
  • 2
  • didn't go through everything, the first erroneous thing that draws attention: this `struct Node *last = *head;` should be replaced by this: `struct Node *last = head;`. also this `*head = new_node;` with this `head = new_node;`. anyway you need to seriously revise your code and come back again – mangusta Jul 27 '20 at 07:55
  • You are pushing a locally stack-allocated string into your list in processFile(): push(&head, word); – Federico Jul 27 '20 at 07:56
  • @mangusta I would say that `struct Node *last = *head;` **is correct**. Notice that `head` is passed as a double-pointer – Support Ukraine Jul 27 '20 at 08:06
  • In the `search()` function, you do a string comparison between a pointer to a string and a pointer to a struct. Where you say `strcmp(current, x)`, you probably mean `strcmp(current->word, x)`. Reading a string that is not really a string could easily cause a segmentation fault. – Howlium Jul 27 '20 at 08:13

1 Answers1

2

There are several problems in your code. The problem that causes the seg fault is most likely that you save (aka push) a pointer to a local variable to the linked list.

In processFile function:

   char word[100];   <---------------------- local variable
   while (fscanf(fp, "%s", word) != EOF) {
       if (search(&head,word)){
        } else{
            push(&head, word);  <----------- call push

In push function:

void push(struct Node **head, char * new_data){
    struct Node * new_node = (struct Node * ) malloc(sizeof(struct Node));
    struct Node *last = *head;  

    new_node->word = new_data;  <------- save pointer

A local variable like char word[100]; has automatic storage duration. This means that the variable only exists while you are inside that function (or inside functions called from that function). But once processFile returns, the variable word is automatically destroyed and it's not valid to access the memory anymore. In other words - your list contains pointers to invalid memory.

What you need to do is to save the word in some other memory.

For that you have at least two options.

Option 1: Change the node definition and use strcpy. Like:

struct Node{
   char word[100];
   int count;
   struct Node * next;
};

and inside push do:

assert(strlen(new_data) < 100);
strcpy(new_node->word, new_data);

Option 2: Use dynamic allocation for the words in the linked list. Like:

void push(struct Node **head, char * new_data){
    struct Node * new_node = malloc(sizeof(struct Node));
    struct Node *last = *head;  

    new_node->word = malloc(1 + strlen(new_data));
    assert(new_node != NULL);
    strcpy(new_node->word, new_data);

But there are more problems with your code. For instance:

strcmp(current, x) ---> strcmp(current->word, x)
       ^^^^^^^                 ^^^^^^^^^^^^^
    not a string               this is the string to compare

And here:

  pthread_create( & tinfo.thread_id, NULL, processFile, & tinfo);
  pthread_join(tinfo.thread_id, NULL);

you create a thread and then immediately joins it. That actually means that you only have one thread running at the time!

The code should have one loop for creating threads and another loop for joining them.

However, if you create/join correctly the multi-thread implementation will fail big time because all threads would read and modify the linked list at the same time.

You would need to add mutex protection so that only one thread can operate on the list. But if you do that, the threads will block each other all the time so there would be no real benefit from having multiple threads.

What this really means is that your overall design is problematic. It's not a good idea to have multiple threads using the same global linked list.

If you want to do this multi-threaded, each thread should have it's own list while adding words from the file. Then you would need some code to merge the lists once the file reading completes. I'm not sure it's worth the trouble but you can give it a try.

Finally, I don't think a linked list is a good data container for this application. The search will be slow (i.e. O(N)) and consequently performance be bad for large files (aka many words). Consider using a search-tree or some hash based table instead.

Support Ukraine
  • 42,271
  • 4
  • 38
  • 63