0

I'm trying to use C multithreading to find out the frequency of each alphabet letter in a text file. Assignment is to: 1) write a function that read every single sentence in a text, ended by '.' 2) write a function that load a sentence in a bidimensional array 3) write a function that generates a pthread for every letter for every sentence (pthread function add 1 to a counter for that letter). EDIT: I figured out with Valgrind that the problem is in sentence function, by I dont understand why.

Here's the code:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>

char alphabet[26] = "abcdefghijklmnopqrstuvwxyz";
int count[26];

char* sentence(char * s){
    char* p;
    char* q;
    char* arr;
    int i;
    p = s;
    q = malloc(100);
    arr = q;
    for (i=0; *p != '.'; i++){ 
        *q = *p;
        q++;
        p++; 
    }
    *q = '\0';
    return arr;
}

char** load_sentence(char* p, char** q, int i){
    q[i] = malloc(strlen(p)+1);
    strcpy(q[i], p);
    return q;
}

void* count_letter(void * s){
    char* p = (char*) s;
    int i;
    for (i=0; i<26; i++){
        if (*p == alphabet[i]){
            count[i]++;
        }
    }
}

void frequency(char* str){
    char* s = str;
    int i, j, l;
    l = strlen(str);
    pthread_t tid[l];
    for (i=0; i<l; i++){
        pthread_create(&tid[i], NULL, count_letter, (void*) s);
        s++;
    }
    for (j=0; j<l; j++){
        pthread_join(tid[j], NULL);
    }
}


int main(int argc, char* argv[]){

    int fd;
    char buff[100];
    fd = open(argv[1], O_RDONLY);
    char ** text = malloc(10*sizeof(char*));
    read(fd, buff, sizeof(buff));
    char* start = buff;
    int i = 0; //number of phrases!
    char* p = NULL;

    while (*(p = sentence(start)) != '\0'){
        text = load_sentence(p, text, i);
        start += strlen(p)+1;
        i++;
   }

   int j, k;

   for (k=0; k<i; k++){
        frequency(text[k]);
   }

   for (j=0; j<26; j++){
        printf("%c : %d times\n", alphabet[j], count[j]);
   }
}

It looks like that with cases like this: hope it's a good reading. bye. The output is correct:

a : 2 times
b : 1 times
c : 0 times
d : 2 times
e : 3 times
f : 0 times
g : 3 times
h : 1 times
i : 2 times
j : 0 times
k : 0 times
l : 0 times
m : 0 times
n : 1 times
o : 3 times 
p : 1 times
q : 0 times
r : 1 times
s : 1 times
t : 1 times
u : 0 times
v : 0 times
w : 0 times
x : 0 times
y : 1 times
z : 0 times

With others, a "memory error", that begins with free() : invalid next size (normal). The error has many lines of memory map and ends with abortion.

I'm quite new to C, sorry for my inexperience.

Is it necessary to introduce a mutex in this case?

erika
  • 15
  • 6
  • 1
    C allows more than one character for variable names. Use them. And if you are not forced to use C89 but C99 or higher declare and define variables where they are used. Next familiarize yourself with your favourite debugger. Your `load_sentence()` does not do what its name suggests. – Swordfish Oct 17 '18 at 13:05
  • It is running for me in ideone: https://ideone.com/QVOALf with your input `aaaaaaaaa.aaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaa.` – izlin Oct 17 '18 at 13:09
  • 1
    you are corrupting memory. your `text` has place for only 10 pointers. though your `load_sentence` can access as many as the sentence lenght is, more than 10 for `aaaa...` – Serge Oct 17 '18 at 13:13
  • I know C allows more than one character for variable names, I'm sorry it's not that readable this way. But it's not the problem, right? Why do you say that function `load_sentence()`does not do what its name suggests? It indeed copies a sentence, pointed by pointer `p`, in a bidimensional array `q` at position `i`. – erika Oct 17 '18 at 13:16
  • Serge, can you please explain better what you're trying to say? I know there are only 10 pointers, but they are, pointers to pointers, it means that in these way this code can read only 10 sentences... – erika Oct 17 '18 at 13:25
  • You need to null terminate them there strings. `char alphabet[26+1] = "abc...` – Lundin Oct 17 '18 at 13:28
  • As in, you ran into this little pitfall: https://stackoverflow.com/a/52385480/584518 – Lundin Oct 17 '18 at 13:30
  • Thank you for your comment Lundin, but it seems that modifying that doesn't solve the problem... – erika Oct 17 '18 at 13:39
  • I guess it's a school project to learn threaded programming? if not.. remove the mutex and threading it will not speed up the process as the code it's written... Also why do you read sentence by sentence when you present the data flat anyway? – Anders Cedronius Oct 17 '18 at 14:01
  • Yeah, school request... Our professor wasn't indeed that helpful, unfortunately. The request is to read the text this way, sentence by sentence. – erika Oct 17 '18 at 14:06

2 Answers2

0

Erika,

Since I don't really know your assignment please see this as just another way out of a 1000 to count characters. I have not checked it for bugs, rewrite to your needs. Anyhow this is how I would have solved it. If memory is sparse I would read character by character from the file until ".". Anyhow hope it helps you and you get great grades :-)...

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 100
atomic_int threadCount;
#define NCHAR 26
char alphabet[NCHAR] = "abcdefghijklmnopqrstuvwxyz";
atomic_int count[NCHAR];


void* count_letter(void * s){
    threadCount++;
    char* p = (char*) s;
        for (int i=0; i<NCHAR; i++)
            if (*p == alphabet[i])
                count[i]++;
    threadCount--;
    return NULL;
}

int main(int argc, char* argv[]){

    //Init variables
    FILE *file;
    char *myText;
    unsigned long fileLen;
    int deadLockGuard=0;
    threadCount=0;

    //Open the file
    file = fopen(argv[1], "rb");
    if (!file) {
        fprintf(stderr, "Unable to open file %s", argv[1]);
        return EXIT_FAILURE;
    }
    fseek(file, 0, SEEK_END);
    fileLen=ftell(file);
    rewind(file);

    //reserve memory and read the file
    myText=(char *)malloc(fileLen+1);
    if (!myText) {
        fprintf(stderr, "Memory error!");
        fclose(file);
        return EXIT_FAILURE;
    }
    fread(myText, fileLen, 1, file);
    fclose(file);

    //Get each sentence ending with a . and then for each character look at the count for each character in it's own thread.
    char *subString = strtok(myText, "."); //This is your sentence/load_sentence method
    while (subString != NULL) {
        for (int v = 0;v<strlen(subString);v++) { //This is your frequency method
        deadLockGuard=0;
        while (threadCount >= MAX_THREADS) {
            usleep(100); //Sleep 0.1ms
            if(deadLockGuard++ == 10000) {
                printf("Dead-lock guard1 triggered.. Call Bill Gates for help!"); //No free threads after a second.. Either the computer is DEAD SLOW or we got some creepy crawler in da house.
                return EXIT_FAILURE;
            }
        }

        pthread_t tid; //Yes you can overwrite it.. I use a counter to join the workers.
        pthread_create(&tid, NULL, count_letter, (void*) subString+v);
    }
        subString = strtok(NULL, ".");
    }
    deadLockGuard=0;
    //pthread_join all the still woring threads
    while (threadCount) {
        usleep(1000); //sleep a milli
        if(deadLockGuard++ == 2*1000) {
            printf("Dead-lock guard2 triggered.. Call Bill Gates for help!"); //Threads are running after 2 seconds.. Exit!!
            return EXIT_FAILURE;
        }
    }
    //Garbage collect and print the results.
    free(myText);
    for (int j=0; j<NCHAR; j++)
        printf("%c : %d times\n", alphabet[j], count[j]);
    return EXIT_SUCCESS;
}
Anders Cedronius
  • 2,036
  • 1
  • 23
  • 29
  • Anders Cedronius, thank you for your answer at this problem, it was useful. Unfortunately, the assignment requires those passages, that's why I introduce those functions ;( Anyway, thanks! :) – erika Oct 17 '18 at 18:41
0

Your previous version with mutex had undefined behaviour because you initialized mutex multiple times, according to reference:

Attempting to initialize an already initialized mutex results in undefined behavior.

You are accesing count concurrently, so you have to use mutex to make thread-safe code. You called pthread_mutex_init in count_letter it is incorrect, this function is the body of your thread (multiple initialization of mutex without destroying it leads to UB), you should call pthread_mutex_init only once, for instance as first line in main function:

int main() {
 pthread_mutex_init(&mtx,NULL);

before return add

 pthread_mutex_destroy(&mtx);

Critical section in your count_letter function is line

count[i]++;

you should modify it as follows

pthread_mutex_lock(&mtx);
count[i]++;
pthread_mutex_unlock(&mtx);

Now, return to sentence implementation, you need to check if *p doesn't point to null terminator before comparing with .:

for (i=0; *p && *p != '.'; i++){ 
          ^^ added

without testing it, \0 != . returns true and your loop continues ...

rafix07
  • 20,001
  • 3
  • 20
  • 33
  • Thank you rafix07 for the explanation! I really had a hard time to figure out. Memory management with C it's not that easy...! I've added that code and now it works fine. Thanks :) – erika Oct 17 '18 at 21:18