-1

I want my program to read N words from a text file and save them in an array. My question is, do i need a 2D Array e.g: char **wordList or is the 1D Array in the example below sufficient? The output is correct except from the last string which as you can see is weird. Also, am i allocating sufficient memory for the array and why does the last output string come out wrong?

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void populateWordsArray(int);

FILE *file;
char *word;
char **wordList;

/*
* Function populateWordsArray: reads N words from
* the given file and creates an array with them.
* Has one argument: int N - the number of words to read.
*/
void populateWordsArray(int N) {
    int i = 0;
    while(!feof(file) && i < N) {
    fscanf(file,"%s",&word[i]);
    printf("%s\n",&word[i]);
    i++;
    }
}

int main(int argc,char *argv[]) { // argv[1] = op argv[2] = name
    int N = 0;

    file = fopen(argv[2],"r");

    if(file == (FILE *) NULL) { // check if the file opened successfully
        fprintf(stderr,"Cannot open file\n");
    }

    fscanf(file,"%d",&N); // get the N number
    word = malloc(N * sizeof(char));

    populateWordsArray(N);
    // write a switch method for the various ops
    // call the appropriate function for each operation

    free(word);
    fclose(file);
    return 0;
}

Output:

this
is
a
test!
with
files.
new
line,
here.
ere.

text file content:

10 this is a test! with files.
new line, here.
Stelios Papamichail
  • 955
  • 2
  • 19
  • 57
  • 1
    Possible duplicate of [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Quentin Jan 18 '19 at 16:04
  • @Quentin I don't really believe that this question is about feof. – nm_tp Jan 18 '19 at 16:05
  • @nm_tp the last line bug sure is. – Quentin Jan 18 '19 at 16:07
  • 1
    Actually, true. And this is what made him post the question. But this code needs much more correcting than that. – nm_tp Jan 18 '19 at 16:10
  • @nm_tp no it's not about eof (idk if the last output is connected to eof though) – Stelios Papamichail Jan 18 '19 at 16:11
  • Your code does not currently seem to use `wordList` - it only declares it. And the loop that calls `fscanf` to read the words is incrementing the start position within a buffer (`word`) of `N` `char` by 1 each time. That doesn't look right! – Ian Abbott Jan 18 '19 at 16:12
  • @Ian Abott yeah i wasn't sure if I'll need to use that or just 'word'. Also do you mind explaining the last part of your comment a bit? – Stelios Papamichail Jan 18 '19 at 16:16
  • 1
    `char **` is *not* a 2d array. It is not an array at all. – Antti Haapala -- Слава Україні Jan 18 '19 at 16:16
  • I mean that every time around the loop, you only keep the first character from the previously read string. So if you read the words "one two Three four Five" then the 1-dimensional `words` buffer will end up containing the string "otTfFive". – Ian Abbott Jan 18 '19 at 16:21
  • @IanAbbott ok now i get it but then why does the `word[0]` for example output the string `this` instead of just `t`? Also, how would i got about correcting my program? I thought that using a `char **` was equal to creating a 2D array dynamically, how should i go about doing that? PS: Sorry for the multiple questions but i'm new to C – Stelios Papamichail Jan 18 '19 at 16:28
  • You want to memorize a list of strings. You can use a 2D array, so a char [][50] for instance supposing a word cannot be longer that 49 characters more the final null. You can also use a vector of char *, that solution takes less memory in case the read words are smaller that 40 characters, and also allows longer words – bruno Jan 18 '19 at 16:35

1 Answers1

2

Your example is wrong. When executing the line fscanf(file,"%s",&word[i]);, the third argument should be the address where the function will write the read data. In your case, word[i] is the i-th element of the array and &word[i] is its address. So, the word will be stored with the first character at the word[i]. Your code only prints something because you print it immediately. Also, you don't get a segfault by pure chance. If you want to read a string into a buffer, you first need to allocate the space for the buffer. By using char **, you can make it into a 2D array by first allocating sufficient space for the array of pointers and then allocate sufficient space for each of the pointers to hold an address to a string.

I have rewritten your program for you:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_STRING_LENGTH 100

void populateWordsArray(int);

FILE *file;
char **wordList;

void populateWordsArray(int N) 
{
    int i = 0;
    while(i < N && fscanf(file,"%s", wordList[i]) == 1) // fscanf returns the number of successfully read items. If it's not 1, the read failed. You can also check if the return value is not EOF but, in this situation, it's the same.
    {
        printf("%s\n", wordList[i]); // i-th element is an address to a buffer that contains 100 bytes
        i++;
    }
}

int main(int argc,char *argv[]) 
{ 
    int N = 0, i;

    file = fopen(argv[1],"r"); // Indexing starts from 0 in C. Thus, 0th argument is the executable name and 1st is what you want.

    if(file == NULL)  // No need to cast NULL into a specific type.
    {
        fprintf(stderr,"Cannot open file\n");
        return 1; // You might want to end the program here, possibly with non-zero return value.
    }

    fscanf(file,"%d",&N);
    wordList = malloc(N * sizeof(char*)); // Allocating space for pointers
    for(i=0; i<N; i++)
    {
        wordList[i] = malloc(MAX_STRING_LENGTH); // Allocating space for individual strings 
    }

    populateWordsArray(N);

    for(i=0; i<N; i++)
    {
        free(wordList[i]);
    }
    free(wordList);
    fclose(file);
    return 0;
}

I'd also advise against using global variables here.

EDIT: As the comments may suggest, this code is not the best solution. First, all the words might not fit into a 100 byte buffer. To alleviate this issue, allocate a large, fixed-size buffer, read every word into it, then allocate corresponding number of bytes for wordList[i] (don't forget the terminating null byte) and copy the data from the fixed-size buffer into wordList[i].

Also, the code has some missing error checks. For instance, the file may exist but is empty, in which case fscanf(file,"%d",&N); will return EOF. Also, the number at the beginning of the file may not be corresponding to the number of the lines that follow or N might be a negative number (the code allows for it by specifying it to be int).

EDIT2: As @bruno suggested, I made a version that I think is more bulletproof than the previous one. It's possible that I omitted something, I'm in a bit of a hurry. If so, let me know below.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_STRING_LENGTH 512


char line_buffer[MAX_STRING_LENGTH]; // A line of the maximum size that can occur.

char** populateWordsArray(unsigned wantedLines, FILE* file, unsigned *readLines);


char** populateWordsArray(unsigned wantedLines, FILE* file, unsigned *readLines)
{
    *readLines=0;
    char** wordList;
    // Allocating space for pointers
    wordList = malloc(wantedLines * sizeof(char*));
    if(!wordList)
    {
        fprintf(stderr,"Cannot allocate sufficient space for the pointers.\n");
        exit(EXIT_FAILURE); // You may return NULL here and check it afterwards. The same goes for all the error checking inside this function
    }

    while(*readLines < wantedLines && fscanf(file,"%s", line_buffer) == 1)
    {
        wordList[*readLines] = malloc(strlen(line_buffer)+1);
        if(!wordList[*readLines])
            break;
        if(NULL == (wordList[*readLines]=strdup(line_buffer)))
            break;
        (*readLines)++;
    }

    return wordList;
}

int main(int argc,char *argv[]) 
{ 
    unsigned N = 0, i, M;
    char **wordList;
    FILE *file;


    file = fopen(argv[1],"r"); // Indexing starts from 0 in C. Thus, 0th argument is the executable name and 1st is what you want.

    if(file == NULL)  // No need to cast NULL into a specific type.
    {
        fprintf(stderr,"Cannot open file\n");
        return 1; // You might want to end the program here, possibly with non-zero return value.
    }

    if(fscanf(file,"%d",&N) != 1)
    {
        fprintf(stderr,"Cannot read the number of lines. Empty file?\n");
        return 1;
    }


    wordList = populateWordsArray(N, file, &M);

    printf("Printing the read lines:\n");
    for(i=0; i<M; i++)
    {
        printf("%s\n", wordList[i]);
    }

    for(i=0; i<M; i++)
    {
        free(wordList[i]);
    }
    free(wordList);
    fclose(file);
    return 0;
}
nm_tp
  • 463
  • 3
  • 15
  • first of all, thanks for the answer i'll write it the same way and give it a try. Secondly, why do we specifically define the max size of each string to 100 bytes? Is that enough ? Would it be wrong to do `malloc(max_string_length * sizeof(char))` ? – Stelios Papamichail Jan 18 '19 at 16:39
  • 1
    it is better to not allocate fixed size string into _main_ and in _populateWordsArray_ to read the string in a local var then duplicate it with _strdup_ to place it in the vector – bruno Jan 18 '19 at 16:40
  • @SteliosPapamichail `sizeof(char)` is equal to 1, by standard. – nm_tp Jan 18 '19 at 16:48
  • @bruno I agree but this answer's purpose is to restructure his code to be correct, not best possible. It still checks. The words might be longer than 100 bytes, etc. – nm_tp Jan 18 '19 at 16:50
  • 2
    @nm_tp the fact the word can be longer than 100 is a reason to not pre allocate ;-) May be you can add a second implementation in your answer explaining why it is a better one ? Just my 2 cents – bruno Jan 18 '19 at 16:53
  • My professor said that on each row in the text file there can be a maximum of 512 chars. So just to cover the worst case scenario (e.g: one word spanning the whole line), should i do `malloc(512 * sizeof(char))`. Sorry for writing `sizeof(char)` again but the prof. wants it to be in there.. – Stelios Papamichail Jan 18 '19 at 16:55
  • Okay, write it wherever you allocate `char`s but you should bear in mind that this is redundant. I added a better solution, I believe. – nm_tp Jan 18 '19 at 17:15
  • @bruno I added a better solution. The length restriction of `512` can be changed but I chose it because OP mentioned it. – nm_tp Jan 18 '19 at 17:16
  • You will say I am never happy ^^ but I think the `malloc(xx * sizeof(char*));` is better placed into the main, that allows to return the number of read lines in place of the vector and without having to use a "out param". It is useless to declare _populateWordsArray_ because its definition is just under the declaration – bruno Jan 18 '19 at 17:22
  • I think it's okay to delegate the allocation to one function. It's one line and the function does not depend on pre-allocating the array. I'd just change the name to be more indicative of the true job that the functuon performs. After all, it not does more than just populate the array. We can also realloc if M differs from N. Also, we're now just speculating since OP might really need something else for his assignment. – nm_tp Jan 19 '19 at 10:41
  • @SteliosPapamichail Is it possible that your professor told you to use `sizeof(char*)` instead of `sizeof(char)`? – nm_tp Jan 21 '19 at 07:29
  • @nm_tp i meant that he just wanted `sizeof()` in there, i'm guessing that he just wants to see if we know how to use it. As for the edited answer, we do know that N will always be `N > 0` and also what do you mean by `Also, the number at the beginning of the file may not be corresponding to the number of the lines that follow` ? We have to use N as an indicator of the number of words that we have to read from the given file. Are you talking about the case of N being greater than the words in the file? PS: Haven't read the whole edited answer yet, will read it today. – Stelios Papamichail Jan 21 '19 at 11:33
  • @SteliosPapamichail If you **know** that N will be positive, then you can use `unsigned` without any worries. There is always a possibility that the first line says, for example 13 and there are 5 words afterwards. That's why, in the second example, we "return" (using a pointer) the number of words read. – nm_tp Jan 21 '19 at 12:12
  • @nm_tp i just talked with the prof. and he said that we do know that N >= 0 always so yeah i should make that an `unsigned int` and that also `N = the number of words in the file always`. – Stelios Papamichail Jan 21 '19 at 14:31
  • That's okay, both versions of code run correctly in the ideal case. :) – nm_tp Jan 21 '19 at 14:53