Read a text file into a 2D array in C

Question

I'm trying to read an entire text file into a 2D array so I can limit how much can be stored and know when to start a new line (if anyone has a better idea, I'm open to suggestions).

This is what I have so far:

int main(int argc, char** argv) {

    char texto[15][45];
    char ch;
    int count = 0;
    FILE *f = fopen("texto.txt", "r");

    if(f == NULL)
        printf("ERRO ao abrir o ficheiro para leitura");

    while((ch = fgetc(f) != EOF))
        count++;

    rewind(f);

    int tamanho = count;

    texto = malloc(tamanho *sizeof(char));

    fscanf(f, "%s", texto);

    fclose(f);

    printf("%s", texto);

    return (EXIT_SUCCESS);
}

And the text file is like this

lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip

But I get this error

error: assignment to expression with array type

here

texto = malloc(tamanho *sizeof(char));

*I get an error* is not useful unless you tell us what that error is - we can't see your screen from here. Errors come with error messages that give you information, and it's on the screen right in front of you. There is no reason for you to not include it in your question so that we have that information to use to try to help you. Please edit your post to include the **exact** error message. Thanks. — Ken White, Oct 20 '18 at 00:43
Read your own code. You've declared `char texto[15][45];`. That means it's a fixed-size array - the size can't be changed and the memory is allocated by the compiler. What exactly do you expect the call to `malloc` to do for you there? — Ken White, Oct 20 '18 at 00:47
But I need it to be a fixed size (in the future it will use #defines instead of numbers) in order to limit how much text can be written — , Oct 20 '18 at 00:54
Buddy, `=` has lower precedence than `!=`. **You need to put parentheses around the assignment in the `while` loop:** `( (ch = fgetc(f) ) != EOF)` — Rafael, Oct 20 '18 at 01:14
It won't affect your program since you aren't using `ch`; just a usage caution. — Rafael, Oct 20 '18 at 01:25
`char texto[15][45];` declares an array of 15 1D arrays containing `45` characters each. That means at each `texto[0]` - `texto[14]` you can store at most `45` characters (or a string of `44` characters followed by the *nul-terminating* character). In order to pass `texto[x]` as an argument to `printf ("%s", ...)` you would fall into the `44 + 1` category. — David C. Rankin, Oct 20 '18 at 06:23

score 2 · Answer 1 · answered Oct 20 '18 at 07:45

The problem you are tasked with is designed to force you to understand the differences and limitations between character-oriented input, formatted input, and line-oriented input. You are setting your array limits as:

char texto[15][45];

Above declares an array of 15-1D arrays containing 45 characters each which will be sequential in memory (the definition of an array). That means at each index texto[0] - texto[14] you can store at most 45 characters (or a string of 44 characters followed by the nul-terminating character).

You are then given a file of seven lines of 45 characters each. But there are only 44 characters in each line? -- Wrong. Since (presumably, given "texto.txt") the information is held within a text file, there will be an additional '\n' (newline) character at the end of each line. You must account for its presence when reading the file. Each line in the file will look something like the following:

        10        20        30        40
123456789012345678901234567890123456789012345
lorem ipsum lorem ipsum lorem ipsum lorem ip\n

(where the numbers simply represent a scale showing how many characters are present in each line)

The ASCII '\n' character is a single character.

The formatted-input Approach

Can you read the input with fscanf using the "%s" conversion specifier? (Answer: no) Why? The "%s" conversion specifier stops reading at the first whitespace character that follows the non-whitespace characters it has read. That means reading with fscanf (fp, "%s", ...) will stop after the 5th character ("lorem").

While you can remedy this by using the character-class conversion specifier of the form [...], where the brackets contain the characters to be included (or excluded, if the first character in the class is '^'), you leave the '\n' character unread in your input stream.

While you can remedy that by using the '*' assignment-suppression character to read and discard the next character (the newline) with "%*c", if you have any additional characters in the line, they too will remain in the input buffer (input stream, e.g. your file) unread.

Are you beginning to get the picture that doing file input with the scanf family of functions is inherently fragile? (you would be right)

A naive implementation using fscanf could be:

#include <stdio.h>

#define NROWS 15    /* if you need a constant, #define one (or more) */
#define NCOLS 45

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""};
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* read up to NROWS lines of 44 char each with at most 1 trailing char */
    while (n < NROWS && fscanf (fp, "%44[^\n]%*c", texto[n]) == 1)
        n++;    /* increment line count */

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);   /* %zu is the size_t specifier */

    return 0;
}

(note: if you can guarantee that your input file format is fixed and never varies, then this can be an appropriate approach. However, a single additional stray character in the file can torpedo this approach)

Example Use/Output

$ ./bin/texto2dfscanf <dat/texto.txt
texto[ 0]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 1]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 2]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 3]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 4]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 5]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 6]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'

line-oriented Input

A better approach is almost always a line-oriented one. Why? It allows you to separately validate the read of a line of data from your file (or from the user) and then validate parsing the necessary information from that line.

But there is an intentional catch in the sizing of texto that complicates a simplistic line-oriented approach. You may be tempted to simply read each line of text directly into texto[0] - texto[14], but then you would read only the text into each row and leave the '\n' unread. (What? I thought line-oriented input handles this? -- It does, if you provide sufficient space in the buffer you are trying to fill...)

Line-oriented input functions (fgets and POSIX getline) read the trailing '\n' and include it in the buffer being filled -- provided there is sufficient space. fgets reads at most one less than the number of characters specified (leaving room for the '\0'), which protects your array bounds. Your task here has been designed to require a buffer of at least 46 characters for a line-oriented function to read:

the text + '\n' + '\0'

(the text plus the newline plus the nul-terminating character)

This forces you to do line-oriented input properly. Read the information into a buffer of sufficient size to handle the largest anticipated input line (and don't skimp on buffer size). Validate that your read succeeded. Then parse the information you need from the line in any manner you choose (sscanf is fine in this case). By doing it in this two-step manner, you can read the line, determine the original length of the line read (including the '\n') and validate whether it all fit in your buffer. You can then parse the 44 characters (plus room for the nul-terminating character).

Further, if additional characters remain unread, you know that up-front and can then continually read and discard the remaining characters in preparation for your next read.

A reasonable line-oriented approach could look something like the following:

#include <stdio.h>
#include <string.h>

#define NROWS 15    /* if you need a constant, #define one (or more) */
#define NCOLS 45
#define MAXC  1024

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""},
        buffer[MAXC] = "";
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (n < NROWS && fgets (buffer, MAXC, fp)) {
        size_t len = strlen (buffer);
        if (len && buffer[len-1] == '\n')   /* whole line fit: trim the '\n' */
            buffer[--len] = 0;
        else if (len == MAXC-1) {           /* line longer than buffer */
            int c;
            fprintf (stderr, "error: line %zu too long.\n", n + 1);
            /* discard only the rest of THIS line before the next read */
            while ((c = fgetc (fp)) != '\n' && c != EOF) {}
        }
        /* store at most 44 chars (blank lines fail to match and are skipped) */
        if (sscanf (buffer, "%44[^\n]", texto[n]) == 1)
            n++;
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}

(the output is the same)

character-oriented Input

The only method left is a character-oriented approach (which can be a very effective way of reading the file character-by-character). The only challenge with a character-oriented approach is tracking the indexes on a character-by-character basis. The approach here is simple. Just repeatedly call fgetc filling the available characters in texto and then discarding any additional characters in the line until the '\n' or EOF is reached. It can actually provide a simpler, but equally robust solution compared to a line-oriented approach in the right circumstance. I'll leave investigating this approach to you.

The key in any input task in C is matching the right set of tools with the job. If you are guaranteed that the input file has a fixed format that never deviates, then formatted input can be effective. For all other input (including user input), line-oriented input is generally recommended because of its ability to read a full line without leaving a '\n' dangling unread in the input buffer -- provided you use an adequately sized buffer. Character-oriented input can always be used, but you have the added challenge of keeping track of indexing on a character-by-character basis. Using all three is the only way to develop an understanding of which is the best tool for the job.

Look things over and let me know if you have further questions.

Nice teaching; also because you put the problem in context. Slow day at the office, eh ? ;-) — Peter - Reinstate Monica, Oct 20 '18 at 07:54
One remark to "If you are guaranteed that the input file has a fixed format that never deviates, then formatted-input can be effective": That's only partly true because the higher-level `scanf` conversions (`%d`, `%s` etc.) ignore the input layout (table spaces, line and page breaks) by discarding whitespace. So the question is whether this layout needs to be recognized because it has meaning, e.g. if the input is line-oriented like this one. If, by contrast, you tokenize a C program or read n numbers, formatted input is the way to go just because it *ignores* the (irrelevant) layout. — Peter - Reinstate Monica, Oct 20 '18 at 08:03
That is quite a good observation. You also get the benefit of both `%d` (all numeric conversions) and `%s` consuming leading whitespace. Here, though confusingly worded in the question, it was clear that `texto[15][45]` was intended to contain all the characters in each line of text. I agree with your comment completely, but didn't see tokenization applying to the problem at hand. (It could, but that would just make incrementally adding each string to each row a bit more tedious.) I know what you mean, but when the layout does have meaning, one stray char is still all it takes `:)` — David C. Rankin, Oct 20 '18 at 08:23
This is a very thorough answer, probably line oriented input was the most reasonable, given the code shown. It's difficult to decipher what was meant by "so I can limit how much it can be stored and to know when to do a new line". I read that to mean keep reading, then scan the buffer for newlines. When encountered newline, do something with it up to that point, but I mentioned `fgets` anyway. — awiebe, Oct 20 '18 at 09:59
Wow, that's a much better explanation than the one my teachers ever gave me, thanks for that! I managed to get it working, but I had to change it to a simple array `texto[15]`. But imagine if my text wasn't formatted like it shows and was a single line instead: how could I make it start a new line at the 45th character? — , Oct 20 '18 at 14:35
You could still use the exact same line-oriented approach. Recall, `fgets` will only read the number of characters specified in the 2nd parameter (minus one for the `'\0'`). So if you only had one long line of text and you wanted to break after every 44th character, you could use `fgets (buffer, 45, infile)` (the +1 needed for the `'\0'`) and then output with `puts (buffer)` (which automatically appends a `'\n'` to the end of the output). You could also use *formatted* or *character*-oriented input; it is just a matter of accounting for "How much do I read?" and "What do I do with the output?" — David C. Rankin, Oct 21 '18 at 03:40

BladeMight · Answer 2 · 2018-10-20T01:03:56.630

You are assigning the result of malloc to a fixed-size array, which is impossible: an array already has a fixed size and cannot be assigned to. You should declare texto as a char * in order to use malloc. The purpose of malloc is to allocate memory dynamically; you cannot allocate memory for an array whose size is already fixed.

Here is an example of how to read the text file into a 2D array (one word per row):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    char texto[256][256]; // 256 - Big enough array, or use malloc for dynamic array size
    int count = 0;
    FILE *f = fopen("texto.txt", "r");

    if(f == NULL) {
        printf("ERRO ao abrir o ficheiro para leitura\n"); // "error opening the file for reading"
        return EXIT_FAILURE;   // don't go on to read from a NULL FILE *
    }

    // read one whitespace-separated word into each row; %255s leaves room for '\0'
    while (count < 256 && fscanf(f, "%255s", texto[count]) == 1)
        count++;

    fclose(f);

    // Now let's print all the words in reverse order.
    for (int i = count - 1; i >= 0; i--) {
        printf("%s, ", texto[i]);
    }
    return (0);
}

output:

ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem,
