0

I've got a text file that I want to read. The file has the following content:

Asdsf adsfsd
54
asdfa adwfasd
12
asdf adf 
545
asdf asdfasfd
3243
adfasf asdfasdf
324324
asfda asdfasdf
3124
adfa asdfas
432
asdf ad

and my code:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>


struct Element {
    int edad;
    char name[50];
};

int main() {
    struct Element aux;
    FILE* fitxer;
    fopen_s(&fitxer, "Text.txt", "r");
    if (fitxer != NULL) {
        while (!feof(fitxer)) {
            fgets(aux.name, 50, fitxer);
            aux.name[strlen(aux.name) - 1] = '\0';
            int ret = fscanf_s(fitxer, "%i", &aux.edad);
            char endl;
            fscanf_s(fitxer, "%c", &endl);
            printf("%d %s \n", aux.edad, aux.name);
        }
        fclose(fitxer);
    }
    else {
        printf("Error: File not found.");
    }    
}

I had problems before because I didn't know that f_scanf doesn't take the endline character. Now the problem is that there are some strings in the file that are being chopped. Output:

54 Asdsf adsfsd
12 asdfa adwfasd
545 asdf adf
3243 asdf asdfasfd
324324 adfasf asdfasdf
3124 asfda asdfasdf
432 adfa asdfas
432 asdf a

For instance, in this example the last letter is being chopped. I suspect it has something to do with the conversion to string, adding the '\0' character, but I'm unable to find the error.

Also I would like to ask if there is a way to do it more elegant.

Roberto Caboni
  • 7,252
  • 10
  • 25
  • 39
Norhther
  • 545
  • 3
  • 15
  • 35

3 Answers3

1

With

aux.name[strlen(aux.name) - 1] = '\0';

you get rid of a well known behavior of fgets: it stores the whole line to the output buffer included the '\n' character.

But what if that character is not present? You would chop the last character.

It is exactly what happens when you read the last line of the file. Since there's no trailing '\n' character, fgets stops as soon as the end of file is reached.

To fix it just check if the character to be substituted is the expected one.

Something like this:

size_t len = strlen(aux.name);

if(len > 0 && aux.name[len - 1] == '\n')
    aux.name[len - 1] = '\0';

The check len > 0 avoids undefined behavior in case of 0-length string (it happens if the first character of the line is '\0').

Roberto Caboni
  • 7,252
  • 10
  • 25
  • 39
  • 2
    Hacker exploit: `aux.name[strlen(aux.name) - 1]` and `aux.name[len - 1]` suffer UB if the first character _read_ is a _null character_. If you want `len`, use `if(len > 0 && aux.name[len - 1] == '\n') aux.name[--len] = '\0';` – chux - Reinstate Monica May 09 '20 at 16:29
  • @chux-ReinstateMonica I'll edit the answer with your improvement. Anyway I have to say that first sentence of the question is _"I've got a text file that I want to read"_. A file containing null character is not a text file.;) – Roberto Caboni May 09 '20 at 16:34
  • Note: UTF16 _text files_ commonly contain null bytes. Easy enough for a user to innocently read such a _text file_ mistakenly here and cause code to to _something_ bad. So rather than oblige the user to provide vetted input, robust code handles the [good, bad and ugly](https://en.wikipedia.org/wiki/The_Good,_the_Bad_and_the_Ugly). – chux - Reinstate Monica May 09 '20 at 16:39
  • @chux-ReinstateMonica touchet on UTF16. I didn't think about them. I edited it. Ps: I don't get the relationship between the robust code and the movie you linked. And it is a shame since I'm a HUGE Leone's fan. – Roberto Caboni May 09 '20 at 16:47
  • 1
    It is just that user input (the text file) might by good (contains the usual expect data nicely laid out, no \0, \r, last line ending with \n, no line too long, no characters outside ASCII, ...) or ugly (innocently messing up one of those prior requirements) or bad (purposely designed to attack code). The movie name epitomizes those 3 cases that robust code needs to handle. – chux - Reinstate Monica May 09 '20 at 16:53
1

At least 3 problems:

Wrong test for end-of-file, avoid magic numbers

ref

//while (!feof(fitxer)) {
//    fgets(aux.name, 50, fitxer);
while (fgets(aux.name, sizeof aux.name, fitxer)) {

fscanf_s(fitxer, "%c", &endl); is missing an augment.

Research fscanf_s() if interested, or better yet, just use fgets() for input.

Wrong code to lop off potential trialing '\n'

Alternatives: 1 2

// aux.name[strlen(aux.name) - 1] = '\0';
aux.name[strcspn(aux.name, "\n")] = '\0';
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • Using fgets is going to give me the full line, how could I convert that into an integer and string? – Norhther May 14 '20 at 13:37
  • @Norhther "fgets is going to give me the full line" _as a string_. So "how could I convert that into (a)... string?" is done. To convert a string into an integer research `strtol()`, `sscanf(s, "%d"...`. – chux - Reinstate Monica May 14 '20 at 13:41
  • @Reinstate Monica I see. I've got a compiler warning using sscanf (I'm using VS) because of possible overflows, but I'm alreading restricting the buffer size using gets(), so I think It's safe. Thanks a lot. – Norhther May 14 '20 at 13:51
  • @Norhther Likely the warning is the famous #C4996. Ignore it or [this](https://stackoverflow.com/q/16883037/2410359). It encourages a MS specific solution to a general problem. Better yet, rather than use the VS C compiler which follows 30 year old (and 3 C revs back) C89 , use a modern compiler that adheres to C99, C11 or C18. I use gcc (within Windows). – chux - Reinstate Monica May 14 '20 at 14:15
0
aux.name[strlen(aux.name) - 1] = '\0';

This line chops of the last character from the string you read with fgets. For most lines that character is the \n at the end of the line. But I assume that your last line doesn't have a line break at the end. So you chop off the actual last character.

To fix this, you should only chop off the last character if it's equal to '\n'.

PS: Your last call to fscanf_s to fails and you end up printing the same number as the previous line. I'm not sure whether that's intentional or not.

PPS: If your last call to fscanf_s didn't fail, your while loop would loop once more than it should because feof only returns true if a previous read failed because of the end of file. So instead of using feof, you'll probably want to directly check whether your read operations failed.

sepp2k
  • 363,768
  • 54
  • 674
  • 675