2

I am new to C and I came across an issue when using fscanf to read all strings from a .txt file.

The code is as follow:

#include <stdlib.h>
#include <stdio.h>

int main() {
    FILE *spIn;
    char *numIn;

    spIn = fopen("data.txt", "r");

    if (spIn == NULL) {
        printf("Can't Open This File \n");
    }

    while ((fscanf(spIn, "%s", numIn)) == 1) {
        printf("%s\n", numIn);
    };

    fclose(spIn);
    return 1;
}

This throws an error: Segmentation fault: 11.

The original data on txt file is:

1 2 345 rrtts46
dfddcd gh 21
789 kl

a mix of ints, strings, white space and newline characters.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Tyler Parker
  • 81
  • 1
  • 4

2 Answers2

4

At least 4 candidate undefined behaviors (UB) that could lead to a fault of some kind.

  1. Code fails to pass to fscanf(spIn,"%s",numIn) an initialized pointer.
  2. Code calls fscanf() even if fopen() fails.
  3. Code calls fclose() even if fopen() fails.
  4. No width limit in fscanf(spIn,"%s",numIn)), worse than gets().

Text files really do not have strings ('\0' terminated data) nor int, they have lines (various characters with a '\n' termination).

To read a line in and save as a string, use fgets(). Do not use fscanf() to read lines of data.

#include <stdlib.h>
#include <stdio.h>

int main() {
  FILE *spIn = fopen("data.txt", "r");
  if (spIn == NULL) {
    printf("Can't Open This File \n");
  } else {
    char buf[100];
    while (fgets(buf, sizeof buf, spIn)) {
      printf("%s", buf);
    }
    fclose(spIn);
  }
}
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • Why is `%s` "worse" than `gets`? I understand why it is "as bad as" `gets`, but I don't understand why it is supposed to be "worse" than `gets`. – Andreas Wenzel Mar 07 '21 at 15:10
  • 1
    @AndreasWenzel `gets()` at least reads the entire _line_, `'\n'` included. `fscanf(spIn,"%s",numIn)` only reads `"dfddcd"` and not `"dfddcd gh 21"` nor `"dfddcd gh 21\n"`. OP (and many others) conflate `"%s"` (reading a _string_, it does not) with reading a _line_. At least `gets()` is dropped in C11 so easier to find bad `gets()` code than bad `fscanf(spIn,"%s",numIn)`. – chux - Reinstate Monica Mar 07 '21 at 15:15
  • There was an attempt in C11 to sanitize `scanf` and other string functions in Annex K: too bad the error handling scheme was such a contraption. One can use `grep -n '%s' *.c | grep 'scanf'` to try and locate *bad* calls. – chqrlie Mar 07 '21 at 16:41
  • @chqrlie 2 alternatives I hope C does, 1) with `*scanf()` or variant, provide passing a `size_t` parameter with `%s, %[` 2) Provide an efficient `size_t readline(size_t, char *buf, FILE *)` to read an entire line, save what it can and return length saved. – chux - Reinstate Monica Mar 07 '21 at 17:08
  • @chux-ReinstateMonica: `readline` should also discard the newline to act as a replacement for `gets()`. I would use `size_t readline(char *, size_t, FILE *)` for consistency with other standard functions taking a buffer and its length. Similarly `size_t sstrcpy(char *dest, size_t n, const char *src)` would act like `snprintf(dest, n, "%s", src)` and return the length of `src`... – chqrlie Mar 07 '21 at 17:42
  • Perhaps we could design a good functional interface someday? Consider `size_t` first per [a new principle](https://en.wikipedia.org/wiki/C2x#Features). Note `sprintf()` returns `int`. – chux - Reinstate Monica Mar 07 '21 at 20:57
3

char* numIn is a pointer, and it is uninitalized, you can't really store anything in it, you need to either allocate memory for it or make it point to some valid memory location:

#include<stdlib.h> // for malloc

char* numIn = malloc(100); // space for 99 char + null terminator byte    
//...
while ((fscanf(spIn, "%99s", numIn)) == 1)
{
    printf("%s\n",numIn);
};

Or:

char str[100];
char *numIn = str;

Which in this small code makes little sense, you should probably make numIn a fixed size array to begin with:

char numIn[100];

Note that that you should use a width specifier in *scanf to avoid buffer overflow. This still has a problem though, it will read word by word, instead of line by line.

Looking at your input file, using fgets seems like a better option, it can read complete lines, including spaces:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
    FILE *spIn;
    char numIn[100];
    spIn = fopen("data.txt", "r");

    if (spIn != NULL)
    {
        while ((fgets(numIn, sizeof numIn, spIn)))
        {
            numIn[strcspn(numIn, "\n")] = '\0'; // removing \n
            printf("%s\n", numIn);
        }
        fclose(spIn);
    }
    else
    {
        perror("Can't Open This File");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Since fgets also parses the \n character, I'm removing it with strcspn.

Though you do verify the return value of fopen the execution continues even if it fails to open, I also addressed that issue.

anastaciu
  • 23,467
  • 7
  • 28
  • 53