I was trying to get input from a file in C using scanf. The data in the file is as follows:

223234       <Justin>      <Riverside>  

This is the scanf format string I tried:

FILE* fid;
int id;
char name[100], city[100];
char dontcare1[40], dontcare3[40];
char dontcare2, dontcare4[40], dontcare5;

fid = fopen("test.txt", "r");

fscanf(fid, "%d%[^<]%c%[^<]%c%[>]%c ", &id, &dontcare1[0],
       &dontcare2, &dontcare3[0], &dontcare4[0],
       &city[0], &dontcare5);

I was wondering if there is a better way to do this. How would I account for whitespace in the file without creating extra variables? Also, this doesn't seem to pick up the city name enclosed in the brackets.

Nimantha

4 Answers

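In the *scanf() functions, the format string can contain literal characters that must appear in the input, and a single space in the format matches any amount of whitespace (including none).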

My example is simplified with sscanf() in order to avoid dealing with a file, but it works the same with fscanf().

The trick here is to use %n to obtain the number of characters read up to that point. This way, we can verify that the last > literal has actually been read; the return value of *scanf() cannot tell us that, because it only counts the conversions assigned before the final >.

/**
  gcc -std=c99 -o prog_c prog_c.c \
      -pedantic -Wall -Wextra -Wconversion \
      -Wc++-compat -Wwrite-strings -Wold-style-definition -Wvla \
      -g -O0 -UNDEBUG -fsanitize=address,undefined
**/

#include <stdio.h>

int
main(void)
{
  const char *line="223234       <Justin>      <Riverside>";
  int id;
  char name[100], city[100];

  int n_read=-1;
  sscanf(line, "%d <%99[^>]> <%99[^>]>%n", // widths keep name/city in bounds
         &id, name, city, &n_read);
  if(n_read!=-1) // read till the end then updated
  {
    printf("id=%d\n", id);
    printf("name=%s\n", name);
    printf("city=%s\n", city);
  }
  return 0;
}
prog-fh

When trying to open the file, it's useful to ensure that the file was actually opened successfully.

FILE *fid;
fid = fopen("path/to/file", "r");

if (fid == NULL){
    printf("Unable to open file. \n");
    return -1;
}

To actually address your problem, I'd probably just use the strtok function from string.h, with a space as the delimiter.

Also, I wouldn't use scanf, but rather fgets. The reasons for this can be found in various other SO posts. The following is an untested solution.

char line[100], line_parse[100]; // Buffers holding lines of up to 99 characters
char *ret; // Current token returned by strtok

// Read the file one line at a time until fgets reports EOF or an error
while (fgets(line, sizeof(line), fid) != NULL)
{
    // Copy the line for parsing, as strtok modifies the string it is given
    strcpy(line_parse, line);

    // Separate the line into tokens
    ret = strtok(line_parse, " ");

    while (ret != NULL)
    {
        /* do something with current field */
        ret = strtok(NULL, " "); // Move on to the next field
    }
}

Please be aware that strtok is not thread-safe. In multi-threaded code, you should therefore not use this function. Unfortunately, the ISO C standard itself does not provide a thread-safe version of the function. But many platforms provide such a function as an extension: On POSIX-compliant platforms (such as Linux), you can use the function strtok_r. On Microsoft Windows, you can use the function strtok_s. Both of these functions are thread-safe.

Andreas Wenzel
Lewis Farnworth
  • You might want to [remove the newline character that was read in by `fgets`](https://stackoverflow.com/questions/2693776/removing-trailing-newline-character-from-fgets-input) or add that character to the list of delimiters when calling the function `strtok`. Otherwise, the last field will contain the newline character. – Andreas Wenzel Feb 03 '21 at 00:27

You can do this quite simply by reading each line into an array (buffer) with fgets() and then parsing what you need from it with sscanf(). Don't use scanf() directly, as that opens you up to a whole array of pitfalls related to which characters remain unread in your input stream. Instead, read a line at a time and use sscanf() to parse the values from the buffer, just as you would with scanf(). Because fgets() consumes an entire line at a time, what remains in your input stream does not depend on the success or failure of your conversions.

For example, you could do:

#include <stdio.h>

#define MAXC 1024
#define NCMAX 100

int main (void) {
    
    char buf[MAXC],
        name[NCMAX],
        city[NCMAX];
    unsigned n;
    
    if (!fgets (buf, MAXC, stdin))
        return 1;
    
    if (sscanf (buf, "%u <%99[^>]> <%99[^>]>", &n, name, city) != 3) {
        fputs ("error: invalid format\n", stderr);
        return 1;
    }
    
    printf ("no.  : %u\nname : %s\ncity : %s\n", n, name, city);
}

The sscanf() format string is key. In "%u <%99[^>]> <%99[^>]>", %u reads the number as an unsigned value, the space before each < skips any amount of whitespace, and the literal < consumes the '<'. The conversion %99[^>] then uses a field width of 99 to protect your array bounds, and its scanset [^>] reads all characters up to, but not including, the next > (the same is done for the city). The conversion is validated by checking the return value to ensure that three valid conversions took place. If not, the error is handled.

Example Use/Output

With your input in the file dat/no_name_place.txt, the file is simply redirected on stdin and read by the program resulting in:

$ ./bin/no_name_city < dat/no_name_place.txt
no.  : 223234
name : Justin
city : Riverside
David C. Rankin
  • The result of `sscanf()` gives `3` even if no `>` was read after the city. However, I don't know if it is important for this particular problem. – prog-fh Feb 03 '21 at 06:49
  • Yes, you are correct, because the last `'>'` is not part of the final conversion, so it cannot cause a *matching* or *input* failure. And no, I doubt it is important, it could likely be omitted from the *format-string* altogether. What is important is reading the city name including all character not a `'>'`. – David C. Rankin Feb 03 '21 at 07:02

If you have to use scanf(), the other answers seem to cover every aspect. This is an alternative, getting input character by character using fgetc() and strcpy().

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_SIZE 100

int main(void)
{
    int id = 0, c = 0;
    char buff1[MAX_SIZE], buff2[MAX_SIZE];
    size_t i = 0U;
    FILE *fptr = NULL;

    if (!(fptr = fopen("test.txt", "r")))
    {
        perror("error opening file");
        return -1;
    }

    while ((c = fgetc(fptr)) != EOF)
    {
        if (isdigit(c)) /* maybe check INT_MAX here if you are planning to scan big numbers */
        {
            id = (id * 10) + (c - '0');
        }

        if (i != 0 && c == ' ')
        {
            buff2[i] = '\0';
            strcpy(buff1, buff2);
            i = 0U;
        }

        if (isalpha(c))
        {
            if (i < MAX_SIZE - 1)
            {
                buff2[i++] = c;
            }
            else
            {
                fputs("Buff full", stderr);
                return -1;
            }
        }
    }
    buff2[i] = '\0';
    fclose(fptr); /* release the file once reading is done */

    return 0;
}
alex01011
  • Note that `perror` may not work in this case, as it is not guaranteed that `fopen` will set `errno`, even if it fails. This is guaranteed by POSIX, but not ISO C. Therefore, you may want to set `errno` to `0` before the `fopen` call and only call `perror` if `errno != 0`. If `fopen` fails and `errno == 0`, then you might want to provide a generic error message instead. – Andreas Wenzel Feb 03 '21 at 00:51