I want to read a text file in my documents folder. I don't understand why I can't read it or, if it is being read, why I can't print it in my console application. When I run the program it just hangs. It doesn't do anything.

The file I want to read:

330400199711111890 W1         ZhejiangJianxin                                   
330411193807234897 W2         ZhejiangJianxinÐãÖÞÇø                             
331122199502289716 W3         ZhejiangçÆÔÆÏØ                                    
330402192503284421 M1         ZhejiangJianxinÄϺþÇø                             
330225198403042936 W4         ZhejiangÏóÉ½ÏØ                                    
330681194109099151 W5         ZhejiangÖîôßÊÐ                                    
330727195612078712 W6         ZhejiangÅͰ²ÏØ                                    
330921193708179044 M2         Zhejiangá·É½ÏØ                                    
330303195103046912 W7         ZhejiangWenzhouÁúÍåÇø                             
330781197108138752 W8         ZhejiangÀ¼ÏªÊÐ                                    
330127193411280584 M3         Zhejiang´¾°²ÏØ                                    
331001193310027792 W9         ZhejiangTaizhouDowntown                           
331125196503132898 W10        ZhejiangÔÆºÍÏØ                                    
331000192003056719 W11        ZhejiangTaizhou                                   
330106194503103959 W12        ZhejiangHangzhouWestlakeDistrict                  
330106194610285524 M4         ZhejiangHangzhouWestlakeDistrict                  
330301198301227758 W13        ZhejiangWenzhouDowntown 

The program:

#include <stdio.h>
#include <stdlib.h>
#define NAME_LEN 80

// struct with student information. id, name and address
typedef struct
{
    char id[19];
    char code[20];
    char address[50];
} Student;


int main()
{
    Student newStudent;

    FILE *fp;
    fp=fopen("ID500.txt","r");

    char id[20],code[20],address[50];

    while(!feof(fp))
    {
        fscanf(fp,"%s %s %s",id,code,address);
        printf("%s %s %s\n",id,code,address);
    }
    return 0;
}
Jonathan Leffler
Dedicated User

2 Answers

It is better to supply the file name as command-line argument to your program, because it makes it easier to test and use.

In the file, each line seems to be a separate record. So, it would be better to read each line, then parse the fields from the line.

Consider the following:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

#define  MAX_LINE_LEN  500

int main(int argc, char *argv[])
{
    char  line[MAX_LINE_LEN + 1]; /* +1 for the end-of-string '\0' */
    FILE *in;

    if (argc != 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "Cannot open %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }

    while (fgets(line, sizeof line, in) != NULL) {
        char  id[20], code[20], address[50], dummy;

        if (sscanf(line, " %19s %19s %49s %c", id, code, address, &dummy) == 3) {
            /* The line did consist of three fields, and they are
               now correctly parsed to 'id', 'code', and 'address'. */

            printf("id = '%s'\ncode = '%s'\naddress = '%s'\n\n",
                   id, code, address);

        } else {

            /* We do have a line, but it does not consist of
               exactly three fields. */

            /* Remove the newline character(s) at the end of line. */
            line[strcspn(line, "\r\n")] = '\0';

            fprintf(stderr, "Cannot parse line '%s'.\n", line);

        }
    }

    if (ferror(in)) {
        fprintf(stderr, "Error reading %s.\n", argv[1]);
        return EXIT_FAILURE;
    } else
    if (fclose(in)) {
        fprintf(stderr, "Error closing %s.\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Above, argc contains the number of command-line arguments, with the program name used as the first (zeroth, argv[0]) argument. We require two: the program name and the name of the file to be read. Otherwise, we print out a usage message.

We try to open the file for reading. If fopen() fails, it returns NULL, with the error stored in errno. strerror(errno) yields the human-readable error message.

fgets(array, sizeof array, stream) reads a line from stream into array. A line too long to fit is read only up to the space available, and the rest is left for the next call. If it succeeds, it returns a pointer to the first element in array. If it fails (for example, because there is nothing more to read), it returns NULL.

Remember that feof(stream) does not check if stream has more data to read. It only reports whether the end of stream has already been encountered. So, instead of reading until feof() returns true, you should simply read data until reading fails, then check why the reading failed. This is what the above example program does.
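As a minimal sketch of "read until reading fails, then check why", consider the following. The helper name count_words and the 63-character word limit are assumptions made for illustration only; they are not part of the program above.

```c
#include <stdio.h>

/* Hypothetical helper: counts the whitespace-separated words in 'in'.
   Returns the count, or -1 if reading stopped because of a read error.
   Words longer than 63 characters are counted in pieces. */
long count_words(FILE *in)
{
    char word[64];
    long count = 0;

    /* Loop on the result of the read itself, not on feof(). */
    while (fscanf(in, "%63s", word) == 1)
        count++;

    /* Reading has failed; only now is it meaningful to ask why. */
    if (ferror(in))
        return -1;   /* a real read error */

    return count;    /* otherwise we stopped at end of input */
}
```

The loop condition is the read itself, so the body never runs with stale data, which is exactly what goes wrong with a while (!feof(fp)) loop.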

We want to treat each line as a separate record. Because fscanf() does not distinguish '\n' from spaces (neither in the conversion specification nor when implicitly skipping whitespace), using fscanf(in, " %19s %19s %49s", ...) does not restrict the parsing to a single line: the three fields may be on the same line, or on different lines, or even have empty lines in between. To restrict our parsing to a single line, we first read each line with fgets(), then try and parse that line, and that line only, using sscanf(). (sscanf() works just like fscanf(), but takes its input from a string rather than a stream.)

To avoid buffer overflow, we must tell sscanf() how long our buffers can be, remembering to reserve one char for the end-of-string mark (NUL, '\0'). Because id is 20 chars long, we can use up to 19 for the ID string, and therefore we need to use %19s to do the conversion correctly.

The return value from sscanf() is the number of successful conversions. By adding a dummy character (%c) conversion at the end that we expect to fail in normal circumstances, we can detect if the line contained more than we expected. This is why the sscanf() pattern has four conversions, but we require exactly the first three of them to succeed, and the fourth, dummy one, to fail, if the input line has the format we expected.
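The width limits and the dummy conversion can be packaged into one small function. This is only a sketch: the name parse_record is hypothetical, and the caller is assumed to pass buffers of at least 20, 20, and 50 chars, matching the widths in the format.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if 'line' holds exactly three
   whitespace-separated fields, 0 otherwise.
   'id' and 'code' must have room for 20 chars, 'address' for 50. */
int parse_record(const char *line, char *id, char *code, char *address)
{
    char dummy;

    /* Three conversions must succeed and the fourth must fail:
       a result of 4 means there was extra text on the line. */
    return sscanf(line, " %19s %19s %49s %c",
                  id, code, address, &dummy) == 3;
}
```

A line with only two fields makes sscanf() return 2, and a line with four or more makes it return 4, so both are rejected by the == 3 test.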

Note that we could try several different sscanf() expressions, if we accept the input in different formats. I like to call this speculative parsing. You simply need to order them so that you try the most complex ones first, and accept the first one that yields the expected number of successful conversions. For a practical example of that, check out the example C code I used in another answer to allow the user to specify simulation details using name=value pairs on the command line.
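Here is a rough sketch of speculative parsing for a name=value argument, where the value may be an integer or a plain word. Everything in it (the parse_pair name, the enum, the buffer sizes) is made up for illustration and is not taken from the other answer mentioned above.

```c
#include <stdio.h>

enum value_kind { VALUE_NONE = 0, VALUE_NUMBER, VALUE_TEXT };

/* Hypothetical helper: parses "name=value" from 'arg'.
   'name' must have room for 32 chars and 'text' for 64. */
enum value_kind parse_pair(const char *arg, char *name,
                           long *number, char *text)
{
    char dummy;

    /* Most specific pattern first: the value is a whole integer. */
    if (sscanf(arg, " %31[^=]=%ld %c", name, number, &dummy) == 2)
        return VALUE_NUMBER;

    /* Fallback: the value is any single word. This pattern would
       also match integers, which is why it is tried second. */
    if (sscanf(arg, " %31[^=]=%63s %c", name, text, &dummy) == 2)
        return VALUE_TEXT;

    return VALUE_NONE;
}
```

With this ordering, "steps=100" is reported as a number and "mode=fast" as text, while something like "steps=100x" falls through to the text pattern because the trailing x makes the first sscanf() return 3 instead of 2.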

The line[strcspn(line, "\r\n")] = '\0'; expression is a trick, really. strcspn() is a standard C <string.h> function, which returns the number of characters in the first string parameter, until end of string or any of the characters in the second string are encountered, whichever happens first. Thus, strcspn(line, "\r\n") yields the number of characters in line until end of string, '\r', or '\n' is encountered, whichever happens first. We trim off the rest of the string by using that as the index to the line buffer, and making the string end there. (Remember, NUL or '\0' always ends the string in C.)
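If you use the trick in several places, it can be wrapped in a tiny function. The name trim_newline is hypothetical; the body is just the expression described above.

```c
#include <string.h>

/* Hypothetical helper: cuts 's' at the first '\r' or '\n', if any,
   and returns the length of the trimmed string. */
size_t trim_newline(char *s)
{
    size_t len = strcspn(s, "\r\n");  /* chars before the first CR or LF */

    s[len] = '\0';  /* harmless if there was no newline: s[len] is already '\0' */
    return len;
}
```

Because strcspn() stops at the end of the string when neither character is present, the assignment is safe even for a last line that has no trailing newline, and for an empty string.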

After the while loop, we check why the fgets() returned NULL. If ferror() returns true, then there was a real read error. These are very, very rare nowadays, but not checking them is just like walking around with a weapon without the safety engaged: it is an unnecessary risk with zero reward.

In most operating systems, fclose() cannot even fail if you opened the file read-only, but there are some particular cases on some where it might. (Also, it can fail when you write to streams, because the C library may cache data -- keep it in an internal buffer, rather than write it immediately, for efficiency's sake -- and write it out only when you close the stream. Like any write, that can fail due to a real write error; say, if the storage media is already full.)

Yet, it only costs a couple of lines of C code to check both ferror() and fclose(), and let the user know. I personally hate, with a deep-burning passion, programs that do not do that, because they really risk losing user data silently, without warning. The users may think everything is okay, but the next time they try to access their files, some of it is missing... and they usually end up blaming the operating system, not the actual culprits, the bad, evil programs that failed to warn the user about an error they could have detected.

(It is best to learn to do that as early as possible. Like security, error checking is not something you can really bolt on later: you either design it in, or it won't be reliable.)
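For the writing case, the same checks might look like the sketch below. The function name save_text and its return convention (0 for success, -1 for any failure) are assumptions chosen for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes 'text' to the file at 'path'.
   Returns 0 on success, -1 if opening, writing, or closing fails. */
int save_text(const char *path, const char *text)
{
    FILE *out = fopen(path, "w");

    if (!out)
        return -1;

    if (fputs(text, out) == EOF) {
        fclose(out);   /* already failing; just release the stream */
        return -1;
    }

    /* Buffered data may only reach the file here, so a write
       error can show up as a failed fclose(). */
    if (fclose(out) != 0)
        return -1;

    return 0;
}
```

A caller would typically print a message using strerror(errno) when save_text() returns -1, the same way the example program reports a failed fopen().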

Also note that the Linux man pages project contains a very well maintained list of C library functions (along with POSIX.1, GNU, and Linux-specific functions). Do not be fooled by its name. Each of the pages contains a Conforming to section, which tells you which standards the function or functions described on that page conforms to. If it is C89, then it works in just about all operating systems you can imagine. If it is C99 or any POSIX.1 version, it may not work in Windows or DOS (or using the ancient Borland C compiler), but it will work in most other operating systems.


Because the OP is obviously reading non-ASCII files, I would recommend trying out a localized version of the program that uses wide characters and wide strings:

#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <stdio.h>
#include <errno.h>

#define  MAX_WLINE_LEN  500

int main(int argc, char *argv[])
{
    wchar_t  line[MAX_WLINE_LEN + 1]; /* +1 for the end-of-string L'\0' */
    FILE *in;

    if (argc != 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "Warning: Your C library does not support your currently set locale.\n");

    if (fwide(stdout, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");

    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "Cannot open %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }
    if (fwide(in, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide input from %s.\n", argv[1]);

    while (fgetws(line, sizeof line / sizeof line[0], in) != NULL) {
        wchar_t  id[20], code[20], address[50], dummy;

        if (swscanf(line, L" %19ls %19ls %49ls %lc", id, code, address, &dummy) == 3) {
            /* The line did consist of three fields, and they are
               now correctly parsed to 'id', 'code', and 'address'. */

            wprintf(L"id = '%ls', code = '%ls', address = '%ls'\n",
                   id, code, address);

        } else {

            /* We do have a line, but it does not consist of
               exactly three fields. */

            /* Remove the newline character(s) at the end of line. */
            line[wcscspn(line, L"\r\n")] = L'\0';

            fprintf(stderr, "Cannot parse line '%ls'.\n", line);

        }
    }

    if (ferror(in)) {
        fprintf(stderr, "Error reading %s.\n", argv[1]);
        return EXIT_FAILURE;
    } else
    if (fclose(in)) {
        fprintf(stderr, "Error closing %s.\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

The above code is pure C99 code, and should work on all OSes that have a standard C library conforming to C99 or later. (Unfortunately, Microsoft is not willing to implement some C99 features, even though it "contributed" to C11, which means the above code may need to have additional Windows-specific code to work on Windows. It does work fine in Linux, BSDs, and Macs, however.)

Nominal Animal
  • Nice and thorough answer. Aside from "Because fscanf() treats newlines as spaces". It is not that `fscanf()` treats newlines as white-space; it is that many formats, not all, do so. – chux - Reinstate Monica Jun 28 '18 at 15:11
  • @chux: Even `" "` treats spaces and newlines equally, as does fixed patterns that ignore spaces like `"abc"` (which matches `"a\nb c"`, for example). How to word that better? Any suggestions? :) – Nominal Animal Jun 28 '18 at 15:25
  • Maybe "Because fscanf() treats newlines as spaces, we use sscanf() instead, to try and parse just the line we already read." --> "Because `fscanf(..., "%s %s %s", ...` does not distinguish `'\n'` from spaces, we use `fgets(), sscanf()` to read the _line_ and then parse."? – chux - Reinstate Monica Jun 28 '18 at 15:32
  • @chux: Thanks! I started putting that in, then inspiration struck, and I split the paragraph into two; and snuck in one mentioning the *"speculative parsing"* with `sscanf()` I've found quite useful in similar situations. I'm much happier with this wording, but if you (or anyone else) think of a better wording, or notice an error I've made, do let me know: it is always appreciated! – Nominal Animal Jun 28 '18 at 15:59
  • Note: the `dummy` idea works well here yet would be insufficient with a format with required trailing text like `" %19s %19s %49s the_end %c" ... == 3`. An alternative is to use `int n=0; ... sscanf(s, " %19s %19s %49s the_end %n", ... &n); Success = n && s[n]==0;`. – chux - Reinstate Monica Jun 28 '18 at 18:47
  • @chux: Very true. Showing it using a pattern that is self-contained in a single if statement, without risking confusion, is not easy, though. Something like `if ((n = 0, sscanf(s, " ... %n", ..., &n), n > 0) && s[n] == '\0')` is the best I can come up with; it also works as an example of the C comma operator (as described in e.g. C11 6.5.17). – Nominal Animal Jun 28 '18 at 19:53
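
The `%n` alternative discussed in the comments above could be written out as a separate function instead of a single if statement. This is only a sketch; the name parse_record_n is hypothetical, and the buffers are assumed to hold at least 20, 20, and 50 chars.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if 'line' holds exactly three
   whitespace-separated fields, 0 otherwise. Uses %n, not a dummy. */
int parse_record_n(const char *line, char *id, char *code, char *address)
{
    int n = 0;

    /* %n stores the number of chars consumed so far, but only if
       scanning gets that far; otherwise n keeps its initial 0.
       The space before %n skips any trailing whitespace first. */
    sscanf(line, " %19s %19s %49s %n", id, code, address, &n);

    /* Success: all three fields were read and nothing follows them. */
    return n > 0 && line[n] == '\0';
}
```

Unlike the dummy conversion, this form still works when the format ends in required literal text, because the check does not rely on the count of successful conversions.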

The mistake is in your while(!feof(fp)) loop.

Modify your while-loop as given below

 while(fscanf(fp,"%s %s %s",id,code,address)==3) // check whether 3 items are read. 
    {
        printf("%s %s %s\n",id,code,address);
    }

This will work. Alternatively, you can compare the return value against EOF:

while(fscanf(fp,"%s %s %s",id,code,address)!=EOF)
    {
        printf("%s %s %s\n",id,code,address);
    }

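The logic is that fscanf() returns the number of input items it successfully converted and assigned. If a read error occurs or the end of the file is reached before any conversion succeeds, it returns EOF (a negative value) instead. Note that the != EOF test also accepts a partial record of only one or two fields, so the == 3 test is the safer of the two.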
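
To see the different return values without needing a file, here is a small sketch that uses sscanf(), which follows the same return-value rules as fscanf(). The helper name fields_converted is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: returns what sscanf() reports when the
   three-field format is applied to 'input'. */
int fields_converted(const char *input)
{
    char id[20], code[20], address[50];

    return sscanf(input, "%19s %19s %49s", id, code, address);
}
```

For a complete record such as "a b c" the result is 3, for "a b" it is 2, and for an empty or whitespace-only string it is EOF. A loop that only tests != EOF would therefore go on to print a record whose third field was never read.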

anoopknr
  • It would be best to test `== 3` rather than `> 0`. That would ensure that three fields were read. There's also room to argue that `"%19s %19s %49s"` would be a better format for the three variables the data is read into. There's also room to argue that the code using just `fscanf()` cannot insist that the three fields are on a single line and that all the data on the line is read; the code would need to use `fgets()` (or POSIX `getline()`) and `sscanf()` to ensure that. – Jonathan Leffler Jun 28 '18 at 14:16
  • @[Jonathan Leffler](https://stackoverflow.com/users/15168/jonathan-leffler) Great suggestion. I have modified the code. – anoopknr Jun 28 '18 at 14:20
  • `!= EOF` is definitely worse than `> 0` (I meant `== 3`). Especially with numeric formats, you can end up with 0, 1 or 2 valid values, which still means there is a problem with the input, even though EOF has not been encountered. With strings, only the last input can fail if there are only 1 or 2 words left in the input (you won't get 0). You probably shouldn't use the input if you do only get 1 or 2 words. – Jonathan Leffler Jun 28 '18 at 14:24