It is better to supply the file name as a command-line argument to your program, because that makes the program easier to test and use.
In the file, each line seems to be a separate record, so it is better to read each line first, then parse the fields from that line.
Consider the following:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

#define MAX_LINE_LEN 500

int main(int argc, char *argv[])
{
    char  line[MAX_LINE_LEN + 1]; /* +1 for the end-of-string '\0' */
    FILE *in;

    if (argc != 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "Cannot open %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }

    while (fgets(line, sizeof line, in) != NULL) {
        char id[20], code[20], address[50], dummy;

        if (sscanf(line, " %19s %19s %49s %c", id, code, address, &dummy) == 3) {
            /* The line did consist of three fields, and they are
               now correctly parsed to 'id', 'code', and 'address'. */
            printf("id = '%s'\ncode = '%s'\naddress = '%s'\n\n",
                   id, code, address);
        } else {
            /* We do have a line, but it does not consist of
               exactly three fields. */

            /* Remove the newline character(s) at the end of line. */
            line[strcspn(line, "\r\n")] = '\0';

            fprintf(stderr, "Cannot parse line '%s'.\n", line);
        }
    }

    if (ferror(in)) {
        fprintf(stderr, "Error reading %s.\n", argv[1]);
        return EXIT_FAILURE;
    } else
    if (fclose(in)) {
        fprintf(stderr, "Error closing %s.\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
Above, argc contains the number of command-line arguments, with the program name counted as the first (zeroth, argv[0]) argument. We require two: the program name and the name of the file to be read. Otherwise, we print a usage message.
We try to open the file for reading. If fopen() fails, it returns NULL, with the error stored in errno. strerror(errno) yields the human-readable error message.
fgets(array, sizeof array, stream) reads a line from stream (or, if the line is too long to fit in array, only as much of it as fits). If it succeeds, it returns a pointer to the first element in array. If it fails (because there is no more to read, for example), it returns NULL.
Remember that feof(stream) does not check if stream has more data to read. It only reports whether the end of stream has already been encountered. So, instead of reading until feof() returns true, you should simply read data until reading fails, then check why the reading failed. This is what the above example program does.
We want to treat each line as a separate record. Because fscanf() does not distinguish '\n' from spaces (neither in the conversion specification, nor when implicitly skipping whitespace), using fscanf(in, " %19s %19s %49s", ...) does not restrict the parsing to a single line: the three fields may be on the same line, or on different lines, or even have empty lines in between. To restrict our parsing to a single line, we first read each line with fgets(), then try to parse that line, and that line only, using sscanf(). (sscanf() works just like fscanf(), but takes its input from a string rather than a stream.)
To avoid buffer overflow, we must tell sscanf() how long our buffers can be, remembering to reserve one char for the end-of-string mark (NUL, '\0'). Because id is 20 chars long, we can use up to 19 for the ID string, and therefore we need to use %19s to do the conversion correctly.
The return value from sscanf() is the number of successful conversions. By adding a dummy character (%c) conversion at the end that we expect to fail in normal circumstances, we can detect if the line contained more than we expected. This is why the sscanf() pattern has four conversions, but we require exactly the first three of them to succeed, and the fourth, dummy one, to fail, if the input line has the format we expected.
Note that we could try several different sscanf() expressions, if we accept the input in different formats. I like to call this speculative parsing. You simply need to order them so that you try the most complex ones first, and accept the first one that yields the expected number of successful conversions. For a practical example of that, check out the example C code I used in another answer to allow the user to specify simulation details using name=value pairs on the command line.
The line[strcspn(line, "\r\n")] = '\0'; expression is a trick, really. strcspn() is a standard C <string.h> function, which returns the number of characters in the first string parameter, until end of string or any of the characters in the second string are encountered, whichever happens first. Thus, strcspn(line, "\r\n") yields the number of characters in line until end of string, '\r', or '\n' is encountered, whichever happens first. We trim off the rest of the string by using that as the index to the line buffer, and making the string end there. (Remember, NUL or '\0' always ends the string in C.)
After the while loop, we check why fgets() returned NULL. If ferror() returns true, then there was a real read error. These are very, very rare nowadays, but not checking them is just like walking around with a weapon without the safety engaged: it is an unnecessary risk with zero reward.
In most operating systems, fclose() cannot even fail if you opened the file read-only, but there are some particular cases on some where it might. (Also, it can fail when you write to streams, because the C library may cache data, keeping it in an internal buffer rather than writing it immediately for efficiency's sake, and write it out only when you close the stream. Like any write, that can fail due to a real write error; say, if the storage media is already full.)
Yet, it only costs a couple of lines of C code to check both ferror() and fclose(), and let the user know. I personally hate, with a deep-burning passion, programs that do not do that, because they really risk losing user data silently, without warning. The users may think everything is okay, but the next time they try to access their files, some of it is missing... and they usually end up blaming the operating system, not the actual culprits, the bad, evil programs that failed to warn the user about an error they could have detected.
(It is best to learn to do that as early as possible. Like security, error checking is not something you can really bolt on later: you either design it in, or it won't be reliable.)
Also note that the Linux man pages project contains a very well maintained list of C library functions (along with POSIX.1, GNU, and Linux-specific functions). Do not be fooled by its name. Each of the pages contains a Conforming to section, which tells you which standards the function or functions described on that page conform to. If it is C89, then it works in just about all operating systems you can imagine. If it is C99 or any POSIX.1 version, it may not work in Windows or DOS (or with the ancient Borland C compiler), but it will work in most other operating systems.
Because the OP is obviously reading non-ASCII files, I would recommend trying out a localized version of the program that uses wide characters and wide strings:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <stdio.h>
#include <errno.h>

#define MAX_WLINE_LEN 500

int main(int argc, char *argv[])
{
    wchar_t line[MAX_WLINE_LEN + 1]; /* +1 for the end-of-string L'\0' */
    FILE   *in;

    if (argc != 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "Warning: Your C library does not support your currently set locale.\n");

    if (fwide(stdout, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");

    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "Cannot open %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }

    if (fwide(in, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide input from %s.\n", argv[1]);

    while (fgetws(line, sizeof line / sizeof line[0], in) != NULL) {
        wchar_t id[20], code[20], address[50], dummy;

        if (swscanf(line, L" %19ls %19ls %49ls %lc", id, code, address, &dummy) == 3) {
            /* The line did consist of three fields, and they are
               now correctly parsed to 'id', 'code', and 'address'. */
            wprintf(L"id = '%ls', code = '%ls', address = '%ls'\n",
                    id, code, address);
        } else {
            /* We do have a line, but it does not consist of
               exactly three fields. */

            /* Remove the newline character(s) at the end of line. */
            line[wcscspn(line, L"\r\n")] = L'\0';

            fprintf(stderr, "Cannot parse line '%ls'.\n", line);
        }
    }

    if (ferror(in)) {
        fprintf(stderr, "Error reading %s.\n", argv[1]);
        return EXIT_FAILURE;
    } else
    if (fclose(in)) {
        fprintf(stderr, "Error closing %s.\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
The above code is pure C99 code, and should work on all OSes that have a standard C library conforming to C99 or later. (Unfortunately, Microsoft is not willing to implement some C99 features, even though it "contributed" to C11, which means the above code may need additional Windows-specific code to work on Windows. It does work fine in Linux, BSDs, and Macs, however.)