Data file:

Newton  30  United Kingdom  Scientist
Maxwell 25  United Kingdom  Mathematician
Edison  60  United States   Engineer

Code to read it:

#define MAX_NAME    50
#define MAX_COUNTRY 25
#define MAX_PROFILE 20
struct person
{
    char *name;
    int age;
    char *country;
    char *profile;
};

struct person *pObj = malloc(sizeof *pObj);   /* pObj must be a pointer for -> to compile */
pObj->name = malloc(MAX_NAME);                /* sizeof(MAX_NAME) is sizeof(int), not 50 */
pObj->country = malloc(MAX_COUNTRY);
pObj->profile = malloc(MAX_PROFILE);

fscanf(fPtr,"%s\t%d\t%s\t%s\n",pObj->name,&pObj->age,pObj->country,pObj->profile);

I wrote a program that reads tab-delimited records into a structure using fscanf(). I could do the same with strtok() or strsep(), but then I would be forced to use atoi() to load the age field, and I don't want to use atoi(). So I used fscanf() to read the age as an integer directly from the FILE stream. It works fine, but in some records the country field is empty, as shown below.

Newton  30  United Kingdom  Scientist
Maxwell 25      Mathematician
Edison  60  United States   Engineer

When I read the second record, fscanf() does not store an empty string in the country field; instead, country is filled with the profile data. I understand that fscanf() works that way. But is there any option to scan the country field even when it is empty in the file? Can I do this without using atoi() for the age, i.e., by reading each field as its respective type rather than reading all fields as strings?

Jonathan Leffler
Smith Dwayne
  • Don't. Use `fgets` + `strtok` or `strsep`. [Possible dupe](https://stackoverflow.com/questions/53494091/read-consecutive-tabs-as-empty-field-fscanf) – DYZ Nov 28 '18 at 05:20
  • suggest using `fgets()` to read the whole line, then using `sscanf()` to extract the fields. To determine if some field is missing, always check the returned value from `fscanf()`/`sscanf()` Note: use the returned value, not the parameter values. There are much better ways to obtain the integer value than `atoi()` for instance `strtol()` – user3629249 Nov 28 '18 at 05:34

1 Answer

Original format

The %s conversion specification skips any white space (blanks, tabs, newlines, etc) in the input, and then reads non-white-space up to the next white space character. The \t appearing in the format string causes fscanf() to skip zero or more white space characters (not just tabs).
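
A minimal sketch of that behaviour, using sscanf() on in-memory strings instead of a file. The helper name scan_name_age is made up for this illustration and does not appear in the question:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the question): scans "name <white space> age"
 * from a string and returns the number of successful conversions. */
int scan_name_age(const char *input, char name[50], int *age)
{
    assert(input != NULL && name != NULL && age != NULL);
    /* %49s skips leading white space, then reads up to 49 non-white-space
     * characters.  The \t directive matches zero or more white space
     * characters of any kind, not only tabs. */
    return sscanf(input, "%49s\t%d", name, age);
}
```

Under these rules the inputs "Newton\t30", "Newton 30" and " \n Newton\n\n30" should all yield 2 conversions with the same values, which is why this format cannot tell an empty tab-separated field from ordinary spacing.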

You have:

fscanf(fPtr,"%s\t%d\t%s\t%s\the n", pObj->name, pObj->age, pObj->country, pObj-profile);

You need to pass a pointer to the age and you need an arrow -> between pObj and profile (please post code that could compile; it doesn't inspire confidence when there are errors like this):

fscanf(fPtr,"%s\t%d\t%s\t%s\the n", pObj->name, &pObj->age, pObj->country, pObj->profile);

Given the first input line:

Newton  30  United Kingdom  Scientist

fscanf() will read Newton into pObj->name, 30 into pObj->age, United into pObj->country and Kingdom into pObj->profile. fscanf() and family are very casual about white space, in general. Most conversions skip leading white space.

After the 4 values are assigned, you have \the n" at the end of the format. The tab skips the white space between Kingdom and Scientist, but the data doesn't match he n, so the scanning stops — not that you're any the wiser for that.

The next operation will pick up where this one stopped, so the next pObj->name will be assigned Scientist and then the pObj->age conversion will fail because Maxwell doesn't represent an integer. The conversions stop there on that fscanf().
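
The loss of synchronisation can be reproduced on an in-memory buffer. The sketch below is only a stand-in for the file: it assumes sscanf() plus %n is an acceptable way to mimic how a stream position carries over between calls, and it leaves off the literal text at the end of the format, which does not change the conversions. The struct and function names are invented for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result type for this illustration only. */
struct scan_result {
    int first;             /* conversions made by the first call  */
    int second;            /* conversions made by the second call */
    char second_name[50];  /* what the second call stored as name */
};

/* Scans `data` twice with the question's conversions, the second call
 * resuming where the first one stopped (as a FILE stream would). */
struct scan_result scan_two_records(const char *data)
{
    struct scan_result r = { 0, 0, "" };
    char name[50], country[25], profile[20];
    int age = 0, used = 0;

    assert(data != NULL);
    /* %n stores the number of characters consumed so far; it is not
     * counted in the return value. */
    r.first = sscanf(data, "%49s\t%d\t%24s\t%19s%n",
                     name, &age, country, profile, &used);
    if (r.first == 4)
        r.second = sscanf(data + used, "%49s\t%d\t%24s\t%19s",
                          r.second_name, &age, country, profile);
    return r;
}
```

Fed the first two records of the data file, the first call should report 4 conversions (with United and Kingdom in the last two fields) and the second call 1, having stored Scientist as the name before failing on Maxwell.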

And so the problems continue. Your claimed output can't be attained with the code you show in the question.

If you're adamant that you must use fscanf(), you'll need to use scan sets such as %24[^\t] to read the country. But you'd do better using fgets() or POSIX function getline() to read whole lines of input, and then perhaps use sscanf() but more likely use strcspn() or strpbrk() from Standard C (or perhaps strtok() or — far better — POSIX strtok_r() or Windows strtok_s(), or non-standard strsep()) to split the line into fields at tabs. Note that strtok_r() et al don't care how many repeats there are of the delimiter (tabs in your case) between the fields; you can't have empty fields with them. You can identify empty fields with strcspn(), strpbrk() and strsep().
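
One way the line-based approach could look is sketched below, assuming each record has already been read into a buffer with fgets(). It splits on tabs with strcspn(), so an empty field shows up as a zero-length string. The names person_rec, next_field and parse_person are hypothetical, and the struct uses fixed-size arrays instead of the question's malloc'd pointers to keep the example short. The age still has to be converted from text; strtol() is used here because, unlike atoi(), it reports errors.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME    50
#define MAX_COUNTRY 25
#define MAX_PROFILE 20

struct person_rec {              /* hypothetical fixed-size variant */
    char name[MAX_NAME];
    int  age;
    char country[MAX_COUNTRY];
    char profile[MAX_PROFILE];
};

/* Copies the next tab-delimited field from *cursor into dst, truncating
 * to size - 1 characters, and advances *cursor past the delimiter.
 * Returns 0 when no field remains, 1 otherwise. */
static int next_field(const char **cursor, char *dst, size_t size)
{
    const char *s = *cursor;
    size_t len, n;

    if (s == NULL)
        return 0;
    len = strcspn(s, "\t\r\n");          /* an empty field gives len == 0 */
    n = len < size - 1 ? len : size - 1;
    memcpy(dst, s, n);
    dst[n] = '\0';
    *cursor = (s[len] == '\t') ? s + len + 1 : NULL;
    return 1;
}

/* Parses one line of exactly four fields.  Returns 0 on success and -1
 * if the field count is wrong or the age is not a valid integer. */
int parse_person(const char *line, struct person_rec *out)
{
    char agebuf[16], *end;
    const char *cur = line;
    long age;

    assert(line != NULL && out != NULL);
    if (!next_field(&cur, out->name, sizeof out->name) ||
        !next_field(&cur, agebuf, sizeof agebuf) ||
        !next_field(&cur, out->country, sizeof out->country) ||
        !next_field(&cur, out->profile, sizeof out->profile))
        return -1;                       /* fewer than four fields */
    if (cur != NULL)
        return -1;                       /* more than four fields */

    errno = 0;
    age = strtol(agebuf, &end, 10);
    if (end == agebuf || *end != '\0' || errno == ERANGE ||
        age < 0 || age > INT_MAX)
        return -1;
    out->age = (int)age;
    return 0;
}
```

A caller would loop on fgets(buf, sizeof buf, fPtr) and pass buf to parse_person(); for the Maxwell record with the empty country, out->country should end up as an empty string and out->profile as Mathematician.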


Cleaned up format

The format string has been revised to:

fscanf(fPtr,"%s\t%d\t%s\t%s\n", pObj->name, &pObj->age, pObj->country, pObj->profile);

This won't work, but can now be adapted so it will work.

if (fscanf(fPtr," %49[^\t]\t%d\t%24[^\t]\t%19[^\n]", pObj->name, &pObj->age, pObj->country, pObj->profile) != 4)
    …handle a format error…

Beware trailing white space in scanf() format strings. The leading blank skips any newline left over from previous lines, and skips any leading white space on a line. The %49[^\t] looks for up to 49 non-tabs; the tab is optional and matches any sequence of white space, but the first character will be a tab unless the name was too long. Then it reads a number, more optional white space (it doesn't have to be a tab, but it will be unless the data is malformatted), then up to 24 non-tabs, white space again (of which the first character will be a tab unless there's a formatting problem), and up to 19 non-tabs. The next character should be a newline, unless there's a formatting problem.
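
As a rough check of how this format behaves, the sketch below applies it with sscanf() to a single line that has already been read, for example with fgets(). The wrapper name scan_record is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NAME    50
#define MAX_COUNTRY 25
#define MAX_PROFILE 20

/* Hypothetical helper: applies the scan-set format to one line and
 * returns the number of successful conversions. */
int scan_record(const char *line, char name[MAX_NAME], int *age,
                char country[MAX_COUNTRY], char profile[MAX_PROFILE])
{
    assert(line != NULL);
    /* Widths are one less than the buffer sizes to leave room for '\0'. */
    return sscanf(line, " %49[^\t]\t%d\t%24[^\t]\t%19[^\n]",
                  name, age, country, profile);
}
```

A complete record such as Newton's should return 4 with United Kingdom kept together in country. A scan set has to match at least one character, so a line with an empty country, such as Maxwell's, should come back with fewer than 4 conversions rather than an empty string; the caller then has to handle that record separately or split the line by hand.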

Jonathan Leffler
  • `\the n` is a typing error; I have now edited that statement. My question is about reading the fields and loading them into the structure object according to their respective types. Each record of the file will have only 4 fields, i.e., three `\t` characters. If all fields are filled, `fscanf` itself fills the structure fields according to the conversion specifiers. If I use `strtok` or some other string-splitting function, I will definitely need functions like `atoi()` to load the `age` of the person. I don't want to use those type-conversion functions. – Smith Dwayne Nov 29 '18 at 01:24
  • A more precise format string, insisting on tab delimiters, would be: `" %49[^\t]%*[\t]%d%*[\t]%24[^\t]%*[\t]%19[^\n]"`; this replaces the free occurrences of `\t` with `%*[\t]`, which requires one or more tabs (and nothing else), while the `*` prevents it being assigned to anything. It would be feasible to add `%*[\n]` at the end of the string to read the newline, but the blank at the start deals with leftover newlines. – Jonathan Leffler Nov 10 '22 at 16:13