
I have a sample input file like this:

1344 Muhammad Ayyubi 1
1344 Muhammad Ali Ayyubi 1

The fields (the first number, the first name, the surname and the last number) are separated by tab characters. However, a person may have two first names; in that case, the two first names are separated by a space.

I am trying to read the input file and store each field in its own variable.

Here is my code, which works when a person has only one first name:

fscanf(fp, "%d\t%s\t%s\t%d", &id, firstname, surname, &roomno);

Is there any way to read an input file in which a line may contain two first names?

Thanks in advance.

Wiktor Stribiżew
blackwings15
  • Read each line with `fgets()` and apply `strtok()` to the string to extract each part. I see from the formatting that it is possible to distinguish, say, "Betty Jo Anderson" (2 + 1) from "Peter Da Silva" (1 + 2) and "Sally Ann De Vries" (2 + 2). – Weather Vane Mar 14 '22 at 13:48

3 Answers


Read the line with fgets(), which saves it as a string.

Then parse the string, saving each field into an adequately sized buffer.

Scanning with "\t" scans any amount of white space (zero or more characters, of any kind), not just one tab. Use TABFMT below to scan exactly 1 tab character.

Test the results along the way.

This code uses " %n" to verify that parsing reached that point and that nothing else remains on the line.

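#define LINE_N 100
char line[LINE_N];
int id;
char firstname[LINE_N];  // same size as line, so a name cannot overflow it
char surname[LINE_N];
int roomno;

if (fgets(line, sizeof line, fp)) {
  int n = 0;
  #define TABFMT "%*1[\t]"  // exactly 1 tab, matched but not assigned
  #define NAMEFMT "%[^\t]"  // 1 or more non-tab characters
  // %n is only reached (and n only set) if every conversion before it succeeded
  sscanf(line, "%d" TABFMT NAMEFMT TABFMT NAMEFMT TABFMT "%d %n", 
      &id, firstname, surname, &roomno, &n);
  if (n == 0 || line[n]) {  // incomplete parse, or extra text after roomno
    fprintf(stderr, "Failed to parse <%s>\n", line);
  } else {
    printf("Success: %d <%s> <%s> %d\n", id, firstname, surname, roomno);
  }
}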

If the first or last name is empty, this code treats that as an error.

An alternative approach would read the line into a string and then use strcspn(), strchr() or strtok() to find the tabs and split the line into the 4 substrings.


The larger issue missed by the OP is what to do about ill-formatted input. Error handling is often dismissed with "input will be well formed", yet in real life bad input does happen, and it is also the crack hackers look for. Defensive code takes steps to validate input. Pedantic code would not use *scanf() at all, but instead fgets(), strcspn(), strspn(), strchr(), strtol() and test, test, test. This answer is a middle-of-the-road testing effort.

chux - Reinstate Monica

You can use the %[ specifier to read a string that contains white space:

fscanf(fp, "%d\t%[^\t]\t%[^\t]\t%d", &id, firstname, surname, &roomno);

L. Scott Johnson
  • Perhaps `fscanf(fp, "%d %[^\t] %[^\t]%d", &id, firstname, surname, &roomno)` – Weather Vane Mar 14 '22 at 13:53
  • Yeah, I edited that in. I was just focusing on the "two first names" qualifier of the question with the initial answer. Thanks. – L. Scott Johnson Mar 14 '22 at 13:54
  • [^\t] didn't work. There are more than 4 variables to read; I used 4 variables just to keep it short. The real input file header is below – blackwings15 Mar 14 '22 at 13:58
  • This should work. Please ask another question, with the new intended code, that includes the full problem. – Weather Vane Mar 14 '22 at 14:01
  • @blackwings15 then adjust the pattern to match your input specs. If you need more strings, use more. If you need a different character set in the brackets, use a different character set, etc. – L. Scott Johnson Mar 14 '22 at 14:05

The answers to the question as stated are reasonable, but the question is wrong.

The end-goal here is to read human-names. Human names come in quite a variety - not always first, [middle,] last. Baking in this assumption is an error in design.

This error has been repeated many, many times. Better not to repeat it.

The simplest solution is to re-order the data fields and make no assumptions about the structure of names. The input data then becomes:

1344 1 Muhammad Ayyubi
1344 1 Muhammad Ali Ayyubi

The scanning code can then pull off the two leading numeric fields and use the remainder of the line as the name (making no assumptions about its structure).

More generally, if you do need to scan fields with embedded whitespace, remember the 32 "control" characters in the ASCII character table, of which ~24 have no assigned semantics in current use. You can use them to add structure to a text file, for example these (from man ascii):

Oct   Dec   Hex   Char
034   28    1C    FS  (file separator)
035   29    1D    GS  (group separator)
036   30    1E    RS  (record separator)
037   31    1F    US  (unit separator)

There is almost no case where text fields are allowed to contain these characters.

  • OP's format uses `'\t'` as a separator. It is reasonable to use that, as it doesn't [usually](https://xkcd.com/327/) appear in names. Re-ordering makes no difference with tab-separated data. "if you do need to scan fields with embedded whitespace" doesn't apply well, as names often have spaces in them, whether in a "first", "middle" or family name. True, name parsing is hard, as with [long names](https://en.wikipedia.org/wiki/Hubert_Blaine_Wolfeschlegelsteinhausenbergerdorff_Sr.) & [internationalization](https://stackoverflow.com/questions/421046/what-are-all-of-the-allowable-characters-for-peoples-names). – chux - Reinstate Monica Mar 15 '22 at 14:25