When the fscanf
format string
"%d|%s %31[^|]|%5[^|]|%s"
is applied to the line
1048|Banjo Codi|TBD|
when the last %s
specifier in the format string is reached, the entire line except for the newline character will have been matched. The %s
specifier will first read and discard all leading whitespace characters. This means that the newline character at the end of the line will be discarded. It will then match 1056|Lettuce
on the next line. This is not what you want.
One way to solve the problem would be to use the format string
"%d|%30s %30[^|]|%4[^|]|%10[^\n]"
instead. This will ensure that the newline character will not be discarded.
However, one problem with using the []
specifier is that it will not match an empty string. It will only match at least one character. This means that the entire conversion will fail at that point, if it cannot match at least one character.
Therefore, if you want to solve the problem with fscanf
, you could determine whether matching the last specifier failed and, if that is the case, set that string to an empty string yourself (because scanf
won't do that on a matching failure).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void importPatients( void );
int main( void )
{
importPatients();
return 0;
}
void importPatients( void )
{
FILE *fp = NULL;
int ret;
int patientNum;
char firstName[31], lastName[31], phoneType[5], phone[11];
fp = fopen( "patientData.txt", "r+" );
if ( fp == NULL )
{
fprintf( stderr, "Error opening file!\n" );
exit( EXIT_FAILURE );
}
while (
( ret = fscanf(
fp,
"%d|%30s %30[^|]|%4[^|]|%10[^\n]",
&patientNum, firstName, lastName, phoneType, phone
) ) >= 4
)
{
//if the last conversion failed, set the last
//argument to an empty string
if ( ret == 4 )
{
phone[0] = '\0';
}
printf(
"Successfully parsed a line with the following content:\n"
"patientNum: %d\n"
"firstName: %s\n"
"lastName: %s\n"
"phoneType: %s\n"
"phoneNum: %s\n"
"\n",
patientNum, firstName, lastName, phoneType, phone
);
}
fclose( fp );
}
For the input stated in the question, this program has the following output:
Successfully parsed a line with the following content:
patientNum: 1024
firstName: Shaggy
lastName: Yanson
phoneType: CELL
phoneNum: 3048005191
Successfully parsed a line with the following content:
patientNum: 1032
firstName: Pugsley
lastName: Yanson
phoneType: CELL
phoneNum: 3048005191
Successfully parsed a line with the following content:
patientNum: 1040
firstName: Beans
lastName: Maulin
phoneType: HOME
phoneNum: 3649155831
Successfully parsed a line with the following content:
patientNum: 1048
firstName: Banjo
lastName: Codi
phoneType: TBD
phoneNum:
Successfully parsed a line with the following content:
patientNum: 1056
firstName: Lettuce
lastName: Peas
phoneType: WORK
phoneNum: 7934346809
Successfully parsed a line with the following content:
patientNum: 1064
firstName: Bullet
lastName: Lemme
phoneType: TBD
phoneNum:
Successfully parsed a line with the following content:
patientNum: 1072
firstName: Nugget
lastName: Smee
phoneType: WORK
phoneNum: 3886893085
Successfully parsed a line with the following content:
patientNum: 1080
firstName: Bessie
lastName: Lidgely
phoneType: TBD
phoneNum:
Successfully parsed a line with the following content:
patientNum: 1088
firstName: Potato
lastName: Yards
phoneType: HOME
phoneNum: 5836788577
As you can see, the input data was mapped to the correct fields.
However, this is a rather messy solution. Also, this will only solve the problem if the %[]
specifier at the end fails due to not matching at least one character, but it will not solve the problem of one of the previous %[]
specifiers failing for that reason. For example, it will fail with the following line in the input:
1056|Lettuce Peas||7934346809
Generally, I do not recommend to solve such problems using fscanf
. It is usually better to read one line at a time as a string using the function fgets
, and then to treat each line individually. This would solve the problem of fscanf
matching part of the next line.
The string containing the line input could then be parsed using the functions sscanf
or strtok
, but these functions have disadvantages, too:
Like fscanf
, the function sscanf
also has the problem of the %[]
specifier only matching a single character (i.e. not an empty field), so that it will also fail to handle the following line of input:
1056|Lettuce Peas||7934346809
Splitting the string into individual fields using strtok
has the disadvantage that it will jump over empty fields.
For this reason, I would use the function fgets
, but I wouldn't use sscanf
or strtok
. Instead, I would do the parsing with my own custom code, as this gives me full flexibility:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <errno.h>
#include <limits.h>
//forward declarations
void importPatients( void );
bool read_field_as_string( const char **p_str, char delim, char *destination, int destination_size );
bool convert_patientNum_to_int( const char *str, int *result );
bool get_line_from_stream( char buffer[], int buffer_size, FILE *fp );
int main( void )
{
importPatients();
return 0;
}
void importPatients( void )
{
FILE *fp = NULL;
char line[1024];
char patientNum[5], firstName[31], lastName[31], phoneType[5], phone[11];
//associate variables and delimiters with the field numbers
struct
{
char *str;
int capacity;
char delim;
} fields[] = {
{ patientNum, sizeof patientNum, '|' },
{ firstName, sizeof firstName, ' ' },
{ lastName, sizeof lastName, '|' },
{ phoneType, sizeof phoneType, '|' },
{ phone, sizeof phone, '\0' }
};
//open the file
fp = fopen( "patientData.txt", "r" );
if ( fp == NULL )
{
fprintf( stderr, "Error opening file!\n" );
exit( EXIT_FAILURE );
}
//process one line per loop iteration
while ( get_line_from_stream( line, sizeof line, fp ) )
{
const char *p = line;
int patientNum_as_int;
for ( int i = 0; i < (int)(sizeof fields/sizeof *fields); i++ )
{
if (
!read_field_as_string(
&p,
fields[i].delim,
fields[i].str,
fields[i].capacity
)
)
{
//we cannot use "continue" here, because that would
//continue the innermost loop, but we want to continue
//the outer loop, so we must use goto
goto invalid_input;
}
}
//convert patientNum to integer
if ( !convert_patientNum_to_int( patientNum, &patientNum_as_int ) )
goto invalid_input;
//print the result of the line
printf(
"Successfully parsed a line with the following content:\n"
"patientNum: %d\n"
"firstName: %s\n"
"lastName: %s\n"
"phoneType: %s\n"
"phoneNum: %s\n"
"\n",
patientNum_as_int, firstName, lastName, phoneType, phone
);
//go to next loop iteration, do not execute the code behind
//the goto label
continue;
invalid_input:
printf( "Skipping line due to invalid input!\n\n" );
}
fclose( fp );
}
bool read_field_as_string( const char **p_str, char delim, char *destination, int destination_capacity )
{
const char *p = *p_str, *q;
int len;
//find delimiter
q = strchr( p, delim );
if ( q == NULL )
return false;
//calculate length of field
len = q - p;
//make sure that destination is large enough
if ( len >= destination_capacity )
return false;
//copy field to destination
memcpy( destination, p, len );
destination[len] = '\0';
//update pointer
*p_str = q + 1;
//field was successfully extracted
return true;
}
bool convert_patientNum_to_int( const char *str, int *result )
{
char *p;
unsigned long num;
//verify that number consists entirely of digits
for ( const char *q = str; *q != '\0'; q++ )
{
if ( !isdigit( (unsigned char)*q ) )
return false;
}
//attempt to convert string to integer
errno = 0;
num = strtoul( str, &p, 10 );
//verify that conversion was successful
if ( p == str )
return false;
//verify that no range error occurred
if ( errno == ERANGE )
return false;
//verify that result is representable as an "int"
if ( num > INT_MAX )
return false;
//everything is ok, so write the result
*result = num;
return true;
}
//This function will read exactly one line of input and remove the
//newline character, if it exists. On success, it will return true.
//If this function is unable to read any further lines due to
//end-of-file, it will return false. If it fails for any other reason,
//it will not return, but will print an error message and call "exit"
//instead.
bool get_line_from_stream( char buffer[], int buffer_size, FILE *fp )
{
char *p;
//attempt to read one line from the stream
if ( fgets( buffer, buffer_size, fp ) == NULL )
{
if ( !feof(fp) )
{
fprintf( stderr, "Input error!\n" );
exit( EXIT_FAILURE );
}
return false;
}
//make sure that line was not too long for input buffer
p = strchr( buffer, '\n' );
if ( p == NULL )
{
//a missing newline character is ok if the next
//character is a newline character or if we have
//reached end-of-file
if ( getc(fp) != '\n' && !feof(fp) )
{
fprintf( stderr, "Line is too long for memory buffer!\n" );
exit( EXIT_FAILURE );
}
}
else
{
//remove newline character by overwriting it with a null
//character
*p = '\0';
}
return true;
}
With the input in the question, this program has the same output as the other program. However, in contrast to the other program, it is also able to handle the input line
1056|Lettuce Peas||7934346809
properly. It has the following output for that line:
Successfully parsed a line with the following content:
patientNum: 1056
firstName: Lettuce
lastName: Peas
phoneType:
phoneNum: 7934346809