1

I'm trying to read this data from a txt file and print it to the terminal:

1024|Shaggy Yanson|CELL|3048005191
1032|Pugsley Yanson|CELL|3048005191
1040|Beans Maulin|HOME|3649155831
1048|Banjo Codi|TBD|
1056|Lettuce Peas|WORK|7934346809
1064|Bullet Lemme|TBD|
1072|Nugget Smee|WORK|3886893085
1080|Bessie Lidgely|TBD|
1088|Potato Yards|HOME|5836788577

Instead of printing all of the rows, it stops at:

1048|Banjo Codi|TBD|

(the first row with missing data). I removed the rows with missing data and the else/if statement from my code and it worked fine...so I suspect it's the missing rows that's causing the issue.

This is my code:

#include <stdio.h>
#include <string.h>
int importPatients();

int main()
{
  importPatients();
  return 0;
}

int importPatients()
{
  FILE *fp = NULL;

  int patientNum;
  char firstName[31], lastName[31], phoneType[5], phone[11];

  fp = fopen("patientData.txt", "r+");

  if (fp != NULL)
  {
    while (!feof(fp))

    {
      if (fscanf(fp, "%d|%s %31[^|]|%5[^|]|%s", &patientNum, firstName, lastName, phoneType, phone) == 5)
      {
        printf("%d | %s %s | %s | %s\n", patientNum, firstName, lastName, phoneType, phone);
      }

      else if (fscanf(fp, "%d|%s %31[^|]|%4[^|]|", &patientNum, firstName, lastName, phoneType))
      {
        printf("%d | %s %s | %s |\n", patientNum, firstName, lastName, phoneType);
      }
    }
    fclose(fp);
  }
  return 0;
}

BTW, this is for an Intro to C class, so I can't use any special libraries besides string.h.

UPDATE: I switched to using fgets and sscanf. It works great!

chickpea
  • 79
  • 8
  • 3
    Couple things: (1) the *scanf family is notoriously poor at error handling, so dealing with missing data like this is inherently going to be difficult; (2) `while (!feof(fp))` [is wrong](https://stackoverflow.com/questions/5431941); (3) your first `fscanf` call correctly tests to make sure it received 5 values — why doesn't your second call do the same thing? – Steve Summit Apr 09 '23 at 03:29
  • 2
    `%31` should be `%30`, since the field width doesn't include the null terminator. – Barmar Apr 09 '23 at 03:32
  • 2
    Instead of using `fscanf()`, read each line of the file with `fgets()`, then use `sscanf()` on the buffer. – Barmar Apr 09 '23 at 03:32
  • @SteveSummit 1) i've heard but we're taught to use scanf for now so idk what else i'd use. 2) ok, i'll fix that 3) the second call doesn't do that because i was making a condition for the rows with missing data (so 4?) but i guess my knowledge/assumptions are incorrect – chickpea Apr 09 '23 at 03:33
  • 1
    But the biggest problem is: After the first call fails, it's already consumed all of the text, so the second call has no hope of reading it. – Steve Summit Apr 09 '23 at 03:33
  • 1
    The question here really does come down to: how much do you actually want/have to accomplish here? scanf *seems* simple, which is why introductory classes always use it first. For certain problems it *is* nice and simple, but there are things it can't do at all, so there's no point trying. My opinion is that *if* scanf is useful, it's only useful on perfect input. It's not reasonably possible to deal with variably-formatted or erroneous using scanf alone. So if you want to just use scanf, then eliminating the bad/difficult lines from the input file is (IMO) totally appropriate. – Steve Summit Apr 09 '23 at 03:39
  • 1
    @SteveSummit unfortunately, i'm not allowed to change the data. but thank you for your other suggestions; i will experiment with fgets and sscanf – chickpea Apr 09 '23 at 03:43

3 Answers3

0
  • Use fget() to read a line of file input into an adequate sized buffer. Separate reading from parsing.

  • When scanning, use the correct width: 1 less than buffer size.

  • Test for parsing success properly.

  • Use scanning result to steer code.

  • Never use while (!feof(fp)). Test input function return value.

  • Add more error checking.

//while (!feof(fp))
//  if (fscanf(fp, "%d|%s %31[^|]|%5[^|]|%s", &patientNum, firstName, lastName, phoneType, phone) == 5) {
//       printf("%d | %s %s | %s | %s\n", patientNum, firstName, lastName, phoneType, phone);
//    }
//    else if (fscanf(fp, "%d|%s %31[^|]|%4[^|]|", &patientNum, firstName, lastName, phoneType)) {
//      printf("%d | %s %s | %s |\n", patientNum, firstName, lastName, phoneType);
//    }
    
char buf[100];
while (fgets(buf, sizeof buf, fp)) {
  int count = sscanf(buf, "%d |%30s %30[^|]| %4[^|]|%10s",
    &patientNum, firstName, lastName, phoneType, phone);
  if (count < 4) {
    fprintf(stderr, "Bad input <%s>\n", buf);
    break;
  }
  if (count == 4) {
    phone[0] = '\0';
  }
  ...
}

Additional TBD improvements:

  • Look for extra junk on the line.

  • Handle very long lines.

  • Improve name parsing.

  • Validate phone numbers and allow longer ones.

  • Allow longer names. Example.

  • Drive width from coding constants.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
0

By the time you have read the stream and found it to contain too few fields, it is too late to re-read it into fewer fields without rewinding the fp, but that is unnecessary.

For example, replace:

if (fscanf(fp, "%d|%s %31[^|]|%5[^|]|%s", &patientNum, firstName, lastName, phoneType, phone) == 5)
{
    printf("%d | %s %s | %s | %s\n", patientNum, firstName, lastName, phoneType, phone);
}

else if (fscanf(fp, "%d|%s %31[^|]|%4[^|]|", &patientNum, firstName, lastName, phoneType))
{
    printf("%d | %s %s | %s |\n", patientNum, firstName, lastName, phoneType);
}

with:

char record[128] = "" ;
if( fgets( record, sizeof(record), fp ) == record )
{     
    int field_count = sscanf( record, "%d|%s %31[^|]|%5[^|]|%s", &patientNum, firstName, lastName, phoneType, phone) ;
    if( field_count < 5 )
    {
        strcpy( phone, "TBD" ) ;
    }

    if( field_count >= 4 )
    {
        printf("%d | %s %s | %s | %s\n", patientNum, firstName, lastName, phoneType, phone) ;
    }
}
Clifford
  • 88,407
  • 13
  • 85
  • 165
0

When the fscanf format string

"%d|%s %31[^|]|%5[^|]|%s"

is applied to the line

1048|Banjo Codi|TBD|

when the last %s specifier in the format string is reached, the entire line except for the newline character will have been matched. The %s specifier will first read and discard all leading whitespace characters. This means that the newline character at the end of the line will be discarded. It will then match 1056|Lettuce on the next line. This is not what you want.

One way to solve the problem would be to use the format string

"%d|%30s %30[^|]|%4[^|]|%10[^\n]"

instead. This will ensure that the newline character will not be discarded.

However, one problem with using the [] specifier is that it will not match an empty string. It will only match at least one character. This means that the entire conversion will fail at that point, if it cannot match at least one character.

Therefore, if you want to solve the problem with fscanf, you could determine whether matching the last specifier failed and, if that is the case, set that string to an empty string yourself (because scanf won't do that on a matching failure).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void importPatients( void );

int main( void )
{
    importPatients();
    return 0;
}

void importPatients( void )
{
    FILE *fp = NULL;
    int ret;

    int patientNum;
    char firstName[31], lastName[31], phoneType[5], phone[11];

    fp = fopen( "patientData.txt", "r+" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    while (
        ( ret = fscanf(
            fp,
            "%d|%30s %30[^|]|%4[^|]|%10[^\n]",
            &patientNum, firstName, lastName, phoneType, phone
        ) ) >= 4
    )
    {
        //if the last conversion failed, set the last
        //argument to an empty string
        if ( ret == 4 )
        {
            phone[0] = '\0';
        }

        printf(
            "Successfully parsed a line with the following content:\n"
            "patientNum: %d\n"
            "firstName:  %s\n"
            "lastName:   %s\n"
            "phoneType:  %s\n"
            "phoneNum:   %s\n"
            "\n",
            patientNum, firstName, lastName, phoneType, phone
        );
    }

    fclose( fp );
}

For the input stated in the question, this program has the following output:

Successfully parsed a line with the following content:
patientNum: 1024
firstName:  Shaggy
lastName:   Yanson
phoneType:  CELL
phoneNum:   3048005191

Successfully parsed a line with the following content:
patientNum: 1032
firstName:  Pugsley
lastName:   Yanson
phoneType:  CELL
phoneNum:   3048005191

Successfully parsed a line with the following content:
patientNum: 1040
firstName:  Beans
lastName:   Maulin
phoneType:  HOME
phoneNum:   3649155831

Successfully parsed a line with the following content:
patientNum: 1048
firstName:  Banjo
lastName:   Codi
phoneType:  TBD
phoneNum:   

Successfully parsed a line with the following content:
patientNum: 1056
firstName:  Lettuce
lastName:   Peas
phoneType:  WORK
phoneNum:   7934346809

Successfully parsed a line with the following content:
patientNum: 1064
firstName:  Bullet
lastName:   Lemme
phoneType:  TBD
phoneNum:   

Successfully parsed a line with the following content:
patientNum: 1072
firstName:  Nugget
lastName:   Smee
phoneType:  WORK
phoneNum:   3886893085

Successfully parsed a line with the following content:
patientNum: 1080
firstName:  Bessie
lastName:   Lidgely
phoneType:  TBD
phoneNum:   

Successfully parsed a line with the following content:
patientNum: 1088
firstName:  Potato
lastName:   Yards
phoneType:  HOME
phoneNum:   5836788577

As you can see, the input data was mapped to the correct fields.

However, this is a rather messy solution. Also, this will only solve the problem if the %[] specifier at the end fails due to not matching at least one character, but it will not solve the problem of one of the previous %[] specifiers failing for that reason. For example, it will fail with the following line in the input:

1056|Lettuce Peas||7934346809

Generally, I do not recommend to solve such problems using fscanf. It is usually better to read one line at a time as a string using the function fgets, and then to treat each line individually. This would solve the problem of fscanf matching part of the next line.

The string containing the line input could then be parsed using the functions sscanf or strtok, but these functions have disadvantages, too:

Like fscanf, the function sscanf also has the problem of the %[] specifier only matching a single character (i.e. not an empty field), so that it will also fail to handle the following line of input:

1056|Lettuce Peas||7934346809

Splitting the string into individual fields using strtok has the disadvantage that it will jump over empty fields.

For this reason, I would use the function fgets, but I wouldn't use sscanf or strtok. Instead, I would do the parsing with my own custom code, as this gives me full flexibility:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <errno.h>
#include <limits.h>

//forward declarations
void importPatients( void );
bool read_field_as_string( const char **p_str, char delim, char *destination, int destination_size );
bool convert_patientNum_to_int( const char *str, int *result );
bool get_line_from_stream( char buffer[], int buffer_size, FILE *fp );

int main( void )
{
    importPatients();
    return 0;
}

void importPatients( void )
{
    FILE *fp = NULL;
    char line[1024];

    char patientNum[5], firstName[31], lastName[31], phoneType[5], phone[11];

    //associate variables and delimiters with the field numbers
    struct
    {
        char *str;
        int  capacity;
        char delim;

    } fields[] = {
        { patientNum, sizeof patientNum, '|' },
        { firstName, sizeof firstName, ' ' },
        { lastName, sizeof lastName, '|' },
        { phoneType, sizeof phoneType, '|' },
        { phone, sizeof phone, '\0' }
    };

    //open the file
    fp = fopen( "patientData.txt", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //process one line per loop iteration
    while ( get_line_from_stream( line, sizeof line, fp ) )
    {
        const char *p = line;
        int patientNum_as_int;

        for ( int i = 0; i < (int)(sizeof fields/sizeof *fields); i++ )
        {
            if (
                !read_field_as_string(
                    &p,
                    fields[i].delim,
                    fields[i].str,
                    fields[i].capacity
                )
            )
            {
                //we cannot use "continue" here, because that would
                //continue the innermost loop, but we want to continue
                //the outer loop, so we must use goto
                goto invalid_input;
            }
        }

        //convert patientNum to integer
        if ( !convert_patientNum_to_int( patientNum, &patientNum_as_int ) )
            goto invalid_input;

        //print the result of the line
        printf(
            "Successfully parsed a line with the following content:\n"
            "patientNum: %d\n"
            "firstName:  %s\n"
            "lastName:   %s\n"
            "phoneType:  %s\n"
            "phoneNum:   %s\n"
            "\n",
            patientNum_as_int, firstName, lastName, phoneType, phone
        );

        //go to next loop iteration, do not execute the code behind
        //the goto label
        continue;

    invalid_input:
        printf( "Skipping line due to invalid input!\n\n" );
    }

    fclose( fp );
}

bool read_field_as_string( const char **p_str, char delim, char *destination, int destination_capacity )
{
    const char *p = *p_str, *q;
    int len;

    //find delimiter
    q = strchr( p, delim );
    if ( q == NULL )
        return false;

    //calculate length of field
    len = q - p;

    //make sure that destination is large enough
    if ( len >= destination_capacity )
        return false;

    //copy field to destination
    memcpy( destination, p, len );
    destination[len] = '\0';

    //update pointer
    *p_str = q + 1;

    //field was successfully extracted
    return true;
}

bool convert_patientNum_to_int( const char *str, int *result )
{
    char *p;
    unsigned long num;

    //verify that number consists entirely of digits
    for ( const char *q = str; *q != '\0'; q++ )
    {
        if ( !isdigit( (unsigned char)*q ) )
            return false;
    }

    //attempt to convert string to integer
    errno = 0;
    num = strtoul( str, &p, 10 );

    //verify that conversion was successful
    if ( p == str )
        return false;

    //verify that no range error occurred
    if ( errno == ERANGE )
        return false;

    //verify that result is representable as an "int"
    if ( num > INT_MAX )
        return false;

    //everything is ok, so write the result
    *result = num;
    return true;
}

//This function will read exactly one line of input and remove the
//newline character, if it exists. On success, it will return true.
//If this function is unable to read any further lines due to
//end-of-file, it will return false. If it fails for any other reason,
//it will not return, but will print an error message and call "exit"
//instead.
bool get_line_from_stream( char buffer[], int buffer_size, FILE *fp )
{
    char *p;

    //attempt to read one line from the stream
    if ( fgets( buffer, buffer_size, fp ) == NULL )
    {
        if ( !feof(fp) )
        {
            fprintf( stderr, "Input error!\n" );
            exit( EXIT_FAILURE );
        }

        return false;
    }

    //make sure that line was not too long for input buffer
    p = strchr( buffer, '\n' );
    if ( p == NULL )
    {
        //a missing newline character is ok if the next
        //character is a newline character or if we have
        //reached end-of-file
        if ( getc(fp) != '\n' && !feof(fp) )
        {
            fprintf( stderr, "Line is too long for memory buffer!\n" );
            exit( EXIT_FAILURE );
        }
    }
    else
    {
        //remove newline character by overwriting it with a null
        //character
        *p = '\0';
    }

    return true;
}

With the input in the question, this program has the same output as the other program. However, in contrast to the other program, it is also able to handle the input line

1056|Lettuce Peas||7934346809

properly. It has the following output for that line:

Successfully parsed a line with the following content:
patientNum: 1056
firstName:  Lettuce
lastName:   Peas
phoneType:  
phoneNum:   7934346809
Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39