
I have an input file called animals.dat which has data in the form like:

1,Allegra,Pseudois nayaur,S,5 
2,unknown,Ailurus fulgens,X,10
3,Athena,Moschus fuscus,X,2

The code I have been using to store and process my data is this. But, for some reason, it seems to get stuck in an infinite loop. Any suggestions on how to make this correct/better?

void choice3(FILE *infile) {
    int id;
    printf("Enter ID ");
    scanf("%d", &id);
    while(!feof(infile)) { 
        int animalID;
        char animalName[20];
        char animalType[20];
        char animalSize;
        int animalAge;
        fscanf(infile,"%d,",&animalID);
        fscanf(infile,"%[^,] ",animalName);
        fscanf(infile,"%s, %c, %d",animalType,&animalSize,&animalAge);
        if(animalID == id) {
            printf("Animal Found");  
        }
    }
    rewind(infile);

EDIT:

This is the link to the exact binary file which I have to take as input. https://drive.google.com/open?id=18olXBhRgpGyY0bhpjDSwla2XcBnWoFGM

And, the instructions I have for this part says "All animals are listed in the increasing order by their id number, starting at value 1. If there is a hole in id numbers, e.g. 2, then the structure information is still present in the file, except the name component contains the string "unknown" to signify an empty record. Make sure your search uses random file processing. If an invalid id is entered, in this example any value other than 1 or 3, your program is to display an error message. Otherwise, animal record is displayed. In either case, the program is to go back to the initial menu."

I have updated the code to this, since the records are stored in the given order.

void choice3(FILE *infile) {
    Animal tempAnimal;
    int id;
    printf("Enter ID ");
    scanf(" %d", &id);
    fseek(infile,id * sizeof(struct animal),SEEK_SET);
    fread(&tempAnimal,sizeof(struct animal),1,infile);
    printf("%d -- %s\n",tempAnimal->id,tempAnimal->name);
}

I have defined the structure in another file animal.h which I am including.

struct animal {
    short int id;
    char name[20];
    char species[35];
    char size;
    short int age;
};
typedef struct animal* Animal;

But, for some reason now I am getting a "Segmentation fault: 11". Which means it's not working on my fread() line. Any suggestions?

Joe Taras
kbreezy
  • https://stackoverflow.com/q/5431941/758133 – Martin James Dec 03 '17 at 06:38
  • Isn't this just the CSV file format? – Udayraj Deshmukh Dec 03 '17 at 06:38
  • Debugger............. – Martin James Dec 03 '17 at 06:38
  • Use fgets() and strtok - it's easier. – Martin James Dec 03 '17 at 06:41
  • Are you sure that you are in an infinite loop? I think you are not getting past the first line, since you are not dealing with the end of line. Anyway, I'm not sure about your code, because I'm on mobile and I cannot see the closing } for the if. – fernand0 Dec 03 '17 at 06:44
  • Yes, it's a csv format but I am not completely sure how do i read strings from those. And, yes it definitely stuck in the loop as I commented out everything and using printf statements I saw that it never comes out of the loop. – kbreezy Dec 03 '17 at 06:50
  • Note that [`while (!feof(file))` is always wrong](https://stackoverflow.com/questions/5431941/), which was referenced by Martin James. You must check the return value from each `fscanf()` to know whether any of them are working. You will probably do best reading a line and then processing it, possibly with `sscanf()`. Handling this with `fscanf()` is tricky, at best. – Jonathan Leffler Dec 03 '17 at 07:15
  • Unless your data file is prohibitively long, you should only be reading it once and storing all animal data in an array of struct so that repeated access of the data is from memory not from a file (which will be several *orders of magnitude* slower) – David C. Rankin Dec 03 '17 at 07:23

2 Answers


This code works for me. I've made it into what is close to an MCVE (Minimal, Complete, Verifiable Example):

#include <stdio.h>

static void choice3(FILE *infile, int id)
{
    int animalID = -37;
    char animalName[20];
    char animalType[20];
    char animalSize;
    int animalAge;
    while (fscanf(infile, "%d , %19[^,] , %19[^,] , %c , %d",
                  &animalID, animalName, animalType, &animalSize, &animalAge) == 5)
    {
        printf("Read: %d: %s, %s, %c, %d\n",
               animalID, animalName, animalType, animalSize, animalAge);
        if (animalID == id)
        {
            printf("Animal Found: %d: %s, %s, %c, %d\n",
                   animalID, animalName, animalType, animalSize, animalAge);
        }
    }
    if (feof(infile))
        printf("EOF\n");
    else
        printf("Format error\n");
}

int main(void)
{
    choice3(stdin, 3);
    return 0;
}

It hard-wires the desired animal ID at 3, and reads from standard input, so I ran the program (csv47) on your data file (data) and got:

$ ./csv47 < data
Read: 1: Allegra, Pseudois nayaur, S, 5
Read: 2: unknown, Ailurus fulgens, X, 10
Read: 3: Athena, Moschus fuscus, X, 2
Animal Found: 3: Athena, Moschus fuscus, X, 2
EOF
$

Not all the spaces in the fscanf() format string are necessary; none is harmful. Note that the code checks for the correct number of fields and exits the loop when the count is wrong. Note that the data is printed so that it is clear what was read; that's a basic debugging technique. The test after the loop is a correct use of feof(); using feof() to control the loop is almost invariably wrong.

You would do better to use a line reading function (fgets() or POSIX getline(), for example) to read a line of data, and then you could print, scan, rescan, report on the line that caused trouble. This generally leads to better error reporting, if only because you have the entire line available, rather than whatever fragment is left over after fscanf() has read some but not all the fields.
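
As a minimal sketch of that line-based approach (the names csv_animal, parse_animal_line and count_valid_lines are made up for illustration, and the 512-byte line buffer is an assumption), the code below reads each line with fgets(), converts it with sscanf(), and reports the whole offending line together with the number of fields that did convert:

```c
#include <stdio.h>

/* Hypothetical record type for the text (CSV) form of the data. */
struct csv_animal {
    int  id;
    char name[20];
    char type[20];
    char size;
    int  age;
};

/*
 * Parse one line such as "3,Athena,Moschus fuscus,X,2".
 * Returns the number of fields converted (5 on success), so the
 * caller can tell how far the conversion got on a bad line.
 */
int parse_animal_line(const char *line, struct csv_animal *out)
{
    return sscanf(line, "%d , %19[^,] , %19[^,] , %c , %d",
                  &out->id, out->name, out->type, &out->size, &out->age);
}

/* Read lines with fgets(); report each line that fails, with its text. */
int count_valid_lines(FILE *fp)
{
    char line[512];     /* assumed to be long enough for any record */
    int valid = 0;
    while (fgets(line, sizeof line, fp) != NULL)
    {
        struct csv_animal a;
        int n = parse_animal_line(line, &a);
        if (n == 5)
            valid++;
        else
            fprintf(stderr, "Bad line (%d of 5 fields): %s", n, line);
    }
    return valid;
}
```

Because the line is still in memory when the conversion fails, the error message can show exactly what was read, which fscanf() on the stream cannot do.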

Note too that this won't cope with fields enclosed in double quotes containing commas, or some other standard CSV conventions. Those really require library code to handle the reading.

Finally, this edit concerns itself only with reading the data into the local variables and avoiding 'infinite loops'. For a discussion of storage, see David C. Rankin's answer.


animals.dat from Google Drive

The animals.dat file that was available from Google Drive at 2017-12-03 19:00:00 -08:00 was a binary file written with little-endian integers (Intel machines) using the 60-byte structure outlined in the question (and used in the printing code below). Just in case it isn't available, the following is the output from xxd -i animals.dat — a C array definition that contains the same data:

unsigned char animals_dat[] = {
  0x01, 0x00, 0x41, 0x62, 0x69, 0x67, 0x61, 0x69, 0x6c, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x43, 0x61,
  0x70, 0x72, 0x69, 0x63, 0x6f, 0x72, 0x6e, 0x69, 0x73, 0x20, 0x73, 0x75,
  0x6d, 0x61, 0x74, 0x72, 0x61, 0x65, 0x6e, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x53, 0x08, 0x00,
  0x02, 0x00, 0x75, 0x6e, 0x6b, 0x6e, 0x6f, 0x77, 0x6e, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x4f, 0x72,
  0x79, 0x78, 0x20, 0x6c, 0x65, 0x75, 0x63, 0x6f, 0x72, 0x79, 0x78, 0x00,
  0x6d, 0x61, 0x74, 0x72, 0x61, 0x65, 0x6e, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4d, 0x0c, 0x00,
  0x03, 0x00, 0x41, 0x64, 0x72, 0x69, 0x61, 0x6e, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x43, 0x65,
  0x70, 0x68, 0x61, 0x6c, 0x6f, 0x70, 0x68, 0x75, 0x73, 0x20, 0x64, 0x6f,
  0x72, 0x73, 0x61, 0x6c, 0x69, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4c, 0x10, 0x00,
  0x04, 0x00, 0x41, 0x68, 0x6d, 0x65, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x4e, 0x61,
  0x65, 0x6d, 0x6f, 0x72, 0x68, 0x65, 0x64, 0x75, 0x73, 0x20, 0x67, 0x72,
  0x69, 0x73, 0x65, 0x75, 0x73, 0x00, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4c, 0x0a, 0x00,
  0x05, 0x00, 0x41, 0x69, 0x64, 0x61, 0x6e, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x4e, 0x61,
  0x65, 0x6d, 0x6f, 0x72, 0x68, 0x65, 0x64, 0x75, 0x73, 0x20, 0x63, 0x61,
  0x75, 0x64, 0x61, 0x74, 0x75, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x58, 0x09, 0x00,
  0x06, 0x00, 0x41, 0x6c, 0x6c, 0x65, 0x67, 0x72, 0x61, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x50, 0x73,
  0x65, 0x75, 0x64, 0x6f, 0x69, 0x73, 0x20, 0x6e, 0x61, 0x79, 0x61, 0x75,
  0x72, 0x00, 0x61, 0x74, 0x75, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x53, 0x05, 0x00,
  0x07, 0x00, 0x41, 0x6d, 0x65, 0x6c, 0x61, 0x00, 0x61, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x43, 0x65,
  0x72, 0x64, 0x6f, 0x63, 0x79, 0x6f, 0x6e, 0x20, 0x74, 0x68, 0x6f, 0x75,
  0x73, 0x00, 0x61, 0x74, 0x75, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4d, 0x0b, 0x00,
  0x08, 0x00, 0x75, 0x6e, 0x6b, 0x6e, 0x6f, 0x77, 0x6e, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x43, 0x61,
  0x70, 0x72, 0x61, 0x20, 0x66, 0x61, 0x6c, 0x63, 0x6f, 0x6e, 0x65, 0x72,
  0x69, 0x00, 0x61, 0x74, 0x75, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4d, 0x01, 0x00,
  0x09, 0x00, 0x41, 0x6e, 0x6a, 0x6f, 0x6c, 0x69, 0x65, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x41, 0x69,
  0x6c, 0x75, 0x72, 0x75, 0x73, 0x20, 0x66, 0x75, 0x6c, 0x67, 0x65, 0x6e,
  0x73, 0x00, 0x61, 0x74, 0x75, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4c, 0x0a, 0x00,
  0x0a, 0x00, 0x41, 0x74, 0x68, 0x65, 0x6e, 0x61, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x4d, 0x6f,
  0x73, 0x63, 0x68, 0x75, 0x73, 0x20, 0x66, 0x75, 0x73, 0x63, 0x75, 0x73,
  0x00, 0x00, 0x61, 0x74, 0x75, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x53, 0x05, 0x00,
  0x0b, 0x00, 0x41, 0x76, 0x61, 0x00, 0x6e, 0x61, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x43, 0x65,
  0x70, 0x68, 0x61, 0x6c, 0x6f, 0x70, 0x68, 0x75, 0x73, 0x20, 0x6a, 0x65,
  0x6e, 0x74, 0x69, 0x6e, 0x6b, 0x69, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4d, 0x0d, 0x00,
  0x0c, 0x00, 0x41, 0x78, 0x65, 0x6c, 0x00, 0x61, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x48, 0x69,
  0x70, 0x70, 0x6f, 0x63, 0x61, 0x6d, 0x65, 0x6c, 0x75, 0x73, 0x20, 0x61,
  0x6e, 0x74, 0x69, 0x73, 0x65, 0x6e, 0x73, 0x69, 0x73, 0x00, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x4d, 0x0b, 0x00,
  0x0d, 0x00, 0x41, 0x79, 0x61, 0x6e, 0x6e, 0x61, 0x00, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x47, 0x61,
  0x7a, 0x65, 0x6c, 0x6c, 0x61, 0x20, 0x63, 0x75, 0x76, 0x69, 0x65, 0x72,
  0x69, 0x00, 0x69, 0x73, 0x65, 0x6e, 0x73, 0x69, 0x73, 0x00, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x53, 0x0c, 0x00,
  0x0e, 0x00, 0x42, 0x72, 0x61, 0x64, 0x6c, 0x65, 0x79, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x42, 0x75,
  0x62, 0x61, 0x6c, 0x75, 0x73, 0x20, 0x6d, 0x69, 0x6e, 0x64, 0x6f, 0x72,
  0x65, 0x6e, 0x73, 0x69, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x58, 0x04, 0x00,
  0x0f, 0x00, 0x42, 0x72, 0x65, 0x6e, 0x64, 0x61, 0x6e, 0x00, 0x00, 0x00,
  0x04, 0x00, 0x00, 0x00, 0x74, 0x01, 0x00, 0x00, 0x74, 0x01, 0x42, 0x6f,
  0x73, 0x20, 0x67, 0x61, 0x75, 0x72, 0x75, 0x73, 0x00, 0x64, 0x6f, 0x72,
  0x65, 0x6e, 0x73, 0x69, 0x73, 0x00, 0x73, 0x69, 0x73, 0x00, 0x00, 0x00,
  0x5c, 0x3f, 0x1b, 0x00, 0x5c, 0x4f, 0x1b, 0x00, 0x5c, 0x58, 0x01, 0x00
};
unsigned int animals_dat_len = 900;

Code to read binary animals.dat

As I noted in a comment below, the data file leaks information because after each name there is junk data from the previous record. This is one of those occasions where the null-padding behaviour of strncpy() actually becomes useful; it zaps the extraneous data from previous records with null bytes, but this was signally not done when animals.dat was generated.
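
A small sketch of how a writer could have avoided the debris (set_field is a hypothetical helper, not something from the program that generated animals.dat): strncpy() pads the rest of the destination with null bytes when the source is shorter than the count, so nothing from an earlier record is left behind in the field.

```c
#include <string.h>

/*
 * Copy src into a fixed-size field of `size` bytes (size must be > 0).
 * strncpy() fills the remainder of the field with null bytes when src
 * is shorter than the count, so stale bytes from an earlier record are
 * overwritten. The final byte is set explicitly because strncpy() does
 * not add a terminator when src is at least as long as the count.
 */
void set_field(char *field, size_t size, const char *src)
{
    strncpy(field, src, size - 1);
    field[size - 1] = '\0';
}
```

Calling something like set_field(info.name, sizeof info.name, "Ava") before each fwrite() would leave only null bytes after the name.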

#include <stdio.h>
#include <string.h>
#include <ctype.h>

struct animal {
    short int id;
    char name[20];
    char species[35];
    char size;
    short int age;
};

static void debris_field(const char *tag, const char *field, size_t length)
{
    size_t nomlen = strlen(field);
    int count = 0;
    for (size_t i = nomlen; i < length; i++)
    {
        if (field[i] != '\0')
        {
            if (count == 0)
                printf("%8s (%2zu = %-20s) has debris:\n        ", tag, nomlen, field);
            count++;
            unsigned char u = field[i];
            if (isprint(u))
                putchar(u);
            else
                printf("\\x%.2X", u);
        }
    }
    if (count != 0)
        putchar('\n');
}

static void report_debris(const struct animal *info)
{
    debris_field("name", info->name, sizeof(info->name));
    debris_field("species", info->species, sizeof(info->species));
}

static void choice2(FILE *infile, int noisy)
{
    struct animal info;
    while (fread(&info, sizeof(info), 1, infile) == 1)
    {
        if (strcmp(info.name, "unknown") == 0)
        {
            printf("Deleted: %2d %20s %30s %c %2d\n", info.id, info.name, info.species, info.size, info.age);
        }
        else
        {
            printf("Current: %2d %20s %30s %c %2d\n", info.id, info.name, info.species, info.size, info.age);
        }
        if (noisy)
            report_debris(&info);
    }
}

int main(int argc, char **argv)
{
    int noisy = 0;
    if (argc > 1 && argv[argc] == 0)    // Use argv
        noisy = 1;
    choice2(stdin, noisy);
    return 0;
}

The 'use argv' comment is relevant because I compile with GCC 7.2.0 on a MacBook Pro running macOS High Sierra 10.13.1 with the command line (source in animals59.c):

$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror -Wmissing-prototypes \
>     -Wstrict-prototypes animals59.c -o animals59
$

If the code didn't use argv somehow, the compiler would complain and the code wouldn't compile.

Output - no arguments

Current:  1              Abigail       Capricornis sumatraensis S  8
Deleted:  2              unknown                  Oryx leucoryx M 12
Current:  3               Adrian           Cephalophus dorsalis L 16
Current:  4                Ahmed            Naemorhedus griseus L 10
Current:  5                Aidan           Naemorhedus caudatus X  9
Current:  6              Allegra                Pseudois nayaur S  5
Current:  7                Amela                Cerdocyon thous M 11
Deleted:  8              unknown                Capra falconeri M  1
Current:  9              Anjolie                Ailurus fulgens L 10
Current: 10               Athena                 Moschus fuscus S  5
Current: 11                  Ava           Cephalophus jentinki M 13
Current: 12                 Axel        Hippocamelus antisensis M 11
Current: 13               Ayanna                Gazella cuvieri S 12
Current: 14              Bradley            Bubalus mindorensis X  4
Current: 15              Brendan                     Bos gaurus X  1

Output — with argument

Current:  1              Abigail       Capricornis sumatraensis S  8
    name ( 7 = Abigail             ) has debris:
        \x04t\x01t\x01
 species (24 = Capricornis sumatraensis) has debris:
        \?\x1B\O\x1B\
Deleted:  2              unknown                  Oryx leucoryx M 12
    name ( 7 = unknown             ) has debris:
        \x04t\x01t\x01
 species (13 = Oryx leucoryx       ) has debris:
        matraensis\?\x1B\O\x1B\
Current:  3               Adrian           Cephalophus dorsalis L 16
    name ( 6 = Adrian              ) has debris:
        \x04t\x01t\x01
 species (20 = Cephalophus dorsalis) has debris:
        sis\?\x1B\O\x1B\
Current:  4                Ahmed            Naemorhedus griseus L 10
    name ( 5 = Ahmed               ) has debris:
        \x04t\x01t\x01
 species (19 = Naemorhedus griseus ) has debris:
        sis\?\x1B\O\x1B\
Current:  5                Aidan           Naemorhedus caudatus X  9
    name ( 5 = Aidan               ) has debris:
        \x04t\x01t\x01
 species (20 = Naemorhedus caudatus) has debris:
        sis\?\x1B\O\x1B\
Current:  6              Allegra                Pseudois nayaur S  5
    name ( 7 = Allegra             ) has debris:
        \x04t\x01t\x01
 species (15 = Pseudois nayaur     ) has debris:
        atussis\?\x1B\O\x1B\
Current:  7                Amela                Cerdocyon thous M 11
    name ( 5 = Amela               ) has debris:
        a\x04t\x01t\x01
 species (15 = Cerdocyon thous     ) has debris:
        atussis\?\x1B\O\x1B\
Deleted:  8              unknown                Capra falconeri M  1
    name ( 7 = unknown             ) has debris:
        \x04t\x01t\x01
 species (15 = Capra falconeri     ) has debris:
        atussis\?\x1B\O\x1B\
Current:  9              Anjolie                Ailurus fulgens L 10
    name ( 7 = Anjolie             ) has debris:
        \x04t\x01t\x01
 species (15 = Ailurus fulgens     ) has debris:
        atussis\?\x1B\O\x1B\
Current: 10               Athena                 Moschus fuscus S  5
    name ( 6 = Athena              ) has debris:
        \x04t\x01t\x01
 species (14 = Moschus fuscus      ) has debris:
        atussis\?\x1B\O\x1B\
Current: 11                  Ava           Cephalophus jentinki M 13
    name ( 3 = Ava                 ) has debris:
        na\x04t\x01t\x01
 species (20 = Cephalophus jentinki) has debris:
        sis\?\x1B\O\x1B\
Current: 12                 Axel        Hippocamelus antisensis M 11
    name ( 4 = Axel                ) has debris:
        a\x04t\x01t\x01
 species (23 = Hippocamelus antisensis) has debris:
        \?\x1B\O\x1B\
Current: 13               Ayanna                Gazella cuvieri S 12
    name ( 6 = Ayanna              ) has debris:
        \x04t\x01t\x01
 species (15 = Gazella cuvieri     ) has debris:
        isensis\?\x1B\O\x1B\
Current: 14              Bradley            Bubalus mindorensis X  4
    name ( 7 = Bradley             ) has debris:
        \x04t\x01t\x01
 species (19 = Bubalus mindorensis ) has debris:
        sis\?\x1B\O\x1B\
Current: 15              Brendan                     Bos gaurus X  1
    name ( 7 = Brendan             ) has debris:
        \x04t\x01t\x01
 species (10 = Bos gaurus          ) has debris:
        dorensissis\?\x1B\O\x1B\
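
For the random-access lookup that the assignment text in the question asks for, a sketch along the following lines may help. The function name find_animal and its return convention are assumptions made for illustration. It also assumes that struct animal is laid out in memory as the same 60 bytes that are in the file, which holds on the little-endian Intel setup described above but is not guaranteed everywhere. Ids start at 1, so the record for a given id begins at offset (id - 1) * sizeof(struct animal); seeking to id * sizeof(struct animal), as the question's code does, lands on the following record.

```c
#include <stdio.h>
#include <string.h>

struct animal {
    short int id;
    char name[20];
    char species[35];
    char size;
    short int age;
};

/*
 * Seek straight to the slot for `id` and read that one record into *out.
 * Returns 0 on success, or -1 if the id is out of range, the seek or
 * read fails, or the slot holds an "unknown" (empty) record.
 */
int find_animal(FILE *infile, int id, struct animal *out)
{
    if (id < 1)
        return -1;
    long offset = (long)(id - 1) * (long)sizeof(struct animal);
    if (fseek(infile, offset, SEEK_SET) != 0)
        return -1;
    if (fread(out, sizeof(*out), 1, infile) != 1)
        return -1;                      /* past the end of the file */
    if (out->id != id || strcmp(out->name, "unknown") == 0)
        return -1;                      /* hole in the id sequence */
    return 0;
}
```

The caller declares an actual structure (struct animal a;), passes &a, and uses a.name and a.id with dots; when the function returns -1 it prints the error message and goes back to the menu.
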
Jonathan Leffler
  • Hey, I tried a very similar snippet of code with exact same logic. But, for some reason now I am unable to go in the loop. It is not entering the while loop at all. – kbreezy Dec 03 '17 at 11:56
  • There are two main options: (1) what you used wasn’t quite similar enough to what I showed, or (2) the data wasn’t quite as expected. I suggest going for the line-based input plus `sscanf()` because you can print out the data. Also, capture and print the return code from `sscanf()` because that tells us where the problem was in the line. – Jonathan Leffler Dec 03 '17 at 12:49
  • So I see. See [Is it a good idea to typedef pointers?](https://stackoverflow.com/questions/750178/) — a hint: the short answer is No; and the long answer is No unless they're function pointers. That's most of your trouble. You've allocated a pointer; you've never initialized it. The data file is a little-endian (Intel) binary file with records of 60 bytes each, that match the `struct animal` type you use. I think there's some junk in the trailing sections of the two strings; that doesn't matter very much, though. – Jonathan Leffler Dec 04 '17 at 03:20
  • Note that because `fread()` takes a `void *` first argument, the compiler can't help you much spotting that you're passing the address of an uninitialized pointer to the function and reading as much data as the structure holds (60 bytes) into the space that a pointer holds (4 or 8 bytes) — and therefore writing after it, trampling your stack and the return information, etc. A crash is not surprising. – Jonathan Leffler Dec 04 '17 at 03:22
  • Hmm but I am still a little confused on how to go about correcting it? Any suggestions there? – kbreezy Dec 04 '17 at 03:56
  • See my updated answer. Short form: don't use `Animal tempAnimal` as the type in your function — use `struct animal tempAnimal;`. You then need to use `tempAnimal.name` instead of `tempAnimal->name`, of course. Or use `typedef struct animal Animal;` — but you'll still need to use dots `.` instead of arrows `->`. – Jonathan Leffler Dec 04 '17 at 04:14

One way to simplify the coordination of several pieces of information that need to be considered as a distinct group (such as an animal id, name, type, size and age) is to capture those pieces of information as a struct. You can use an array of struct to capture the information on all animals. This will simplify your data collection and allow you to hold all values in memory for querying, etc.

To read the varying pieces of data, notice that all the data for a single animal is on one line. That should point you toward a line oriented input function such as fgets or POSIX getline. Once you read a line of data, you can then parse the values you need from it (with e.g. sscanf or walking a pointer down the line). This provides the benefits of (1) validating the read separate and apart from (2) validating the individual values parsed from the line (as well as consuming the trailing '\n' before the next read).

Putting that together, you could simply handle reading your animal data into an array of animals with something similar to the following:

#include <stdio.h>

/* consts for max name/type, animals array, characters for buf */
enum { MAXNT = 20, MAXA = 128, MAXC = 512 };

typedef struct {
    int id,
        age;
    char name[MAXNT],
        type[MAXNT],
        size;
} animal;

int main (int argc, char **argv) {

    int n = 0;                              /* array index */
    char buf[MAXC] = "";                    /* line buffer */
    animal animals[MAXA] = {{ .id = 0 }};   /* animals array */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (n < MAXA && fgets (buf, MAXC, fp)) { /* read each line */
        /* parse animal data from line and validate conversion */
        if (sscanf (buf, "%d, %19[^,], %19[^,], %c, %d",
                    &animals[n].id, animals[n].name, animals[n].type, 
                    &animals[n].size, &animals[n].age) == 5)
            n++;    /* increment array index on successful conversion */
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    /* do what you need with data (printing here) */
    for (int i = 0; i < n; i++)
        printf ("%3d    %-20s  %-20s  %c  %3d\n", 
                animals[i].id, animals[i].name, animals[i].type, 
                animals[i].size, animals[i].age);

    return 0;
}

Example Use/Output

$ ./bin/animals <dat/animals.dat
  1    Allegra               Pseudois nayaur       S    5
  2    unknown               Ailurus fulgens       X   10
  3    Athena                Moschus fuscus        X    2

Reading Binary 'animals.dat'

When the input file format changes significantly, as well as the size of names and types, that significantly changes the approach to the problem. Without duplicating what Mr. Leffler has done, let's look at a few additional ways you can handle the types for id and age.

While there is nothing wrong with using the traditional short int notation, be aware that stdint.h provides exact width types that allow you to specify an 8, 16, 32 or 64 bit width for your integer/unsigned values. This removes the dependence on how wide a particular architecture or compiler makes short or int. The corresponding exact width printf/scanf format specifiers are provided in inttypes.h.

Without knowing what is in the binary animals.dat file, you are left to examine the contents to determine the record and individual variable sizes, and endianness on your own. (I saved it as animals.bin.dat to distinguish it from your first file). The Linux tools available to examine the bytes of the file include od and hexdump to name a few. Windows can provide a similar dump in PowerShell, or the old standby WinHex Hex Editor that can be downloaded for free works well [1]. There is no magic to it: you just dump the bytes in the file, start identifying what you can and start counting, e.g. the first two records of the binary animals.dat are:

$ hexdump -Cv dat/animals.bin.dat
00000000  01 00 41 62 69 67 61 69  6c 00 00 00 04 00 00 00  |..Abigail.......|
00000010  74 01 00 00 74 01 43 61  70 72 69 63 6f 72 6e 69  |t...t.Capricorni|
00000020  73 20 73 75 6d 61 74 72  61 65 6e 73 69 73 00 00  |s sumatraensis..|
00000030  5c 3f 1b 00 5c 4f 1b 00  5c 53 08 00 02 00 75 6e  |\?..\O..\S....un|
00000040  6b 6e 6f 77 6e 00 00 00  04 00 00 00 74 01 00 00  |known.......t...|
00000050  74 01 4f 72 79 78 20 6c  65 75 63 6f 72 79 78 00  |t.Oryx leucoryx.|
00000060  6d 61 74 72 61 65 6e 73  69 73 00 00 5c 3f 1b 00  |matraensis..\?..|
00000070  5c 4f 1b 00 5c 4d 0c 00  03 00 41 64 72 69 61 6e  |\O..\M....Adrian|

Picking away at it, using the id, name, type, size, age variables as a guide, you can determine the integer and string widths. With that in hand you can use fread to read one 60-byte record at a time and from that memcpy the appropriate bytes to the individual variables as required. This file is a good (bad) example of what happens when you write fixed length arrays containing strings to a file without properly initializing them. Garbage is left between the nul-terminator of the string and the beginning of the next data. That is most likely where the debris comes from that is well-discussed in the other answer. Suffice it to say debris makes your examination more challenging...
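
When counting bytes in the dump, the two-byte integers are stored least significant byte first: the first record starts with 01 00 (id 1) and ends with 08 00 (age 8). A memcpy into a uint16_t reproduces those values only on a little-endian host. As a hedged alternative, a small helper (read_le16 is a name made up for this sketch) can assemble the value from the two bytes regardless of the byte order of the machine doing the reading:

```c
#include <stdint.h>

/*
 * Build a 16-bit value from two bytes stored least significant first
 * (little-endian), independent of the byte order of the host machine.
 */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

With a 60-byte record buffer rec, read_le16(rec) would give the id and read_le16(rec + 58) the age, using the offsets worked out from the dump.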

After completing your examination of bytes, you should be able to do something like the following to read 60-bytes at a time and then extract the id, name, type, size, age values from it:

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

/* consts for max name, type, record size, max animals to read */
enum { MAXN = 20, MAXT = 35, RECSZ = 60, MAXA = 128 };

typedef struct {
    uint16_t id,
            age;
    char name[MAXN],
        type[MAXT],
        size;
} animal;

int main (int argc, char **argv) {

    int n = 0;                              /* array index */
    animal animals[MAXA] = {{ .id = 0 }};   /* animals array */
    FILE *fp = argc > 1 ? fopen (argv[1], "rb") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (n < MAXA) {      /* read up to MAXA animal records */
        uint8_t rec[RECSZ] = "",            /* record buffer */
                size = 0,                   /* size of member */
                offset = 0;                 /* offset in record */

        if (fread (rec, 1, RECSZ, fp) != RECSZ) /* read/validate rec */
            break;

        size = sizeof animals[n].id;        /* get id size */
        memcpy (&animals[n].id, rec, size); /* copy from rec to id */
        offset += size;                     /* add size to rec offset */

        size = sizeof animals[n].name;      /* repeat for each member */
        memcpy (animals[n].name, rec + offset, size);
        offset += size;

        size = sizeof animals[n].type;
        memcpy (animals[n].type, rec + offset, size);
        offset += size;

        size = sizeof animals[n].size;
        memcpy (&animals[n].size, rec + offset, size);
        offset += size;

        size = sizeof animals[n].age;
        memcpy (&animals[n].age, rec + offset, size);

        n++;    /* increment array index after copy */
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    /* do what you need with data (printing here) */
    for (int i = 0; i < n; i++)
        printf ("%3" PRIu16 "    %-20s  %-35s  %c  %3" PRIu16 "\n",
                animals[i].id, animals[i].name, animals[i].type,
                animals[i].size, animals[i].age);

    return 0;
}

Note above the use of the uint16_t 16-bit unsigned type for id and age. Note also the corresponding PRIu16 format specifier used in printf. Also note that the PRIu16 macro is placed outside the quotes of the format string: it expands to a string literal that the compiler concatenates with the adjacent literals.
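
To make the placement of the macro concrete, here is a tiny sketch (format_id is a hypothetical helper written only for this illustration) that formats one id the same way the printf call above does:

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * PRIu16 expands to a string literal holding the conversion letters
 * for uint16_t, so "%3" PRIu16 is joined by the compiler into a single
 * format string. That is why the macro sits outside the quotes.
 */
int format_id(char *buf, size_t len, uint16_t id)
{
    return snprintf(buf, len, "%3" PRIu16, id);
}
```

Writing "%3PRIu16" inside the quotes would not work, because the preprocessor does not expand macros inside string literals.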

As a side note, when you are reading bytes, you can swap the size and nmemb parameters to fread and validate a complete read against your record size as opposed to 1. The validation is equivalent, but if you are capturing the return, it will be the number of bytes read as opposed to the number of members (e.g. you either read 60 1-byte members or 1 60-byte member, entirely up to you).
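
The difference between the two argument orders can be sketched as follows (read_as_bytes and read_as_record are illustrative names, and RECSZ repeats the 60-byte record size used above):

```c
#include <stdio.h>

enum { RECSZ = 60 };    /* record size found by examining the file */

/* Ask for RECSZ items of 1 byte each: the return value counts bytes. */
size_t read_as_bytes(FILE *fp, unsigned char *rec)
{
    return fread(rec, 1, RECSZ, fp);
}

/* Ask for 1 item of RECSZ bytes: the return value is 1 or 0. */
size_t read_as_record(FILE *fp, unsigned char *rec)
{
    return fread(rec, RECSZ, 1, fp);
}
```

On a truncated final record the first form reports how many bytes were actually available, while the second simply returns 0, so the first is the more informative one to log when a read comes up short.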

Putting the new code to use to read the binary file will fill your array of animal as follows:

Example Use/Output

$ ./bin/animals_bin dat/animals.bin.dat
  1    Abigail               Capricornis sumatraensis             S    8
  2    unknown               Oryx leucoryx                        M   12
  3    Adrian                Cephalophus dorsalis                 L   16
  4    Ahmed                 Naemorhedus griseus                  L   10
  5    Aidan                 Naemorhedus caudatus                 X    9
  6    Allegra               Pseudois nayaur                      S    5
  7    Amela                 Cerdocyon thous                      M   11
  8    unknown               Capra falconeri                      M    1
  9    Anjolie               Ailurus fulgens                      L   10
 10    Athena                Moschus fuscus                       S    5
 11    Ava                   Cephalophus jentinki                 M   13
 12    Axel                  Hippocamelus antisensis              M   11
 13    Ayanna                Gazella cuvieri                      S   12
 14    Bradley               Bubalus mindorensis                  X    4
 15    Brendan               Bos gaurus                           X    1

Look things over and let me know if you have further questions.

Footnotes:

1.) As with any software, know the site you get it from, validate the checksum, and virus scan before ever thinking about loading it. If you are really paranoid, load it in a virtual machine and perform full diagnostics before bringing it into a production environment -- but that's probably overkill.

David C. Rankin