1

I'm currently doing a practical assignment for college in which I have to read data from a file.

The file data structure is: "id name sex"

example:

nm0025630   Vikas Anand M
nm0418131   Victor Janson   M
nm0411451   Dick Israel M
nm0757820   Leopoldo Salcedo    M

To read the file, I'm currently using this code:

    fh = NULL;
    fh = fopen(ACTORS, "r");
    if (!fh) {
        exit(1);
    }
    while (!feof(fh)) {
        char sex, name[100], id[10];

        fgets(id, 10, fh);
        fgets(name, 100, fh);
        fgetc(sex);

        if (!feof(fh)) {
            hash_update_node(hash, get_id_num(id), name, sex);
            count++;
        }
    }

The problem is that it reads the name and the sex together. Any help is appreciated.

Moha the almighty camel
Mabs 2001
  • 1
    That's because you give it space for 100 bytes, so it stops only when it finds the end of the line. Since names can be of variable size, and if your project allows it, use a separator between the data items in each line, such as a tab, a comma, or another character. Read a whole line at a time into a sufficiently large buffer, then use `strtok` to break out each field of the line. – DNT May 30 '20 at 19:50
  • 1
    You don't have to use fgets(). You can use fscanf() and use conversion specifiers to limit and determine which data you are reading. – Adam May 30 '20 at 19:51
  • I can't use fscanf() because the number of words is not always the same. Some names have one word and others have four. – Mabs 2001 May 30 '20 at 19:53
  • 3
    `fgetc(sex)` is not correct. It should be: `sex = fgetc(fh)`. You should heed the warnings your compiler is giving you. Also note that, [`while (!feof(fh))` is not good](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). – lurker May 30 '20 at 19:55
  • @Mabs2001 so use `fgets` to read up to 100 characters; the sex will then be the last character before the newline (no separate read for it). Remove that last character and the spaces before it from the name. – bruno May 30 '20 at 19:56
  • If you can't use fscanf, then `fgets()` should be used and, as @DNT said, you should parse the `buffer` filled by `fgets()` yourself, because your file contents are not uniform. – Adam May 30 '20 at 19:57
  • Also, do not use `while (!feof(fh))`; detect the end of the file from the return value of `fgets`. – bruno May 30 '20 at 20:00
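
The approach suggested in these comments (read the whole line with `fgets`, take the last character as the sex, then trim) could be sketched as follows. `parse_actor_line` is a hypothetical helper name, not something from the question, and the sketch assumes that each line fits in the caller's buffer and that the sex is a single character at the end of the line.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: splits one line of the form "id name sex" in place.
   Assumes the sex is the last non-whitespace character of the line.
   Returns 1 on success, 0 if the line is malformed. */
int parse_actor_line(char *line, char **id, char **name, char *sex)
{
    size_t len = strlen(line);
    char *p = line;

    /* Drop the newline kept by fgets and any other trailing whitespace. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';
    if (len == 0)
        return 0;

    /* The sex is the last remaining character. */
    *sex = line[--len];
    line[len] = '\0';

    /* Remove the whitespace between the name and the sex. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';

    /* The id is the first word. */
    while (*p && isspace((unsigned char)*p))
        p++;
    *id = p;
    while (*p && !isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;               /* nothing left for the name */
    *p++ = '\0';

    /* The name is everything after the id, however many words it has. */
    while (*p && isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;
    *name = p;
    return 1;
}
```

A caller would loop with `while (fgets(line, sizeof line, fh))` and pass each line to the helper, which also removes the need for `feof`.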

2 Answers

1

fgets(name, 100, fh); reads up to 99 characters and stops only at the end of the line, so whenever the rest of the line is shorter than that, the sex is read into name together with the name.

Because the name may be composed of several words separated by spaces, one way is to read the whole line and then extract the sex.

Warning: the first time you do while (!feof(fh)) { nothing has been read yet, so feof cannot know whether the file is empty, that is, whether you have reached EOF or not. I encourage you to detect EOF by looking at the result of the read.

Also, because you only save the data you read when if (!feof(fh)) { is true, you lose the information from the last line if reading it reaches the end of the file.

Note also that fgets stores the newline if there is enough room for it, so it is more practical to use fscanf here.

So one way can be:

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>

    #define ACTORS "/tmp/actors"

    int main()
    {
      FILE * fh = fopen(ACTORS, "r");

      if (!fh) {
        perror("cannot read " ACTORS);
        exit(1);
      }

      char name[100],id[10];

      /* read the id, then the rest of the line (name and sex) into name */
      while (fscanf(fh, "%9s %99[^\n]", id, name) == 2) {
        size_t sz = strlen(name);
        char sex = name[--sz]; /* the sex is the last character */

        /* skip the spaces between the name and the sex */
        for (;;) {
          if (sz == 0) {
            puts("empty name");
            exit(2);
          }
          if (!isspace((unsigned char) name[--sz]))
            break;
        }

        name[sz+1] = 0;

        /*
        hash_update_node(hash, get_id_num(id) , name, sex);
        count++;
        */
        printf("id='%s', name='%s', sex=%c\n", id, name, sex);
      }

      fclose(fh);
      return 0;
    }

Compilation and execution:

pi@raspberrypi:/tmp $ gcc -Wall r.c
pi@raspberrypi:/tmp $ ./a.out
cannot read /tmp/actors: No such file or directory
pi@raspberrypi:/tmp $ cat > actors
nm0025630 Vikas Anand M
nm0418131 Victor Janson M
nm0411451 Dick Israel M
nm0757820 Leopoldo Salcedo M
pi@raspberrypi:/tmp $ ./a.out
id='nm0025630', name='Vikas Anand', sex=M
id='nm0418131', name='Victor Janson', sex=M
id='nm0411451', name='Dick Israel', sex=M
id='nm0757820', name='Leopoldo Salcedo', sex=M
pi@raspberrypi:/tmp $ 
bruno
  • Hi bruno, it seems the fields are separated by TAB characters in the file. – chqrlie May 30 '20 at 20:24
  • Hi @chqrlie, if we are sure of that, it is possible to put the tabs in the `fscanf` format and avoid the extra handling of the sex. Anyway, even with a tab rather than `' '` my answer works, because I use `isspace` rather than comparing with `' '`. – bruno May 30 '20 at 20:27
  • @chqrlie no there is no problem with the decrements when checking if the character is a space. Note the string cannot be empty and contains at least the sex because `fscanf` returned 2 – bruno May 30 '20 at 20:36
  • @chqrlie ah yes, I understand now, I edited my answer, thank you – bruno May 31 '20 at 12:01
  • Sorry to be nitpicking, but the `(unsigned char)` cast is still missing. Passing naked `char` values to `isspace()` may have undefined behavior for negative characters on platforms where `char` is signed by default. – chqrlie May 31 '20 at 12:15
  • @chqrlie I know, trust me, each time I use one of the *isx* functions I think about you saying that :-)) But that sounds so weird. I mean, are you sure the implementations of *isx* do not handle negative values in the `CHAR_MIN` / `CHAR_MAX` range when the compiler uses `signed char` by default? The library and compiler are associated. Anyway, for you, I edited my answer ;-) – bruno May 31 '20 at 12:53
  • 1
    Some C libraries do attempt to handle both signed char and unsigned char values using arrays of 384 flag words, hence allowing for arguments between -128 and 255 inclusive, but the C Standard does not make this mandatory and there are limits to what can be done: `char` value `-1` cannot be distinguished from `EOF` (usually defined as `(-1)`). So there is no way to have both `isalpha('ÿ') == 1` and `isalpha(EOF) == 0` for the French locale with ISO8859-1 encoding and `char` signed by default. – chqrlie May 31 '20 at 20:08
  • 1
    The fundamental inconsistency at the root of these problems is that `char` should never be signed by default. Many C library functions only handle characters as unsigned: all functions in `<ctype.h>` only handle values of `unsigned char` and the special negative value `EOF`; `getchar()` returns byte values as `unsigned char` values (despite the name!); `strcmp()` compares strings as sequences of `unsigned char` (just like `memcmp`); `ungetc()` fails on `-1`, but any other `char` or `int` values are just converted to `unsigned char`. How confusing that `'\xFF'` may have a value of `-1`! – chqrlie May 31 '20 at 20:23
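
The cast discussed in these comments can be wrapped once so that call sites cannot forget it. This is only a sketch: `is_space_char` is a hypothetical name, and the same pattern would apply to the other `<ctype.h>` classification functions.

```c
#include <ctype.h>

/* Hypothetical wrapper: accepts any char value, including negative ones
   on platforms where plain char is signed, by converting the argument to
   unsigned char before calling isspace(). Returns 1 or 0. */
int is_space_char(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

With this wrapper, a trimming loop can test `is_space_char(name[i])` directly on the contents of a `char` buffer.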
1

It seems the fields are separated by TAB characters in the file. If this is correct, you can parse the file with fscanf():

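    #include <stdio.h>
    #include <stdlib.h>

    /* ACTORS, hash, hash_update_node() and get_id_num() come from the question's code */
    int local_file(void) {
        char sex, name[100], id[10];
        int count = 0;

        FILE *fh = fopen(ACTORS, "r");
        if (!fh) {
            exit(1);
        }
        /* the leading space skips the newline left over from the previous line */
        while (fscanf(fh, " %9[^\t]%*1[\t]%99[^\t]%*1[\t]%c", id, name, &sex) == 3) {
            hash_update_node(hash, get_id_num(id), name, sex);
            count++;
        }
        fclose(fh);
        return count;
    }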

Note however that this code will fail if any of the fields are empty.
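
If empty fields have to be tolerated, one option is to read each line with `fgets` and split it on the TAB characters by hand. The following is a minimal sketch under that assumption; `split_tab_fields` is a hypothetical helper name, and it still requires exactly two TABs and a sex character on each line.

```c
#include <string.h>

/* Hypothetical helper: splits a TAB-separated line "id\tname\tsex" in place.
   Unlike %[^\t], it accepts an empty id or an empty name.
   Returns 1 on success, 0 if a TAB or the sex character is missing. */
int split_tab_fields(char *line, char **id, char **name, char *sex)
{
    char *tab1, *tab2;

    /* Strip the line ending kept by fgets, if any. */
    line[strcspn(line, "\r\n")] = '\0';

    tab1 = strchr(line, '\t');
    if (!tab1)
        return 0;
    tab2 = strchr(tab1 + 1, '\t');
    if (!tab2 || tab2[1] == '\0')
        return 0;

    /* Terminate the first two fields where the TABs were. */
    *tab1 = '\0';
    *tab2 = '\0';

    *id = line;
    *name = tab1 + 1;
    *sex = tab2[1];
    return 1;
}
```

The caller decides what an empty `id` or `name` means, for instance skipping the record instead of stopping the whole loop.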

chqrlie
  • so just add `&& *name && *id` in the test (UV anyway) – bruno May 31 '20 at 12:03
  • @bruno: if `fscanf()` returns `3`, neither `name` nor `id` can be empty strings. I meant `fscanf()` cannot parse empty fields with `%[^\t]` because there must be at least one character different from TAB and `'\0'` for the conversion to succeed. – chqrlie May 31 '20 at 12:18