2

For given representation,

typedef struct {
  int age;
  char *firstName;
  char *lastName;
}Record;

and given file.txt,

Age,LastName,FirstName
50,B,A
30,A,B
20,X,D
10,F,A
90,V,E
60,N,M

Below is the code in main(),

 pFile=fopen("file.txt", "r");
 ...
 //Complete file is copied to 'readBuffer', approach inspired from  
 // http://stackoverflow.com/a/11487960/3317808
 ....

 char *record = strtok(readBuffer,"\n"); //Ignore header record
  record = strtok(NULL, "\n");// Read first data record(50,'B','A')

  for(;record != NULL; record = strtok(NULL,"\n")){

    printf("###Print complete record\n");
    puts(record);

    Record *r = malloc(sizeof(Record)*1);

    r->age = atoi(strtok(record,","));

    char *firstName = strtok(NULL,",");
    char *lastName = strtok(NULL, ",");

    r->firstName = strdup(firstName);
    r->lastName = strdup(lastName);
    printf("Age: %d\n", r->age);
    printf("First name: %s\n", r->firstName);
    printf("Last name: %s\n", r->lastName);

  }

strtok(readBuffer,",")confuses compiler with strtok(record,",") in for-loop


Actual output shows that tokenisation happened for only one record.

$ ./program.exe
Print complete file
Age,LastName,FirstName
50,B,A
30,A,B
20,X,D
10,F,A
90,V,E
60,N,M



###Print complete record
50,B,A
Age: 50
First name: B
Last name: A

How to resolve it?

overexchange
  • 15,768
  • 30
  • 152
  • 347
  • `record = strtok(NULL, "\n");// Read first data record`.. are you sure? – Sourav Ghosh Jan 02 '17 at 04:37
  • `//Complete file is copied to 'readBuffer'`...not a very good idea.. – Sourav Ghosh Jan 02 '17 at 04:38
  • @SouravGhosh Output shows first data record, so yes for your first comment – overexchange Jan 02 '17 at 04:50
  • To use `strtok` in a `for` loop, you need to read each line of input into a buffer. (e.g. `char buf[256];`) Example `while (fgets (buf, sizeof buf, pFile)) { char *p; for (p = strtok (buf, delims); p; p = strtok (NULL, delims)) {...}}` – David C. Rankin Jan 02 '17 at 04:52
  • @DavidC.Rankin How to decide size of `buf`? Do I know record size, before hand? Not sure about record size. [answer](http://stackoverflow.com/a/11487960/3317808) says, text editors follow this approach to copy complete file into buffer – overexchange Jan 02 '17 at 04:54
  • You either know the limit of your data, or you will need to size the buffer dynamically. In your case, *for your data*, you need a minimum of `8` chars (6 + `'\n` + `\0`). If you are handling your header line as well, you need the length of the header + 1. To be safe, with header it is `23` chars required, so I would size the buf at `32` to give a little wiggle room. If you have no idea, then you need an initial allocation, a check on the complete read, and a reallocation if the read is less than the entire line. (leave that for later) – David C. Rankin Jan 02 '17 at 04:58
  • Candidly, since your name will be more than `A`, `B`, etc.. You can generally cover yourself presuming `32` char per first/last name component. Then since the buffer is just temporary, and the cost of using a `256` char buffer verses an `80` char buffer is in the noise on all but the smallest embedded chips, I'd feel confident with a static buffer of `256` to cover the age, first, last name line length in all cases. (you still need to validate your read, but you should be safe). – David C. Rankin Jan 02 '17 at 05:06
  • @DavidC.Rankin If interviewer gives another file with different records to sort. This exercise am doing to mergesort the records. Using this approach, only last 8 lines in for-loop will change, which make sense – overexchange Jan 02 '17 at 05:10
  • @overexchange I hope my answer helps. I think mergesort is a very good sorting algorithm for this. – RoadRunner Jan 02 '17 at 07:57
  • @RoadRunner Am checking line by line. Need more time – overexchange Jan 02 '17 at 08:29

3 Answers3

4

If it is possible in our case, using strtok_r() seems to be the easiest way out here. Just to inform, this is not standard C, it's in POSIX.

From the man page,

The strtok_r() function is a reentrant version strtok(). The saveptr argument is a pointer to a char * variable that is used internally by strtok_r() in order to maintain context between successive calls that parse the same string.

and

Different strings may be parsed concurrently using sequences of calls to strtok_r() that specify different saveptr arguments.

The man page also has an example for the scenario you're looking for.

Sourav Ghosh
  • 133,132
  • 16
  • 183
  • 261
2

As @David C. Rankin suggested, using fgets along with strtok to read each line is a good approach to this problem.

If you want to use mergesort in the future, then storing your data in an array of structs would be easiest to implement with this sorting algorithm. Furthermore, if you don't know how many lines will be in the file, then you might need to dynamically allocate this on run-time.

You can have a lower-level struct storing each line in the file:

typedef struct {
    int age;
    char *firstname;
    char *lastname;
} record_t;

And a higher-level struct storing all contents of the file:

typedef struct {
    record_t *records; /* pointer to record_t */
    char *headers;     /* pointer holding header */
    size_t currsize;   /* current status of information being added */
    size_t lastidx;
} allrecords_t;

Things to note about fgets:

  • Adds \n character at the end of buffer, before the null-terminator \0. This appended \n can be removed easily though.
  • On error, returns NULL. If EOF is reached and no characters have been read, then this also returns NULL.
  • Buffer size must be statically declared.
  • Needs to be read from specified stream, either from stdin or from FILE *.

Optional usage of fgets in a program:

When using fgets(), you can call it once to consume the header information:

fgets(buffer, 256, pfile); /* error checking needed */

Then, you can call it again in a while() loop, to consume the rest of the data in the file:

while (fgets(buffer, 256, pfile) != NULL) {
    ....
}

Implementation of all these ideas in a Program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Constants used */
#define INITSIZE 20
#define BUFFSIZE 256

#define MALLOC_MSG "Allocation"
#define REALLOC_MSG "Reallocation"

/* array of structs setup */
typedef struct {
    int age;
    char *firstname;
    char *lastname;
} record_t;

typedef struct {
    record_t *records;
    char *headers;
    size_t currsize;
    size_t lastidx;
} allrecords_t;

/* function prototypes */
allrecords_t *initialize_records(void);
void read_header(FILE *filestream, allrecords_t *Record, char buffer[]);
void read_data(FILE *filestream, allrecords_t *Record, char buffer[]);
void print_records(allrecords_t *Record);
void check_ptr(void *ptr, const char *msg);
void remove_newline(char buffer[]);

int main(void) {
    FILE *fp;
    allrecords_t *Record;

    /* static buffer for fgets() */
    char buffer[BUFFSIZE];

    fp = fopen("fileex.txt", "r");
    if (!fp) {
        fprintf(stderr, "Cannot read file.\n");
        exit(EXIT_FAILURE);
    }

    Record = initialize_records();

    /* Reads the first line */
    read_header(fp, Record, buffer);

    /* Reads next lines */
    read_data(fp, Record, buffer);

    /* prints and frees structure elements*/
    print_records(Record);

    return 0;
}

/* function which reads the age/firstname/lastname data */
void read_data(FILE *filestream, allrecords_t *Record, char buffer[]) {
    char *data; /* only need one char *pointer for strtok() */
    const char *delim = ",";

    while (fgets(buffer, BUFFSIZE, filestream) != NULL) {
        remove_newline(buffer); /* optional to remove '\n' */

        /* resize array when necessary */
        if (Record->currsize == Record->lastidx) {
            Record->currsize *= 2;
            Record->records = realloc(Record->records, Record->currsize * sizeof(record_t));
            check_ptr(Record->records, REALLOC_MSG);
        }

        /* adding info to array */
        /* using strdup() will lead to less code here */
        data = strtok(buffer, delim);
        Record->records[Record->lastidx].age = atoi(data);

        data = strtok(NULL, delim);
        Record->records[Record->lastidx].firstname = malloc(strlen(data)+1);
        check_ptr(Record->records[Record->lastidx].firstname, MALLOC_MSG);
        strcpy(Record->records[Record->lastidx].firstname, data);

        data = strtok(NULL, delim);
        Record->records[Record->lastidx].lastname = malloc(strlen(data)+1);
        check_ptr(Record->records[Record->lastidx].lastname, MALLOC_MSG);
        strcpy(Record->records[Record->lastidx].lastname, data);

        Record->lastidx++;
    }

}

/* prints and frees all members safely, without UB */
void print_records(allrecords_t *Record) {
    size_t i;

    printf("\nComplete Record:\n");

    printf("%s\n", Record->headers);
    free(Record->headers);
    Record->headers = NULL;

    for (i = 0; i < Record->lastidx; i++) {
        printf("%d,%s,%s\n", Record->records[i].age, 
                             Record->records[i].firstname, 
                             Record->records[i].lastname);

        free(Record->records[i].firstname);
        Record->records[i].firstname = NULL;

        free(Record->records[i].lastname);
        Record->records[i].lastname = NULL;
    }

    free(Record->records);
    Record->records = NULL;

    free(Record);
    Record = NULL;
}

/* function which only reads header */
void read_header(FILE *filestream, allrecords_t *Record, char buffer[]) {
    if (fgets(buffer, BUFFSIZE, filestream) == NULL) {
        fprintf(stderr, "Error reading header.\n");
        exit(EXIT_FAILURE);
    }

    remove_newline(buffer);

    Record->headers = malloc(strlen(buffer)+1);
    check_ptr(Record->headers, MALLOC_MSG);
    strcpy(Record->headers, buffer);
}

/* function which removes '\n', lots of methods to do this */
void remove_newline(char buffer[]) {
    size_t slen;

    slen = strlen(buffer);

    /* safe way to remove '\n' and check for bufferoverflow */
    if (slen > 0) {
        if (buffer[slen-1] == '\n') {
            buffer[slen-1] = '\0';
        } else {
            printf("Buffer overflow detected.\n");
            exit(EXIT_FAILURE);
        }
    }
}

/* initializes higher level struct */
allrecords_t *initialize_records(void) {
    allrecords_t *Record = malloc(sizeof(*Record));
    check_ptr(Record, MALLOC_MSG);

    Record->currsize = INITSIZE;

    Record->headers = NULL;

    Record->records = malloc(Record->currsize * sizeof(record_t));
    check_ptr(Record->records, MALLOC_MSG);

    Record->lastidx = 0;

    return Record;
}

/* instead of checking for 'ptr == NULL' everywhere, just call this function */
void check_ptr(void *ptr, const char *msg) {
    if (!ptr) {
        printf("Null pointer returned: %s\n", msg);
        exit(EXIT_FAILURE);
    }
}

Note: I used malloc() + strcpy() instead of strdup(), because they come from standard C libraries like <string.h> and <stdlib.h>, instead of POSIX C.

Program output:

Complete Record:
Age,LastName,FirstName
50,B,A
30,A,B
20,X,D
10,F,A
90,V,E
60,N,M
RoadRunner
  • 25,803
  • 6
  • 42
  • 75
  • Do you prefer reading 256 bytes to get header info. Is that a good programming practice? If I can use the file api to get the length of header before hand, What would you say? Using `fseek()` & `ftell()` – overexchange Jan 02 '17 at 08:55
  • Using 256 bytes is not bad practice, because it is a sensible limit to how long the header information would be per line. Using `fseek()` and `ftell()` would be good for getting the amount of btyes in the entire file, but I'm not sure if that would help in your case. This does end up being more complicated to use, but it is possible.I think in your case it is better to try and find the limit of characters per line for the buffer. One alternative would be to count the characters on each line with `fgetc`, then you make an exact buffer size that way, instead of approximating with `256`. – RoadRunner Jan 02 '17 at 09:09
  • [answer](http://stackoverflow.com/a/631522/3317808) talks about, when to apply `strlen()`? – overexchange Jan 02 '17 at 09:23
  • Using `strlen()` will not lead to undefined behavior is the string is null-terminated with `\0` at the end. Is there something you don't understand about this? – RoadRunner Jan 02 '17 at 09:27
  • My bad, `fgets()` takes care of placing `\0` after `\n` – overexchange Jan 02 '17 at 09:35
  • Yeah no problem. I verified that in my answer :) – RoadRunner Jan 02 '17 at 09:37
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/132077/discussion-between-roadrunner-and-overexchange). – RoadRunner Jan 02 '17 at 09:44
0

The problem is more to do with the logic rather than the use of strtok.

Also worth noting is the format of your record being read, you have an issue with not having a comma after the last name field.

The code below will achieve what your question asks

{
char str[80] = "Age,LastName,FirstName\n50,B,A\n30,A,B\n20,X,D\n";

const char newline[2] = "\n";
const char comma[2] = ",";

/* get over the header fields */
strtok(str, newline);

/* walk through other tokens */
for(;;)
{
    Record *r = (Record*)malloc(sizeof(Record)*1);

    char *age = strtok(NULL, comma);
    if(age != NULL)
    {
        r->age = atoi(age);

        char *firstName = strtok(NULL, comma);
        char *lastName = strtok(NULL, newline);

        r->firstName = (char *)strdup(firstName);
        r->lastName = (char *)strdup(lastName);

        printf("Age: %d\n", r->age);
        printf("First name: %s\n", r->firstName);
        printf("Last name: %s\n", r->lastName);
    }
    else
        break;
}

return(0);
}
derek
  • 476
  • 5
  • 11
  • The question was how to display ALL RECORDS, the code above does this with minimal changes, I think both of you misread the original question and have over elaborated your answers. – derek Jan 02 '17 at 13:43
  • @overexchange I did read the conversation between you and David, you gave your opinions and I gave my implementation, my answer solves your problem and and displays all records as you asked for, it compiles and works:-) It's disappointing that this answer was down voted after I had taken the time to produce a working solution for you. – derek Jan 02 '17 at 13:54
  • My bad, I make a mistake. I don't even understand why I don't comment my vote, I always do. You should not precise `80`, `2` and `2` in the array declaration. And the `malloc()` is useless and produce to memory leak. You don't check the return value of `strtok()` in the `if`. You should not `cast` the return of `malloc()` and `strdup()`. You don't check the return of `strdup()`. – Stargateur Jan 04 '17 at 12:48
  • @Stargateur that makes more sense now:-) my intention was a no frills working example, I wasn't trying to show best practice of C and C++, merely showing the basic logic of where the original question went wrong, having said that I do take on board your points and will be more careful when posting code on this site as the standard seems to be very high;-) . – derek Jan 04 '17 at 20:49
  • Don't take downvote too seriously. Some people like answer that just solve the problem. Some people like me don't like answer with undefined behavior. Because I think that answer should teach good practice. – Stargateur Jan 04 '17 at 22:55