0

What I want to try to do is to parse a CSV containing the first name, last name, and weight of a person and this data in an array of structures. Here is my code so far. I am able to parse and print out of the values of the CSV, but I am not too sure on how to store these values into the array of structure. I want to be able to access the values of the csv from person.c, but actually read this from vector.c, meaning that I have to call the function readCSV() from person.c in vector.c. Could any of you guys help me out?

//vector.h
#ifndef VECTOR_H_
#define VECTOR_H_
#include "person.h"

typedef struct
{
    Person *personArray;
    int sizeArray;
    int count;
}Vector;

//person.h
#ifndef PERSON_H_
#define PERSON_H_

typedef struct
{
    const char *firstName, *lastName;
    double weight;
}Person;

//vector.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include "vector.h"

void readPerson(Vector *v)
    {
        readCSV(v->personArray);
    }


//person.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include "person.h"

void initialize(Vector *v)
{
    v->sizeArray = 10;
    v->count = 0;
    v->personArray = (City*) malloc(v->sizeArray*sizeof(City));

}

void readCSV(Person* person)
    {
        FILE * fp;
        char line[1024];
        int i = 0;
        fp = fopen("mycsvfile.csv","r");
        while(fgets(line,sizeof(line),fp))
        {
            person->firstName = strtok(line,",");
            person->lastName = strtok(NULL,",");
            person->weight = atof(strtok(NULL,","));
            printf("%s %s %f\n",person->firstName, person->lastName, person->weight);
    }
    fclose(fp);
}

//main.c
#include<stdio.h>
#include<stdlib.h>
#include "vector.h"

int main()
    {
        Vector people;
        initialize(&people);
        readPerson(&people);
        return 0;
    }
  • Allocate memory for `personArray` using `malloc`/`calloc` – Spikatrix Apr 07 '15 at 08:28
  • You might also want to read about linked lists if the size of the arrays is not known in advance. See http://stackoverflow.com/questions/166884/array-versus-linked-list – Dobromir Velev Apr 07 '15 at 08:31
  • How is that applicable here? Linked lists are fine, but for the purpose of this question, they are unrelated. As SO is a learning site, you are encouraged to try best to determine the issue preventing the questioner from moving forward. When you suggest something like a linked list here, it adds an additional layer of complexity not necessary to resolving the problem. I'm not picking on you, I'm just tying to help you help us make SO better. – David C. Rankin Apr 07 '15 at 08:39
  • 1
    OK, I agree - but using vector functions in plain C is almost the same . So on the question - first where does (City*) came from and is it the same (Person *). Second assigning person->firstName = strtok... will not work - this is just assigning the current pointer from the line char which will be changed on the next loop iteration. You will need to strcpy/memcpy the data to the array. – Dobromir Velev Apr 07 '15 at 08:50
  • 1
    The output of strtok is not kept between each iteration, check up the documentation of strtok. strtok contains a static pointer to the buffer and changes the contents of the buffer. once the loop iteration is over it is all lost that is why you need to copy the output from strtok. [see](http://stackoverflow.com/questions/3889992/how-does-strtok-split-the-string-into-tokens-in-c/3890186#3890186) – AndersK Apr 07 '15 at 10:49

2 Answers2

0

like as follows:

int readCSV(Person* person, FILE *fp){
    int state;
    char line[1024];
    if(state = fgets(line,sizeof(line), fp)){
        person->firstName = strdup(strtok(line,","));
        person->lastName  = strdup(strtok(NULL,","));
        person->weight    = atof(strtok(NULL,"\n"));
        printf("%s %s %f\n", person->firstName, person->lastName, person->weight);
    }
    return !!state;
}

void readPerson(Vector *v){
    //if(!v && !v->personArray)
    FILE *fp;
    int count = 0;
    if(NULL != (fp = fopen("mycsvfile.csv","r"))){
        while(readCSV(&v->personArray[count], fp)){
            if(++count == v->sizeArray){
                //expand array or stop reading
            }
        }
        v->count = count;
        fclose(fp);
    }
}
BLUEPIXY
  • 39,699
  • 7
  • 33
  • 70
0

Your code has 2 main issues :

  1. In readCSV you have a loop to read all the lines in the CSV but all data you are reading is stored always in the same (non-array) variable. You need to pass a Vector, rather than a Person to readCSV and store the data using people->personArray[index].some_field .
  2. Each time fgets() is called it overwrites line[1024]. You need to make a copy of the parsed data before calling fgets() again. I've used strdup for that .

Printing worked in your code because you printed the parsed data before calling fgets(). But issue 2 stops you from being able to separate parsing and printing.

This code fixes those two issues and moves printing to a different function doing minimal editing. I've renamed person.c to vector.c since the functions there deal with the vector as a whole rather than a single person.

//person.h
#ifndef PERSON_H_
#define PERSON_H_

typedef struct
{
    const char *firstName, *lastName;
    double weight;
}Person;

#endif

//vector.h
#ifndef VECTOR_H_
#define VECTOR_H_
#include "person.h"

typedef struct
{
    Person *personArray;
    int sizeArray;
    int count;
}Vector;

extern void readCSV(Vector* people);
extern void printCSV(Vector* people);

#endif

//vector.c
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include "vector.h"

void initialize(Vector *v)
{
    v->sizeArray = 10;
    v->count = 0;
    v->personArray = (Person*) malloc(v->sizeArray*sizeof(Person));

}

void readCSV(Vector* people)
{
    FILE * fp;
    char line[1024];
    int index = 0;
    fp = fopen("mycsvfile.csv","r");
    while(fgets(line,sizeof(line),fp))
    {
        people->personArray[index].firstName = strdup(strtok(line,","));
        people->personArray[index].lastName = strdup(strtok(NULL,","));
        people->personArray[index].weight = atof(strtok(NULL,","));
        ++index;
    }
    people->count = index;
    fclose(fp);
}

void printCSV(Vector* people)
{
    int i;
    for( i=0; i<people->count; ++i )
    {
            printf("%s %s %f\n",
                    people->personArray[i].firstName,
                    people->personArray[i].lastName,
                    people->personArray[i].weight);
    }
}

//main.c
#include<stdio.h>
#include<stdlib.h>
#include "vector.h"

int main()
{
        Vector people;
        initialize(&people);
        readCSV(&people);
        printCSV(&people);
        return 0;
}

That code works and should get you to continue with your work or assignment. But in order to become quality code worthy of production status many issues remain to be solved :

  1. Freeing memory used. Memory allocaded with malloc, strdup, etc needs to be freed when no longer used. In this minor program it will be freed by the C runtime when main exits but when your code becomes part of a bigger project it will be an important issue. Better to get used to it now.
  2. Dealing with unconstrained input. Try to run the program using an input file with 11 entries, it may crash or exhibit undefined behavior. This is because we allocate 10 entries in initialize(Vector* v). To solve it you can use a growable data type, like linked lists. Or you can parse the file twice; once discarding data to determine how many entries it has, then allocate the array and then a second one to actually read and store the data.
  3. Dealing with incorrect input. You need to specify the format of your CSV file. Are empty fields allowed? Maximum length of line? And verify the correctness of the input file, delivering an error if it does not parse rather than reading incorrect input. What happens if the file does not exist or can't be read?
  4. Make your code generic. readCSV(Vector* people) should become readCSV(Vector*, const char* filename). And you can read the filename from the command line using main(int argc, char* argv[]) rather than using a constant string.
  5. Beware of strtok() and fgets() behavior! Lines read with fgets() (except maybe the last one) will end in '\n'. In a line ending with "83.5\n" the last token will be "83.5\n". Notice the carriage return there. That may not be what you want. In your code it works because atof stops parsing at '\n'. If you don't want a \n in your token you can use strtok( line_or_NULL, ",\n")
  6. Deal properly with your Vector status. What happens if you call readCSV() twice? What if initialize() is called twice()? Safest practice would be to detect it, since it is a fast check, and abort the program with an error code if incorrect calls are made, or return an error code.
  7. Document code. A header to each function explaining what it does, parameters and what it returns It is very important to list preconditions and postconditions, like allowing or not to call initialize twice on the same Vector.
  8. Unit tests.
Anonymous Coward
  • 3,140
  • 22
  • 39