-1

I'm trying to read a CSV file in C and store that data into a vector.

My CSV file entries for each line looks like: 12/12/1914, 52.4, however, I am only interested in retrieving the number from this CSV, not the dates. To accomplish this, I've been trying to read the file line by line using fgets() , and then separating the number value out through the use of strtok().

When I print out the results of strtok() I get the numbers I am looking for, but I also get (null) printed with them:

(null)
25798.42

(null)
25706.68

(null)
25379.45

(null)
25444.34

(null)
25317.41

Also, when I try and print the actual vector entires, they just print out garbage (I assume this is because (null) is attached to them but not positive):

3907216808; 0; 
3907216808; 0; 

My function for reading the data looks like this:

void get_CSV_data(vc_vector* prices)
{
    FILE *fp = fopen(_FILE_PATH, "r");
    char singleLine[20];

    while(!feof(fp)){
        fgets(singleLine, 20, fp);

        char* token = strtok(singleLine, ",");
        while (token != NULL) {
            token = strtok(NULL, ",");
            printf("%s\n", token);
            vc_vector_push_back(prices, &token);
        }
    }
    // Print each vector element
    for (void* i = vc_vector_begin(prices);
         i != vc_vector_end(prices);
         i = vc_vector_next(prices, i)) {
         printf("%u; ", *(int*)i);
    }
}

I assume I am using strtok() incorrectly, can anyone advise? Also, while I am here, quick side question, is free(token); needed at some point? Or no because malloc() was never called? Still pretty new to C.

EDIT: My function now looks like:

    void get_CSV_data(vc_vector* prices)
{
    FILE *fp = fopen(_FILE_PATH, "r");
    char singleLine[20];

    while(fgets(singleLine, 20, fp) != NULL){
        char* token = strtok(singleLine, ",");
        token = strtok(NULL, ",");
        //printf("%s\n", token);
        vc_vector_push_back(prices, strdup(token));


    }
    // Print each vector element
    for (void* i = vc_vector_begin(prices);
         i != vc_vector_end(prices);
         i = vc_vector_next(prices, i)) {
         printf("%s\n ", (char*)i);
    }
}

I get results like:

25598.7425052.8325339.9925250.5525798.4225706.6825379.4525444.3425317.4125191.43    25052.8325339.9925250.5525798.4225706.6825379.4525444.3425317.4125191.43
25339.9925250.5525798.4225706.6825379.4525444.3425317.4125191.43
25250.5525798.4225706.6825379.4525444.3425317.4125191.43
25798.4225706.6825379.4525444.3425317.4125191.43
25706.6825379.4525444.3425317.4125191.43
25379.4525444.3425317.4125191.43

Which are correct.

user3357738
  • 67
  • 1
  • 2
  • 8

2 Answers2

2

In

   char* token = strtok(singleLine, ",");
   while (token != NULL) {
       token = strtok(NULL, ",");
       printf("%s\n", token);
       vc_vector_push_back(prices, &token);
   }

vc_vector_push_back allows to save data having a given size, not a variable size, so you can use it only if you created the vector indicating the number of characters you will put in

In your case you do vc_vector_push_back(prices, &token); so you finally will save at least the address of the string memorized in token, this is wrong, you need to save the characters inside the string :

    char* token = strtok(singleLine, ",");
    while (token != NULL) {
        token = strtok(NULL, ",");
        printf("%s\n", token);
        vc_vector_push_back(prices, token);
    }

it is useless to duplicate token (as I imagined first) because vc_vector_push_back will do the copy depending on the size you indicated when you created the vector

Note you also loose the first token and you will finally push NULL, probably you want

    char* token = strtok(singleLine, ",");
    while (token != NULL) {
        printf("%s\n", token);
        vc_vector_push_back(prices, token);
        token = strtok(NULL, ",");
    }

In

 for (void* i = vc_vector_begin(prices);
      i != vc_vector_end(prices);
      i = vc_vector_next(prices, i)) {
      printf("%u; ", *(int*)i);
 }

you suppose prices contains int but this is false, it contains char*, must be

  for (void* i = vc_vector_begin(prices);
       i != vc_vector_end(prices);
       i = vc_vector_next(prices, i)) {
       printf("%s ", *(char**)i);
  }

You also need to change

while(!feof(fp)){
    fgets(singleLine, 20, fp);

by something like

while (fgets(singleLine, 20, fp) != NULL) {

I also encourage you to check the value of fopen(...) before to use it

bruno
  • 32,421
  • 7
  • 25
  • 37
  • strdup(token) seems to still be filling my vector with garbage values. – user3357738 Mar 06 '19 at 23:00
  • @user3357738 look my last edit, where `token = strtok(NULL, ",");` is moved *after* the push. Of course you also need to change the way you use the values from the vector because there is one level less (a `char*` rather than a `char**` like before) – bruno Mar 06 '19 at 23:02
  • your edit still prints both the date and the number at that date, however: `char* token = strtok(singleLine, ","); token = strtok(NULL, ","); printf("%s\n", token); vc_vector_push_back(prices, strdup(token));` does print just the numbers, still pushes garbage to my vector though. – user3357738 Mar 06 '19 at 23:06
  • @user3357738 you print the addresses of the `char*`, _prices_ **does not contains _int_**, see my answer (again edited) – bruno Mar 06 '19 at 23:08
  • hmm... `printf("%s\n ", (char*)i);` prints SOME of the values, but attaches a lot of garbage to them and other values are just still pure garbage, am I still printing wrong? I am going to edit above what the function now looks like. – user3357738 Mar 06 '19 at 23:19
  • @user3357738 wait, I am modifying my answer, that of vector is strange, please wait a little ... – bruno Mar 06 '19 at 23:20
  • I think it is outputting the right values on second look, just with lots of extra decimals (which is fine, I thankfully at least know enough C to handle that down the line), thanks for the help! – user3357738 Mar 06 '19 at 23:24
0

When I print out the results of strtok() I get the numbers I am looking for, but I also get (null) printed with them:

Yes, because you loop until you do. Consider:

        while (token != NULL) {
            token = strtok(NULL, ",");
            printf("%s\n", token);
            vc_vector_push_back(prices, &token);
        }

As long as the initial token is not NULL, on every iteration you read and then print the next token. Only then, after already having printed it, do you loop back to test whether it is null.

Since you seem to want exactly the second token of every line, it's pointless to loop. Just call strtok() twice:

        char* token = strtok(singleLine, ",\n");

        if (token) {
            token = strtok(NULL, ",\n");
            if (token) {
                printf("%s\n", token);
                vc_vector_push_back(prices, &token);  // but see below
            } // else handle malformed data
        } // else handle malformed data

Also, while I am here, quick side question, is free(token); needed at some point? Or no because malloc() was never called?

No, because, as you say, no memory was allocated. But think carefully about the implications. No memory is allocated because token points into the local array singleLine that you are tokenizing. That means:

  1. When you read the next line into the same buffer, you replace the pointed-to data.
  2. When the function returns, the lifetime of that array ends, rendering any pointers (in)to it invalid.

It appears that vc_vector copies elements in, but in your case it can only be copying the pointers themselves, not the pointed-to values, so that doesn't help at all with either of the above. Instead, to avoid corrupting your data and ultimately having a vector full of dangling pointers, you must make dynamically-allocated copies of the token strings, and store pointers to those in your vector.

If you have it, then the non-standard, but common, strdup() function can make such copies for you. Otherwise a combination of strlen(), malloc(), and strcpy() will do the same job. Note that even though there is no explicit call to an allocation function when you use strdup(), on success, the resulting duplicate string is indeed dynamically allocated, needing to be freed when you no longer want it.

Also, when I try and print the actual vector entires, they just print out garbage

Well that's because you're storing pointers to character arrays in your vector, but then trying to interpret them as if they were pointers to int. The pointer formats are probably compatible, but the data they point to are totally not. And type int isn't even an appropriate type, since your data are not integral (unless you can and do convert to a fixed-point representation). Perhaps, instead of duplicating the strings, you want to use, and allow the vector to copy, doubles:

double d = strtod(token, NULL);  // note: as written, performs no error checking
vc_vector_push_back(prices, &d);

That might require a change to the way you initialize the vector. You would then print them as doubles, say:

for (double *dp = vc_vector_begin(prices);
        dp != vc_vector_end(prices);
        dp = vc_vector_next(prices, dp)) {
     printf("%.2f; ", *dp);
}
John Bollinger
  • 160,171
  • 8
  • 81
  • 157