
I have a file data.csv which contains float type data:


0.22,0.33,0.44

0.222,0.333,0.444


I need to read this file into a two-dimensional dynamic array, but fgets does not read the full line, and I am not sure why.

Here is my C code which I used on Ubuntu:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    FILE *fp;
    float **data;    
    int i,j,rows=2,cols=3;   
    char * token;
    fp=fopen("data.csv","r");
    if(fp==NULL) {
            fprintf(stderr,"Can't open input file");
            exit(1);
    }

    data= malloc(rows * sizeof(float*)); 
    char *rowbuffer=malloc( cols * ( sizeof(float)+sizeof(char) ) );
    i=0;
    while(fgets(rowbuffer,sizeof(rowbuffer),fp) !=NULL) {      
        data[i] = malloc(cols * sizeof(float));      
        j=0;
        printf("\n %s",rowbuffer);
        for (token = strtok(rowbuffer,","); token != NULL; token = strtok(NULL, ",")) {
             data[i][j++] = atof(token);
             /*printf("%s",token);*/
        }
        i++;  
    }
    free(rowbuffer);
    for(i = 0; i < rows; i++)
        free(data[i]);
    free(data);
    fclose(fp);
}

The output is like:

0.22,0.

33,0.44

0.222,0

��

444

Error in `./test': double free or corruption (out): 0x0000000000adf270

Aborted (core dumped)

Can anyone tell me why this error occurs? :( Or is there a better way to read this kind of data file?

Kaur
  • `sizeof(rowbuffer) == sizeof( char * )`... that's likely 4, or 8, depending on hardware. Since you are *assuming* it's the size of the allocated buffer, your assumptions are wrong. – DevSolar Feb 12 '15 at 08:48
  • possible duplicate of [How to find the 'sizeof'(a pointer pointing to an array)?](http://stackoverflow.com/questions/492384/how-to-find-the-sizeofa-pointer-pointing-to-an-array) – Klas Lindbäck Feb 12 '15 at 08:48
  • Also, indentation. Whitespaces are free. ;-) – DevSolar Feb 12 '15 at 08:53

2 Answers

4

One issue is here:

char *rowbuffer=malloc( cols * ( sizeof(float)+sizeof(char) ) );

sizeof(float) is the size that a float uses in memory, not in its text representation. When reading from files, you should allocate a buffer to contain a whole line in text format. In your case a good bet could be the following:

int bufsize = cols * (3 + DBL_MANT_DIG - DBL_MIN_EXP + 1) + 1;

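(For why that value is used, see the Stack Overflow question "What is the maximum length in chars needed to represent any double value?"; the constants are defined in <float.h>. The + 1 inside the parentheses leaves room for the separator after each value: a comma, or the newline character, which fgets() does read and include in the buffer. The trailing + 1 is for the terminating null character.)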

But that assumes that there are no formatting errors in the input file, so you might want to add some extra slack to that value.

Once you have that value, use it in both the malloc() and fgets():

char *rowbuffer=malloc(bufsize);
i=0;
while(fgets(rowbuffer,bufsize,fp) !=NULL) {
...
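
Putting the pieces together, a minimal sketch might look like the following. `read_rows` is a hypothetical helper, not code from the question. It assumes the caller has already allocated `rows` rows of `cols` floats, and it treats the newline as a delimiter so that blank lines in the input are skipped.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads up to `rows` lines of up to `cols`
   comma-separated values from fp into data[rows][cols].
   Returns the number of rows filled, or -1 if the line buffer
   cannot be allocated. */
int read_rows(FILE *fp, float **data, int rows, int cols)
{
    /* Size the buffer for the text of a line, not for sizeof(float). */
    size_t bufsize = (size_t)cols * (3 + DBL_MANT_DIG - DBL_MIN_EXP + 1) + 1;
    char *rowbuffer = malloc(bufsize);
    int i = 0;

    if (rowbuffer == NULL)
        return -1;

    /* Pass the allocated size to fgets(), never sizeof(rowbuffer). */
    while (i < rows && fgets(rowbuffer, (int)bufsize, fp) != NULL) {
        int j = 0;
        char *token;

        /* "\r\n" in the delimiter set keeps the line ending out of the
           last token and makes blank lines produce no tokens at all. */
        for (token = strtok(rowbuffer, ",\r\n");
             token != NULL && j < cols;
             token = strtok(NULL, ",\r\n"))
            data[i][j++] = strtof(token, NULL);

        if (j > 0)      /* count only lines that contained values */
            i++;
    }

    free(rowbuffer);
    return i;
}
```

Bounding the loops with `i < rows` and `j < cols` also prevents writes past the end of the arrays, which is the likely cause of the heap corruption reported in the question.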

On a side note, your input file looks like it could be read more easily with fscanf(), the file-stream variant of scanf().
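
As a rough illustration of that idea, the sketch below reads one value at a time and needs no line buffer. `read_matrix` is a hypothetical name, and the code assumes the dimensions are known and that `data` is already allocated.

```c
#include <stdio.h>

/* Hypothetical helper: fills data[rows][cols] from comma-separated text.
   Returns 0 on success, -1 if a value is missing or malformed. */
int read_matrix(FILE *fp, float **data, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            /* %f skips leading whitespace, so newlines and blank
               lines between rows are consumed automatically. */
            if (fscanf(fp, "%f", &data[i][j]) != 1)
                return -1;

            /* Consume the comma that follows every value except the
               last one in a row; EOF here means the row is incomplete. */
            if (j < cols - 1 && fscanf(fp, " ,") == EOF)
                return -1;
        }
    }
    return 0;
}
```

Because each value is converted as it is read, the width of the text representation no longer matters.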

SukkoPera
  • Thanks for your comments. I think memory reserved this way using `bufsize` will be much more than is actually used. Two things about my CSV files: first, I have no prior info about how many rows and columns they have (it can be thousands or more). Secondly, the precision used in individual values can vary, e.g. 0.124 or 0.001204. – Kaur Feb 14 '15 at 05:43
  • @Kaur: Well, it very much depends on how the data are organized in the file you want to read. If lines are a few tens of characters wide (let's say 80-100), which is what I assumed in your case, using a line buffer is usually affordable, and you don't even need it after the reading is complete. If lines can be (much) longer, and/or if you don't know the maximum length, you will have to resort to a different method that allows you to read one value at a time, like the `scanf()` method I was suggesting. Did you have a look at that? – SukkoPera Feb 16 '15 at 08:27
  • @Kaur: About the different precision, did you bother having a look at the link I gave you? – SukkoPera Feb 16 '15 at 08:27
  • Yes Sir, I bothered to have a look the same day. I have implemented it using `fscanf` as I am dealing with large data files and memory usage is a constraint. Your comments were indeed helpful for clarity. I am not sure if I should put my solution here or leave the post as it is. – Kaur Feb 16 '15 at 09:37
  • Sure, you can write a new answer to your own question. Just remember to *accept* the answer that worked for you. – SukkoPera Feb 16 '15 at 10:14
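
For files whose dimensions are not known in advance, as discussed in the comments above, one possible sketch reads value by value into a single growing array and counts rows along the way. This is not the solution Kaur ended up with (it was never posted in the thread); `read_all` is a hypothetical helper, and it assumes a rectangular file with no trailing commas.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads every value from fp into one flat,
   row-major array that grows as needed. On success returns the array
   (the caller frees it) and stores the number of values in *count and
   the number of rows in *rows. Returns NULL if allocation fails. */
float *read_all(FILE *fp, size_t *count, size_t *rows)
{
    size_t cap = 16, n = 0, r = 0;
    float *vals = malloc(cap * sizeof *vals);
    float v;
    int c;

    if (vals == NULL)
        return NULL;

    while (fscanf(fp, "%f", &v) == 1) {
        if (n == cap) {                 /* double the capacity when full */
            float *tmp = realloc(vals, 2 * cap * sizeof *vals);
            if (tmp == NULL) {
                free(vals);
                return NULL;
            }
            vals = tmp;
            cap *= 2;
        }
        vals[n++] = v;

        /* Look at the separator: skip spaces, tabs and carriage returns. */
        do {
            c = getc(fp);
        } while (c == ' ' || c == '\t' || c == '\r');

        if (c == '\n' || c == EOF)
            r++;                        /* end of a row */
        else if (c != ',')
            ungetc(c, fp);              /* unexpected character: the next
                                           fscanf() fails and ends the loop */
    }

    *count = n;
    *rows = r;
    return vals;
}
```

For a rectangular file the column count is `count / rows`, and element (i, j) is at `vals[i * cols + j]`. Memory use is proportional to the number of values rather than to a worst-case line width.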
3

Your coding problem is here:

fgets(rowbuffer,sizeof(rowbuffer),fp)

sizeof(rowbuffer) will give you only the size of the pointer, not the size of the memory allocated to the pointer.

To resolve the issue, you need to supply the proper size of the allocated memory, cols * ( sizeof(float)+sizeof(char) ), to fgets().
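
The difference can be sketched as follows; both function names are hypothetical. On a platform with 8-byte pointers, `sizeof(rowbuffer)` is 8, so fgets() stores at most 7 characters per call, which matches the "0.22,0." fragment in the question's output.

```c
#include <stddef.h>
#include <stdio.h>

/* sizeof applied to a pointer yields the size of the pointer itself,
   no matter how much memory it points to. */
size_t pointer_size(char *rowbuffer)
{
    return sizeof(rowbuffer);   /* same value as sizeof(char *) */
}

/* Hypothetical wrapper: the allocated size is kept in a variable and
   passed alongside the pointer, because it cannot be recovered from
   the pointer later. */
char *read_line(char *rowbuffer, size_t bufsize, FILE *fp)
{
    return fgets(rowbuffer, (int)bufsize, fp);
}
```

A common habit is to store the size in a variable at the point of the malloc() call and reuse that variable everywhere the buffer is used.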

Your logical problem is this:

You assumed that the printed representation of a float value takes the same amount of memory as a float variable. That is not true. In the printed representation, every character (each digit, the decimal point, and any leading or trailing 0 after the decimal point) consumes one byte of memory. Keep that in mind when allocating memory for the destination buffer.
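
To make the difference concrete, the sketch below measures how many characters a value occupies as text. `text_width` is a hypothetical helper, and the "%g" format is only one possible choice of representation.

```c
#include <stdio.h>

/* Hypothetical helper: number of characters needed to print x with "%g",
   not counting the terminating null character. Calling snprintf() with a
   NULL buffer and size 0 only computes the length. */
int text_width(float x)
{
    return snprintf(NULL, 0, "%g", x);
}
```

With the values mentioned in this thread, `text_width(0.222f)` is 5 and `text_width(0.001204f)` is 8, while `sizeof(float)` is typically 4 regardless of the value.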

Sourav Ghosh