reading data from large file into struct in C

Question

I am a beginner to C programming. I need to efficiently read millions of from a file using struct in a file. Below is the example of input file.

2,33.1609992980957,26.59000015258789,8.003999710083008
5,15.85200023651123,13.036999702453613,31.801000595092773
8,10.907999992370605,32.000999450683594,1.8459999561309814
11,28.3700008392334,31.650999069213867,13.107999801635742

I have a current code shown in below, it is giving an error "Error in file" suggesting the file is NULL but file has data.

#include<stdio.h>
#include<stdlib.h>

struct O_DATA
{
    int index;
    float x;
    float y;
    float z;
};

int main ()
{
    FILE *infile ;
    struct O_DATA input;
    infile = fopen("input.dat", "r");
    if (infile == NULL);
    {
            fprintf(stderr,"\nError file\n");
            exit(1);
    }
    while(fread(&input, sizeof(struct O_DATA), 1, infile))
            printf("Index = %d X= %f Y=%f Z=%f", input.index , input.x ,   input.y , input.z);
    fclose(infile);
    return 0;
}

I need to efficiently read and store data from an input file to process it further. Any help would be really appreciated. Thanks in advnace. ~
~
~

Try printing the error (use `perror` function to do it most simply). Most likely reason: current working directory is not what you think, it is something else, and that's why the file isn't found. Try using absolute path to input.dat to see if it helps. — hyde, Mar 07 '19 at 22:02
Indeed. You will need to read it as text (`fscanf` or `fgets` + some parsing.) — Eugene Sh., Mar 07 '19 at 22:08
as @xing pointed out reading text files and parsing them into values is a several step process. — Ahmed Masud, Mar 07 '19 at 22:09
Why are you passing `sizeof(struct O_DATA)` to `fread`? The number of bytes you want to read from the file has nothing to do with how many bytes your platform uses to store `struct O_DATA`! — David Schwartz, Mar 07 '19 at 22:09

pmg · Answer 1 · 2019-03-07T22:10:25.727

1

if (infile == NULL);
{ /* floating block */ }

The above if is a complete statement that does nothing regardless of the value of infile. The "floating" block is executed no matter what infile contains.
Remove the semicolon to 'attach' the "floating" block to the if

if (infile == NULL)
{ /* if block */ }

edited Mar 07 '19 at 22:10

answered Mar 07 '19 at 22:05

pmg

106,608
13
126
198

score 1 · Answer 2 · answered Mar 07 '19 at 22:05

1

You've got an incorrect ; after your if (infile == NULL) test - try removing that...

[Edit: 2nd by 9 secs! :-)]

answered Mar 07 '19 at 22:05

Gwyn Evans

1,361
7
17

Hi, I did remove ";" and it runs however the output it wrong. Probably using fread is abad idea. WIll try to use fscanf or fgets – dipak sanap Mar 07 '19 at 23:11
Yes, fread will just blindly read the size you say, while fscanf will let you take advantage of the formatted data, but read up on it carefully, as it’s easy to get the arguments wrong! – Gwyn Evans Mar 07 '19 at 23:15
This is more of a comment and not really an answer to the problem. – Ahmed Masud Mar 07 '19 at 23:22
Correct- my actual answer addressed the actual question that was asked (Why “Error file”) while this was a comment to suggest a direction for change. – Gwyn Evans Mar 07 '19 at 23:33

Ahmed Masud · Accepted Answer · 2019-03-07T23:42:25.110

First figure out how to convert one line of text to data

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct my_data
{
  unsigned int index;
  float x;
  float y;
  float z;
};



struct my_data *
deserialize_data(struct my_data *data, const char *input, const char *separators)
{
  char *p;                      
  struct my_data tmp;

  if(sscanf(input, "%d,%f,%f,%f", &data->index, &data->x, &data->y, &data->z) != 7)
    return NULL;
  return data;
}

deserialize_data(struct my_data *data, const char *input, const char *separators)
{
  char *p;                      
  struct my_data tmp;
  char *str = strdup(input);    /* make a copy of the input line because we modify it */

  if (!str) {               /* I couldn't make a copy so I'll die */
      return NULL;
  }

  p = strtok (str, separators); /* use line for first call to strtok */
  if (!p) goto err;
  tmp.index = strtoul (p, NULL, 0);         /* convert text to integer */

  p = strtok (NULL, separators);    /* strtok remembers line */
  if (!p) goto err;
  tmp.x = atof(p);

  p = strtok (NULL, separators);
  if (!p) goto err;
  tmp.y = atof(p);

  p = strtok (NULL, separators);
  if (!p) goto err;
  tmp.z = atof(p);

  memcpy(data, &tmp, sizeof(tmp)); /* copy values out */
  goto out;

err:
  data = NULL;

out:
  free (str);
  return data;
}

int main() {
    struct my_data somedata;
    deserialize_data(&somedata, "1,2.5,3.12,7.955", ",");
    printf("index: %d, x: %2f, y: %2f, z: %2f\n", somedata.index, somedata.x, somedata.y, somedata.z);
}

Combine it with reading lines from a file:

just the main function here (insert the rest from the previous example)

   int
   main(int argc, char *argv[])
   {
       FILE *stream;
       char *line = NULL;
       size_t len = 0;
       ssize_t nread;
       struct my_data somedata;

       if (argc != 2) {
           fprintf(stderr, "Usage: %s <file>\n", argv[0]);
           exit(EXIT_FAILURE);
       }

       stream = fopen(argv[1], "r");
       if (stream == NULL) {
           perror("fopen");
           exit(EXIT_FAILURE);
       }

       while ((nread = getline(&line, &len, stream)) != -1) {
           deserialize_data(&somedata, line, ",");
           printf("index: %d, x: %2f, y: %2f, z: %2f\n", somedata.index, somedata.x, somedata.y, somedata.z);
       }

       free(line);
       fclose(stream);
       exit(EXIT_SUCCESS);
   }

notice that I used getline rather than fgets etc. because that's now the preferred way to read a line from a file. obviously you can change the `printf` with actual processing of the data. — Ahmed Masud, Mar 07 '19 at 23:24
Wouldn’t just using fscanf be simpler and more straightforward? — Gwyn Evans, Mar 07 '19 at 23:29
The code with `fscanf` don't care about *lines* (since `fscanf` deals with `\n` like with space characters). — Basile Starynkevitch, Mar 08 '19 at 00:00
Thanks everyone for the help! I am able to print the content of the file as desired. However, I need to save the data in the same struct as O_data so that I can work with it outside int main. Can someone help with that? Ideally, I should be able to access ith point outside like somedata[i]. — dipak sanap, Mar 20 '19 at 21:17
Also, @AhmedMasud can you provide more comments on deserialize_data. I am not understanding what char * p is doing. — dipak sanap, Mar 20 '19 at 21:20

Alex · Answer 4 · 2019-03-08T21:06:30.050

You already have solid responses in regard to syntax/structs/etc, but I will offer another method for reading the data in the file itself: I like Martin York's CSVIterator solution. This is my go-to approach for CSV processing because it requires less code to implement and has the added benefit of being easily modifiable (i.e., you can edit the CSVRow and CSVIterator defs depending on your needs).

Here's a mostly complete example using Martin's unedited code without structs or classes. In my opinion, and especially so as a beginner, it is easier to start developing your code with simpler techniques. As your code begins to take shape, it is much clearer why and where you need to implement more abstract/advanced devices.

Note this would technically need to be compiled with C++11 or greater because of my use of std::stod (and maybe some other stuff too I am forgetting), so take that into consideration:

//your includes
//...
#include"wherever_CSVIterator_is.h"

int main (int argc, char* argv[]) 
{
  int index;
  double tmp[3]; //since we know the shape of your input data
  std::vector<double*> saved = std::vector<double*>();
  std::vector<int> indices;

  std::ifstream file(argv[1]);
  for (CSVIterator loop(file); loop != CSVIterator(); ++loop) { //loop over rows
    index = (*loop)[0]; 
    indices.push_back(index); //store int index first, always col 0
    for (int k=1; k < (*loop).size(); k++) {                    //loop across columns
       tmp[k-1] = std::stod((*loop)[k]); //save double values now
    }
    saved.push_back(tmp);
  }

 /*now we have two vectors of the same 'size'
  (let's pretend I wrote a check here to confirm this is true), 
  so we loop through them together and access with something like:*/

  for (int j=0; j < (int)indices.size(); j++) {
    double* saved_ptr = saved.at(j); //get pointer to first elem of each triplet
    printf("\nindex: %g |", indices.at(j));
    for (int k=0; k < 3; k++) {
      printf(" %4.3f ", saved_ptr[k]);
    }
    printf("\n");
  }
}

Less fuss to write, but more dangerous (if saved[] goes out of scope, we are in trouble). Also some unnecessary copying is present, but we benefit from using std::vector containers in lieu of knowing exactly how much memory we need to allocate.

Basile Starynkevitch · Answer 5 · 2019-03-08T00:14:37.297

Don't give an example of input file. Specify your input file format -at least on paper or in comments- e.g. in EBNF notation (since your example is textual... it is not a binary file). Decide if the numbers have to be in different lines (or if you might accept a file with a single huge line made of million bytes; read about the Comma Separated Values format). Then, code some parser for that format. In your case, it is likely that some very simple recursive descent parsing is enough (and your particular parser won't even use recursion).

Read more about <stdio.h> and its routines. Take time to carefully read that documentation. Since your input is textual, not binary, you don't need fread. Notice that input routines can fail, and you should handle the failure case.

Of course, fopen can fail (e.g. because your working directory is not what you believe it is). You'll better use perror or errno to find more about the failure cause. So at least code:

infile = fopen("input.dat", "r");
if (infile == NULL) {
  perror("fopen input.dat");
  exit(EXIT_FAILURE);
}

Notice that semi-colons (or their absence) are very important in C (no semi-colon after condition of if). Read again the basic syntax of C language. Read about How to debug small programs. Enable all warnings and debug info when compiling (with GCC, compile with gcc -Wall -g at least). The compiler warnings are very useful!

Remember that fscanf don't handle the end of line (newline) differently from a space character. So if the input has to have different lines you need to read every line separately.

You'll probably read every line using fgets (or getline) and parse every line individually. You could do that parsing with the help of sscanf (perhaps the %n could be useful) - and you want to use the return count of sscanf. You could also perhaps use strtok and/or strtod to do such a parsing.

Make sure that your parsing and your entire program is correct. With current computers (they are very fast, and most of the time your input file sits in the page cache) it is very likely that it would be fast enough. A million lines can be read pretty quickly (if on Linux, you could compare your parsing time with the time used by wc to count the lines of your file). On my computer (a powerful Linux desktop with AMD2970WX processor -it has lots of cores, but your program uses only one-, 64Gbytes of RAM, and SSD disk) a million lines can be read (by wc) in less than 30 milliseconds, so I am guessing your entire program should run in less than half a second, if given a million lines of input, and if the further processing is simple (in linear time).

You are likely to fill a large array of struct O_DATA and that array should probably be dynamically allocated, and reallocated when needed. Read more about C dynamic memory allocation. Read carefully about C memory management routines. They could fail, and you need to handle that failure (even if it is very unlikely to happen). You certainly don't want to re-allocate that array at every loop. You probably could allocate it in some geometrical progression (e.g. if the size of that array is size, you'll call realloc or a new malloc for some int newsize = 4*size/3 + 10; only when the old size is too small). Of course, your array will generally be a bit larger than what is really needed, but memory is quite cheap and you are allowed to "lose" some of it.

But StackOverflow is not a "do my homework" site. I gave some advice above, but you should do your homework.

reading data from large file into struct in C

5 Answers5

First figure out how to convert one line of text to data

Combine it with reading lines from a file: