In C, when I am reading a file, or some other input, all at once with fread how would I know what size to declare the buffer with?

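char buffer[1024];
FILE *fp = popen("ls /", "r");   /* popen() is POSIX, declared in <stdio.h> */
size_t n = fread(buffer, 1, sizeof(buffer), fp);   /* n = number of bytes actually read */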

The input data could have 1000 lines or 1 line, or even 100000 lines or more.

Is there any general rule for it?

Lightness Races in Orbit
lockdoc

1 Answer

No, there's no general rule, because it depends entirely on what you plan to do with the data. Do you just need to parse it as it comes? That would be ideal. If you know that your input has useful data "samples" in chunks of X bytes, then simply read X bytes at a time and handle them as you go.
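A minimal sketch of the read-as-you-go approach, assuming the per-chunk "handling" is just counting newline bytes (a stand-in for whatever parsing you really need). The function name `count_lines_in_stream` and the 1,024-byte chunk size are illustrative choices, not anything `fread` prescribes:

```c
#include <stdio.h>

/* Hypothetical helper: reads `fp` in fixed-size chunks and handles each
 * chunk as it arrives. "Handling" here is only counting '\n' bytes.
 * Memory use stays at one chunk however much input the stream produces.
 * Returns (size_t)-1 if a read error occurred. */
size_t count_lines_in_stream(FILE *fp)
{
    char chunk[1024];   /* arbitrary chunk size */
    size_t n, lines = 0;

    /* fread returns 0 only at end-of-file or on error. */
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (chunk[i] == '\n')
                lines++;
    }
    return ferror(fp) ? (size_t)-1 : lines;
}
```

With the question's pipe you would call `count_lines_in_stream(fp)` after `popen("ls /", "r")` and then `pclose(fp)` (both POSIX, not ISO C). If your records can straddle a chunk boundary, you would also need to carry the incomplete tail of one chunk over to the next read.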

If you do need to copy the entire input into a buffer, then you'll have to take an initial guess, and allocate more memory if your guess is insufficient.

  • In C++ you can just use std::vector (or std::deque if the data need not be contiguous in memory) to automatically expand the buffer as needed.
  • In C you'll have to malloc first, then realloc inside your read loop when you run out of space in what you've already allocated.

    I suggest mimicking the behaviour of std::vector by making your buffer expand exponentially (multiplying by a factor of something like 1.5 or 2 each time), to help reduce the number of times you need to do this. So, say you first allocate 1,024 bytes. When that runs out, allocate 2,048. When that runs out, allocate 4,096. And so forth.

    Only you can decide what a good starting size is, based on your use case and expected nominal inputs.
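A rough C sketch of the `malloc`/`realloc` approach, assuming a doubling growth factor and a caller-supplied initial guess. `read_all` is a hypothetical helper name. It keeps one spare byte so the result can be NUL-terminated, and it checks `realloc`'s return value before overwriting the old pointer, because a failed `realloc` leaves the original block allocated:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from `fp` into one heap buffer that
 * doubles in size whenever it fills up. On success returns the buffer (the
 * caller must free() it), NUL-terminated, and stores the byte count in
 * *len_out. Returns NULL on allocation failure or read error. */
char *read_all(FILE *fp, size_t initial_cap, size_t *len_out)
{
    size_t cap = initial_cap > 0 ? initial_cap : 1024;  /* initial guess */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Grow when only the byte reserved for '\0' is left. */
        if (len + 1 >= cap) {
            if (cap > SIZE_MAX / 2) {          /* doubling would overflow */
                free(buf);
                return NULL;
            }
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {                 /* old block still valid */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        /* Fill as much of the free space as the stream will give us. */
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        len += n;
        if (n == 0) {
            if (ferror(fp)) {
                free(buf);
                return NULL;
            }
            break;                             /* end of input */
        }
    }

    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

For the question's case: `size_t len; char *out = read_all(fp, 1024, &len);` after `popen`, then `free(out)` and `pclose(fp)` when done. Starting at 1,024 bytes, 100,000 bytes of output costs 7 `realloc` calls (1,024 doubled seven times is 131,072), whereas growing by a fixed 1,024 bytes each time would take close to a hundred.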

Lightness Races in Orbit
  • Thanks for `malloc` and `realloc`. Just looked it up and will have a read about it. However in my case it is read entirely without a loop – lockdoc Apr 06 '16 at 10:56
  • @lockdoc - If you don't know how much data you're going to get, you can't be guaranteed to be able to read it entirely without a loop. – Andrew Henle Apr 06 '16 at 10:58
  • @lockdoc: It can't be, for the very reason you are asking about. What length argument would you provide to `fread`? Although this is not the case for block devices (which is why there are usually ways to accomplish this if your input is definitely a file), speaking generally you cannot know how much data is going to arrive via a stream. That's why it's a stream, not a container. When working with stream data, it's best to imagine that it's _always_ incrementally available such that you have to handle chunks of data as they come. – Lightness Races in Orbit Apr 06 '16 at 11:30
  • Do you have a blog about yourself? Not [this](http://lightnesspyramid.com/); I mean a personal website, something containing your biography. Is there one? – Shafizadeh Apr 06 '16 at 19:43
  • @LightnessRacesinOrbit . . . OK, but as a professional programmer you should create a personal website. *(If you do, let me know.)* – Shafizadeh Apr 07 '16 at 13:34
  • @Shafizadeh: Everything I do is proprietary, confidential and/or classified secret, so I would have nothing to put on it. – Lightness Races in Orbit Apr 07 '16 at 15:09
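Lightness Races in Orbit's comment notes that when the input is definitely a file, there are usually ways to read it in one go. One common way is to seek to the end to learn the length, then size the buffer exactly. This is a sketch under stated assumptions: `read_file_at_once` is a hypothetical name; it only works on seekable regular files (not on `popen` pipes); ISO C does not guarantee that `fseek(..., SEEK_END)` is meaningful for binary streams, although POSIX systems support it; and the file must not change size between `ftell` and `fread`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sizes the buffer from the file's length, then reads
 * the whole file with a single fread. Fails (returns NULL) on non-seekable
 * streams, allocation failure, or a short read. Caller must free() it. */
char *read_file_at_once(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the length by seeking to the end, then rewind to the start. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }

    char *buf = malloc((size_t)size + 1);   /* +1 for a terminating '\0' */
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t n = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (n != (size_t)size) {                /* short read: file changed? */
        free(buf);
        return NULL;
    }

    buf[n] = '\0';
    *len_out = n;
    return buf;
}
```

On POSIX systems, `fstat` on the file's descriptor (its `st_size` field) is another way to get the length up front.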