fread: How to know the size of the buffer?

Question

In C, when I am reading a file or some other input all at once with fread, how do I know what size to declare the buffer with?

char buffer[1024];
FILE *fp = popen("ls /", "r");
size_t n = fread(buffer, 1, sizeof(buffer), fp);  /* n = bytes actually read, at most 1024 */

The input data could have 1000 lines or 1 line, or even 100000 lines or more.

Is there any general rule for it?

You need to determine the size of the file (see duplicate question) and then use `malloc` to allocate a buffer of that size. – Jabberwocky Apr 06 '16 at 10:35
Not a duplicate: the stream passed to fread could also be the output of a command rather than simply a file, e.g. `fp = popen("ls /", "r");` – lockdoc Apr 06 '16 at 10:35
@lockdoc In that case, you can't know it. Non-seekable streams have an unknown size. – sashoalm Apr 06 '16 at 10:37
@lockdoc You can't do that. But you can read the input line by line with `fgets` and store the lines dynamically, or use some similar method. – Jabberwocky Apr 06 '16 at 10:37
Why all those downvotes and close requests? The question as it stands now is OK. – Jabberwocky Apr 06 '16 at 10:39
I removed the c++ tag. I had only added it because it was suggested; my bad. – lockdoc Apr 06 '16 at 10:40

Lightness Races in Orbit · Accepted Answer · 2016-04-06T10:43:28.783

No, there's no general rule, because it depends entirely on what you plan to do with the data. Do you just need to parse it as it comes? That would be ideal. If you know that your input has useful data "samples" in chunks of X bytes, then simply read X bytes at a time and handle them as you go.
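As a rough illustration of handling the data as it arrives, the sketch below reads a stream in fixed-size chunks and processes each chunk before reading the next, so memory use stays constant however large the input is. The function name `count_newlines` and the 4,096-byte chunk size are assumptions made for this example; counting newlines simply stands in for whatever per-chunk processing you actually need.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts newline characters in a stream while
   holding only one fixed-size chunk in memory at a time.
   Returns the count, or -1 if a read error occurred. */
long count_newlines(FILE *fp)
{
    char chunk[4096];   /* arbitrary chunk size chosen for this sketch */
    size_t n;
    long count = 0;

    assert(fp != NULL);

    /* fread returns the number of bytes actually read; 0 means EOF or error. */
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (chunk[i] == '\n')
                count++;
        }
    }

    return ferror(fp) ? -1 : count;
}
```

The same function works unchanged on a stream returned by `popen("ls /", "r")`, since it never needs to know the total size in advance.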

If you do need to copy the entire input into a buffer, then you'll have to take an initial guess, and allocate more memory if your guess is insufficient.

In C++ you can just use std::vector (or std::deque if the data need not be contiguous in memory) to automatically expand the buffer as needed.
In C you'll have to malloc first, then realloc inside your read loop when you run out of space in what you've already allocated.

I suggest mimicking the behaviour of std::vector by making your buffer expand exponentially (multiplying by a factor of something like 1.5 or 2 each time), to help reduce the number of times you need to do this. So, say you first allocate 1,024 bytes. When that runs out, allocate 2,048. When that runs out, allocate 4,096. And so forth.
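A minimal sketch of the malloc-then-realloc approach with a doubling buffer might look like the following. The name `read_all`, its signature, and the 1,024-byte starting size are assumptions for illustration only; pick a starting size that suits your expected input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from fp into a malloc'd buffer.
   On success returns the buffer (the caller frees it) and stores the
   byte count in *out_len; on failure returns NULL.
   The buffer is NOT NUL-terminated. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 1024;          /* initial guess */
    size_t len = 0;
    char *buf;

    assert(fp != NULL && out_len != NULL);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Fill whatever space is left in the current allocation. */
        size_t n = fread(buf + len, 1, cap - len, fp);
        len += n;

        if (len < cap)          /* short read: end of input or an error */
            break;

        /* Buffer is full: double the capacity and keep reading. */
        if (cap > (size_t)-1 / 2) {     /* doubling would overflow size_t */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);          /* realloc failure leaves buf valid; release it */
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    *out_len = len;
    return buf;
}
```

With the `popen` stream from the question, you would call `read_all(fp, &len)`, use the first `len` bytes of the result, then `free` the buffer and `pclose` the stream. Assigning the result of `realloc` to a temporary pointer first avoids leaking the original block if the reallocation fails.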

Only you can decide what a good starting size is, based on your use case and expected nominal inputs.

edited Apr 06 '16 at 10:43

answered Apr 06 '16 at 10:38

Lightness Races in Orbit

Thanks for `malloc` and `realloc`. I just looked them up and will read about them. However, in my case the input is read entirely without a loop – lockdoc Apr 06 '16 at 10:56
@lockdoc - If you don't know how much data you're going to get, you can't be guaranteed to be able to read it entirely without a loop. – Andrew Henle Apr 06 '16 at 10:58
@lockdoc: It can't be, for the very reason you are asking about. What length argument would you provide to `fread`? Although this is not the case for block devices (which is why there are usually ways to accomplish this if your input is definitely a file), speaking generally you cannot know how much data is going to arrive via a stream. That's why it's a stream, not a container. When working with stream data, it's best to imagine that it's _always_ incrementally available such that you have to handle chunks of data as they come. – Lightness Races in Orbit Apr 06 '16 at 11:30
Do you have a blog about yourself? Not [this](http://lightnesspyramid.com/); I mean a personal website, something containing your biography. Is there one? – Shafizadeh Apr 06 '16 at 19:43
@LightnessRacesinOrbit . . . OK, but as a professional programmer you should create a personal website. *(If you do, let me know.)* – Shafizadeh Apr 07 '16 at 13:34
@Shafizadeh: Everything I do is proprietary, confidential and/or classified secret, so I would have nothing to put on it. – Lightness Races in Orbit Apr 07 '16 at 15:09
