
I'm basically new to C.

I have a 64-bit Windows 7 machine with 64 GB of RAM and a 240 GB SSD.

I work with an acquisition board that stores acquired data in 2 internal FIFOs and then passes the data to the RAM (so I can potentially acquire, let's say, 60 GB of data).

What I'm not able to do is use the fwrite function to write a binary file bigger than 4 GB.

Here are my variables:

static UINT64      *rbuffer12 = NULL;
static UINT64      *rbuffer34 = NULL;
FILE               *fd_raw, *fd_raw2;
UINT64             nacq = 2000;
ICS1555_ULONG_T    bufferLength12, bufferLength34;

So, focusing on what happens in FIFO #1, the board makes nacq acquisitions of size bufferLength12 and stores all the stuff in the RAM using the memory pointed by rbuffer12.

bufferLength12 = 524288;
acq_length = 524288 / (channels_number * 2 * 4);
nBytes = bufferLength12 * 4;

rbuffer12 = (UINT64 *) malloc(nacq*nBytes);
memset(rbuffer12, 0, nacq*nBytes);

for (i = 0; i < 4*nacq; i++)
    ReadF(h, 0, (UINT64 *) (rbuffer12 + i * bufferLength12/8), nBytes/4, NULL, 0);

Now I want to write the data to File12.bin.

fd_raw = fopen("File12.bin", "wb");
fwrite((UINT64 *) rbuffer12, 8, (nacq * 4 * channels_number * acq_length), fd_raw);
fclose(fd_raw);
fd_raw=NULL;

When I set nacq = 2000, the file size is 4'096'000 bytes. If I try to increase this value, the program hangs, and if I quit the acquisition I get a binary file of, for example, 1'960'000 bytes.

How can I have a bigger binary file?

claudiop
  • [Please don't cast the return value of `malloc()` in C](http://stackoverflow.com/a/605858/28169). – unwind Sep 09 '13 at 09:16

3 Answers


You state in the comments that your compiler is MSVC 2008 and that you target x64.

I suspect that you have been caught out by a runtime library bug. For example see this post: https://web.archive.org/web/20140316203229/connect.microsoft.com/VisualStudio/feedback/details/755018/fwrite-hangs-with-large-size-count

You can write more than 4GB, but you cannot do it with a single call to fwrite. You'll need to make multiple calls passing no more than 4GB at a time.

In any case, that's surely a better approach to your problem. Your current approach involves allocating one huge block of memory; the workaround would let you allocate a smaller block and so place less demand on the system's memory.
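For illustration, here is a minimal sketch of that workaround, assuming the buffer and byte count from the question; the helper name and the 1 GB chunk size are illustrative (any limit below 4 GB should sidestep the bug):

#include <stdio.h>

/* Write totalBytes from buf in chunks of at most 1 GB, so that no
   single fwrite call crosses the runtime's problematic 4 GB limit. */
static size_t fwrite_large(const void *buf, size_t totalBytes, FILE *f)
{
    const size_t CHUNK = (size_t)1 << 30;   /* 1 GB per fwrite call */
    const char *p = (const char *) buf;
    size_t written = 0;

    while (written < totalBytes) {
        size_t n = totalBytes - written;
        if (n > CHUNK)
            n = CHUNK;
        if (fwrite(p + written, 1, n, f) != n)
            break;                 /* short write: return what succeeded */
        written += n;
    }
    return written;
}

With the question's variables this would be called as `fwrite_large(rbuffer12, nacq * nBytes, fd_raw);`.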

David Heffernan
  • Should I use fseek for each fwrite call to have continous data? – claudiop Sep 09 '13 at 09:25
  • You don't need to call `fseek`, because `fwrite` already advances the file pointer. – David Heffernan Sep 09 '13 at 09:29
  • I'm making multiple calls to `fwrite` in this way: `fd_raw=fopen("File12.bin","wb"); fwrite((UINT64 *) rbuffer12,8,((nacq/2) * 4 * channels_number * acq_length), fd_raw); fwrite((UINT64 *) rbuffer12,8,((nacq/2) * 4 * channels_number * acq_length), fd_raw); fclose(fd_raw);`. But testing the acquisition with a sine function, I don't see continuity in the waveform in the middle of the file (where I make the 2nd call to `fwrite`). Am I doing something wrong? – claudiop Sep 09 '13 at 09:59
  • I would not like to comment on that. I don't know what's in `rbuffer12`. I can't tell what to expect your code to do. However, the workings of `fwrite` are well known. After the call to `fwrite`, the file position for the file is advanced by the number of bytes written. – David Heffernan Sep 09 '13 at 10:08
  • OK, I just wanted to know whether the sequence of the calls was correct. Thank you for your replies. – claudiop Sep 09 '13 at 10:10
  • This is a bit weird though. Aren't you allocating more than 4GB? What compiler are you using, and what is the target platform? How big is your `size_t`? Finally, why do you need to read everything into memory before writing it? Surely it's better to read in data in chunks, and write to file in chunks? – David Heffernan Sep 09 '13 at 10:16
  • The remaining problem must be in your code. You can test your self quite easily, for small files, that multiple calls to `fwrite` result in the same output as a single call. – David Heffernan Sep 09 '13 at 11:30
  • The compiler is Microsoft Visual C++ 2008 SP1, target platform x64. I allocate `nacq*nBytes` bytes, which is more than 4GB when `nacq` is greater than 2000. With `nacq=4000` the multiple calls give me an 8GB file, but without continuity in the middle of it. `nacq`, `channels_number` and `acq_length`, the variables used to compute the number of elements to be written, are, respectively, UINT64, UINT32 and UINT32. Finally, the FIFO memory is very small (1MB) and I need a big continuous signal, so I just put all the data in the RAM waiting for the end of the acquisition. – claudiop Sep 09 '13 at 11:33
  • The problem is in your code. Write 20 bytes in one go, or 1 byte at a time, 20 times. The resulting file is the same. I'm very sure you don't want to make such a monstrous demand on memory. Break it into chunks. – David Heffernan Sep 09 '13 at 11:43
  • @DavidHeffernan: Of course there's no "monstrous" demand on memory since the OS is clever and will figure out that a lot of it is only accessed sporadically. It'll flush it out to the paging file. – Kuba hasn't forgotten Monica Sep 09 '13 at 12:25
  • @kuba Paging file? That's the last resort. Sucks performance away. – David Heffernan Sep 09 '13 at 12:27
  • @DavidHeffernan: It's no different from manually writing it to a file, you know :) The OS is clever enough to figure out that the access is sequential, so it will perform quite well. The only silly thing is not using a memory-mapped file to begin with. Might as well get it to the disk where you want it to be all in one go. – Kuba hasn't forgotten Monica Sep 09 '13 at 12:31
  • @claudiop Your multiple calls to fwrite use the same rbuffer12 pointer. If you don't do anything else to that buffer between the calls to fwrite then what do you expect it to do? It's going to write the first n bytes of the buffer into the file twice. So yes, you'd see a problem. You'd need to pass in an adjusted pointer to the second call. However, you are better off avoiding requiring a >4Gb chunk of contiguous memory. Allocating it is highly likely to fail even on 64 bit systems. Especially if the heap gets fragmented. – Pete Sep 09 '13 at 12:31
  • @kuba Yes, if you use a memory mapped file. That's a solid idea. – David Heffernan Sep 09 '13 at 12:33
  • @Pete: If the heap is so fragmented in your 64 bit application that you can't get a contiguous 4Gb chunk, you're done for anyway. Remember that a 64 bit address space can fit 2^32 4gb chunks. – Kuba hasn't forgotten Monica Sep 09 '13 at 12:34
  • I'm trying to allocate smaller blocks of memory, so I need to create an array of pointers. The size of each memory portion is the size of the acquisition board's buffer, so I need as many pointers as there are acquisitions. As far as I can see, I can't declare it as `UINT64 *vect_pointer12[nacq];` because I can't use a variable as the size. Is there a way to pass the `int` value `nacq` as the size of `vect_pointer12`? – claudiop Sep 13 '13 at 08:52
  • You need to use dynamic memory allocation. Call `malloc` or `calloc`. – David Heffernan Sep 13 '13 at 09:33
  • Is there any other way to do it? Because if I want to segment the acquired data in small portions, I need an array of pointers whose dimension depends on `nacq`. With `malloc` and `calloc`, if I'm not wrong, I get just one pointer. – claudiop Sep 13 '13 at 12:24
  • Well, you don't need an array. The whole idea is not to load entire data all at the same time. You only need one buffer. Fill it, write it to file, fill again, write to file again. And so on. But of course malloc can make arrays of pointers and so on. – David Heffernan Sep 13 '13 at 12:26
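A minimal sketch of the single-buffer approach described in the last comments, assuming `ReadF` behaves as in the question's loop (the per-call transfer size is inferred from the stride used there); error handling is reduced to the essentials:

/* One acquisition-sized buffer is filled by the board and flushed to
   disk on every iteration, so no multi-gigabyte allocation is needed. */
UINT64 *chunk = malloc(bufferLength12);   /* one acquisition's worth */
FILE *f = fopen("File12.bin", "wb");
UINT64 i;

if (chunk != NULL && f != NULL) {
    for (i = 0; i < 4*nacq; i++) {
        ReadF(h, 0, chunk, nBytes/4, NULL, 0);  /* fill the buffer */
        fwrite(chunk, 1, bufferLength12, f);    /* append it to the file */
    }
}
if (f != NULL) fclose(f);
free(chunk);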

The other answer has covered almost everything. I'd like to point out that you're not doing what you think you're doing. Specifically, remember that every page in physical RAM can be backed by a page in the paging file (swap file). When you write data to the in-memory array, each page that you write is accessed only once upon writing. It then sits unused for quite a while until you're done with your acquisition and want to write it out. The operating system will, behind your back, page out the data to disk while you're not using it.

What you're then doing when you "write" it to a file is:

  1. You access the data at the beginning of your buffer. This data has likely been paged out to disk at this point, since it's very old. It may still be in RAM in spite of being on the disk at the same time - that's likely on a battery-powered system, where modern OSes spill stale RAM to disk all the time to make hibernation faster. If it isn't in RAM anymore, the operating system handles the page fault and reads the data back for you.

  2. You write it out to a file. It goes back to the disk, at a different location.

So the data does a roundtrip from the disk back to disk. This is probably not what you want.

You can handle it in three ways.

  1. Instead of using the system-wide paging file, let the OS use your file as a paging file. You do this by memory-mapping your file and then simply writing to memory. When you close the mapping, you're guaranteed that all of the memory pages end up in your file. No roundtrips involved. (See the sketch after this list.)

  2. Have two threads and a set of interlocked buffers. One thread fills up the buffers, the other thread dumps them to disk. The interlock prevents both threads from stepping on each other's toes. This lets you use blocking calls, which might be easier to deal with if you're not too familiar with the winapi.

  3. Have one thread but use non-blocking I/O. That way you can "write" to disk without waiting for the data to actually get there. There are libraries out there to help you with that; Boost might be one good choice.
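A minimal sketch of option 1 on Windows, using `CreateFileMapping` and `MapViewOfFile`; the helper name is illustrative, the sizes follow the question, and error handling is reduced to early returns:

#include <windows.h>

/* Map File12.bin into memory so the acquired data lands directly in
   the file's pages instead of the system paging file. */
static UINT64 *map_output_file(HANDLE *hFile, HANDLE *hMap, UINT64 totalBytes)
{
    LARGE_INTEGER sz;
    sz.QuadPart = (LONGLONG) totalBytes;

    *hFile = CreateFileA("File12.bin", GENERIC_READ | GENERIC_WRITE, 0,
                         NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (*hFile == INVALID_HANDLE_VALUE)
        return NULL;

    /* Creating the mapping with an explicit size extends the file. */
    *hMap = CreateFileMappingA(*hFile, NULL, PAGE_READWRITE,
                               (DWORD) sz.HighPart, sz.LowPart, NULL);
    if (*hMap == NULL)
        return NULL;

    /* Map the whole file; a 64-bit address space makes this feasible. */
    return MapViewOfFile(*hMap, FILE_MAP_WRITE, 0, 0, 0);
}

You would pass the returned pointer to the board in place of `rbuffer12`; `UnmapViewOfFile` followed by `CloseHandle` on both handles then ensures everything reaches the file, with no round trip through the paging file.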

Kuba hasn't forgotten Monica
  • Minor clarification: the system only swaps a page out if it needs the memory for something else. – David Heffernan Sep 09 '13 at 12:31
  • Looking at your solutions, I was thinking of using the first one. Since I've been working with C for just 2-3 months, I'm a little bit confused searching the net for memory mapping. Is the memory mapping performed by MapViewOfFile in Windows? – claudiop Sep 09 '13 at 14:08

I may be missing something, but to me the obvious choice once fread and fwrite run out of gas is the (originally Win32) function set CreateFile, ReadFile, WriteFile and CloseHandle. They are vastly more capable, and I assume/guess that the f-functions you are using are wrappers around them.

Since they are more capable they are somewhat harder to learn, but hey, file I/O isn't rocket science. If you've implemented code using one set of I/O functions, you won't lose your way implementing it with these.
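A minimal sketch of that route; the helper name is illustrative, and the writes are deliberately chunked because `WriteFile` takes a DWORD byte count (see the comments below):

#include <windows.h>

/* Write totalBytes through the Win32 API, at most 1 GB per call,
   since WriteFile's length parameter is a 32-bit DWORD. */
static BOOL write_all(HANDLE h, const void *buf, UINT64 totalBytes)
{
    const char *p = (const char *) buf;
    UINT64 remaining = totalBytes;

    while (remaining > 0) {
        DWORD n = remaining > (1UL << 30) ? (1UL << 30) : (DWORD) remaining;
        DWORD written = 0;
        if (!WriteFile(h, p, n, &written, NULL) || written == 0)
            return FALSE;           /* give up on error or zero-byte write */
        p += written;
        remaining -= written;
    }
    return TRUE;
}

Usage with the question's buffer might look like `HANDLE h = CreateFileA("File12.bin", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);` followed by `write_all(h, rbuffer12, nacq * nBytes);` and `CloseHandle(h);`.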

Olof Forshell
  • `WriteFile` and `ReadFile` cannot write more than 2^32 bytes in a single call. So they aren't going to change a thing here. It's plausible that the x64 bug is that `fwrite` maps onto a single call to `WriteFile`. – David Heffernan Sep 09 '13 at 16:32
  • But they do allow 64-bit file positioning which is part of the problem. – Olof Forshell Sep 09 '13 at 20:05
  • That's no part of the problem so far as I can tell. And fseek allows 64 bit positioning no? Or does it not use size_t? – David Heffernan Sep 09 '13 at 20:30