I'm trying to write an array of 11.26 million uint16_t values to disk. In memory the array should take ~22 MB, but the file comes out at 52 MB. I'm using fprintf to write the array. I thought the values might be getting promoted, so I tried to be explicit about the type, but it makes no difference: the file size stays the same.

What am I doing wrong? Code follows.

#define __STDC_FORMAT_MACROS
...
uint32_t dbsize_ = 11262336;
uint16_t* db_ = new uint16_t[dbsize_];
...
char fname[256] = "foo";

FILE* f = fopen(fname, "wb");
if(f == NULL) 
{
    return;
}

fprintf(f, "%i\t", dbsize_);
for(uint32_t i = 0; i < dbsize_; i++)
{
    fprintf(f, "%" SCNu16 "", db_[i]);
}
fclose(f);
Daniel
  • Did you open your file? It should be surprisingly readable in a text editor (unless there's something non-standard behind that SCNu16 macro). – Mat May 08 '13 at 12:35
  • Hi Mat. Notice I write a binary file. I did not try to inspect it through a hex editor. – Daniel May 08 '13 at 12:36
  • That's exactly what I'm trying to point out. Open it in a text editor. – Mat May 08 '13 at 12:37
  • So you know that the contents of your file are incorrect, but you didn't even look at what's in there. Wow. – Jon May 08 '13 at 12:38
  • Saving memory content in text format will always make it larger. – Marc Claesen May 08 '13 at 12:38
  • @Daniel - required reading: http://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode?rq=1 – Joris Timmermans May 08 '13 at 12:40
  • Jon: that's not very helpful. I already said the file is written in binary. It's gibberish. I'm working from the command line. I do not have a hex editor handy. I shouldn't need one for this kind of thing. – Daniel May 08 '13 at 12:41
  • @Daniel `fprintf` doesn't print out binary, it prints out text. That's what Mat and Jon are suggesting. – Angew is no longer proud of SO May 08 '13 at 12:42
  • @Daniel You don't need hex editor. Open it with text editor such as Notepad and vim. – johnchen902 May 08 '13 at 12:43
  • `fprintf` writes whatever you pass as printf-style text, regardless of how you opened the file. `fwrite` writes what you pass in directly, without pretty-printing. I think you want `fwrite`. – dascandy May 08 '13 at 12:44
  • @Daniel: Or use `head`, since you're at the command line. – RichieHindle May 08 '13 at 12:44
  • @Daniel: You have a huge misconception here then. If you want ASCII, then your assumption about what the file size should be is completely wrong. `uint16_t` values range from `0` to `65535`, so each element in the array will take anywhere from 1 to 5 characters. If the distribution is random it will take on average 5 characters (rounding). Add the separator and you get 6 chars per number. 11 million times 6 characters amounts to roughly 60M. The file is smaller because your data distribution seems to be skewed a bit towards the low numbers. – David Rodríguez - dribeas May 08 '13 at 12:45
  • @David Have you just discovered a fault in his random number generator? Or some fraud going on? – Peter Wood May 08 '13 at 12:48
  • @chepner `(10 * 1 + 90 * 2 + 900 * 3 + 9000 * 4 + 55536 * 5) / 65536 = 4.830474853515625` It's 5, not 2.5. – johnchen902 May 08 '13 at 12:49
  • @chepner: Is it? There are 10000 numbers with < 5 characters, while there are 55536 numbers with 5 characters in the range 0..65535. The exact number is lower than 5, but closer to 5 than 4. – David Rodríguez - dribeas May 08 '13 at 12:49
  • Thank you folks. Your input has been much appreciated. – Daniel May 08 '13 at 12:51
  • jonchen, David: mea culpa. – chepner May 08 '13 at 12:53

3 Answers

You're writing ASCII to your file, not binary.

Try writing your array like this instead of using fprintf in a loop.

fwrite(db_, sizeof(db_[0]), dbsize_, f);

fprintf always formats numbers and other types to text, whether you've opened the file in binary mode or not. Binary mode just keeps the runtime from doing things like converting \n to \r\n.

Collin
  • Hero! Thank you Collin. Sorry about the ninja edit to my post. I realised I mixed up two variables when pasting (db_ and dbsize_). – Daniel May 08 '13 at 12:47
  • BTW, is there any good online resource that discusses these issues? – Daniel May 08 '13 at 12:48
  • @Daniel I guess you could check out the docs for [C file input/output](http://en.cppreference.com/w/c/io). – Collin May 08 '13 at 12:50

fprintf will convert your number to a series of ASCII characters and write them to the file. Depending on its value, a 32-bit unsigned int takes from 1 to 10 characters when expressed as a string (a uint16_t takes from 1 to 5). You need to use fwrite to write raw binary values to a file.

Ferruccio

The source of confusion is likely to be that the "b" in FILE* f = fopen(fname, "wb"); does not do what you think it does.

Most significantly, it doesn't change any of the print or scan statements to use binary values instead of ASCII values. As others have said, use fwrite instead.

Joris Timmermans