Writing a byte to a file and then reading the same byte are not the same

Question

Basically I have a file, and in this file I am writing 3 bytes, and then I'm writing a 4 byte integer. In another application I read the first 3 bytes, and then I read the next 4 bytes and convert them to an integer.

When I print out the value, I have very different results...

fwrite(&recordNum, 2, 1, file);    //The first 2 bytes (recordNum is a short int)
fwrite(&charval, 1, 1, file);      //charval is a single byte char
fwrite(&time, 4, 1, file);
// I continue writing a total of 40 bytes

Here is how time was calculated:

time_t rawtime;
struct tm * timeinfo;

time(&rawtime);
timeinfo = localtime(&rawtime);

int time = (int)rawtime;

I have tested to see that sizeof(time) is 4 bytes, and it is. I have also tested using an epoch converter to make sure this is the correct time (in seconds) and it is.

Now, in another file I read the 40 bytes to a char buffer:

char record[40];
fread(record, 1, 40, file);

// Then I convert those 4 bytes into an uint32_t
uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;

printf("Testing timestamp = %d\n", timestamp);

But this prints out -6624. The expected value is 551995007.

EDIT

To be clear, everything else that I am reading from the char buffer is correct. After this timestamp I have text, which I simply print and it runs fine.

1) You are assuming a certain endianness here. Why? Have you tried the opposite? 2) How do you print the `time` value? And why do you think `-6624` is incorrect? — Eugene Sh., Jun 28 '17 at 19:27
"application I read the first 3 bytes, and then I read the next 4 bytes" <- you don't show the code for this, just the code to read 40 bytes - you should start with symmetrical writing and reading. — crashmstr, Jun 28 '17 at 19:33
btw better to put it in a struct, it is easier to handle that way, then u just read and write the struct — AndersK, Jun 28 '17 at 19:34
@Logan Curious, why does code use `4` in `fwrite(&time, 4, 1, file);` versus `fwrite(&time, sizeof time, 1, file);`? Does code want to write 4 bytes here or write all the bytes of `time`, what ever size it may be? There are reasons for either - it depends on coding goals. — chux - Reinstate Monica, Jun 28 '17 at 19:44
@EugeneSh. I saw this algorithm online, which is why I'm using it. I also tried changing the order by switching index 6 with 3 and 5 with 4. (I assume that's the opposite). 2) I believe this is incorrect because when I write the timestamp to the file I print the number first, and it was: 551992286 — Logan, Jun 28 '17 at 19:45
@Logan `uint32_t timestamp` should _never_ get printed out as `-6624`, a negative number. Post the code that did the printing. — chux - Reinstate Monica, Jun 28 '17 at 19:45
@chux The question is updated with the print statement ---- So I could've also done sizeof(time) (which I checked is also 4). But for the specifications of my program, I need it to print 4, which is why I set it to 4. — Logan, Jun 28 '17 at 19:45
Was the file opened in binary or text mode for the read/write? That affect various translations. — chux - Reinstate Monica, Jun 28 '17 at 19:47
Save time (for us too), a good compiler with all warnings enabled will warn about `printf("Testing timestamp = %d\n", timestamp);` — chux - Reinstate Monica, Jun 28 '17 at 19:49
Re the recent question edit: `uint32_t` does not match the `printf` format specifier `%d`. — Weather Vane, Jun 28 '17 at 19:49
In addition to [@dbush fine answer](https://stackoverflow.com/a/44811091/2410359), open in binary mode. — chux - Reinstate Monica, Jun 28 '17 at 19:49
@chux how do you open in binary mode? And what is the difference between binary mode and a regular read mode? — Logan, Jun 28 '17 at 19:51
@Logan 1) If code is mixing text with potential line feeds and binary data in the same file, there are additional issues. 2) "But this prints out -6624" --> what was expected. 3) What was the values of `recordNum, charva, time`? — chux - Reinstate Monica, Jun 28 '17 at 19:58
@chux So basically I expected time to be 551995007 (as this is the value printed in the other file). The value of recordNum is 1, and charva is 0. — Logan, Jun 28 '17 at 20:01
@Logan `551995007` --> `0x20E6C67F` and `-6624` --> `FFFFFE620` so certainly there is at least an endian issue. — chux - Reinstate Monica, Jun 28 '17 at 20:03
there is no need, in the posted code, to use anything but `rawtime` so the call to `localtime()` and the struct tm are not needed/should be removed. This line: `int time = (int)rawtime;` is modifying the actual `rawtime` to be an integer. It is very poor programming practice to have a variable the same name as a system function. — user3629249, Jun 30 '17 at 00:09
this line: `fwrite(&time, 4, 1, file);` is error prone. better to use: `fwrite( &rawtime, sizeof( time_t ), file ); — user3629249, Jun 30 '17 at 00:13
this line: `uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;` (probably) will not recover the timestamp. Strongly suggest generating a struct that contains the 3 fields, declare it 'packed', then extract the third field, which should have type `time_t` — user3629249, Jun 30 '17 at 00:19

dbush · Answer 1 · 2017-06-28T19:37:09.393

3

You write the time at once with fwrite, which uses the native byte ordering, then you explicitly read the individual bytes in big-endian format (most significant byte first). Your machine is likely using little-endian format for byte ordering, which would explain the difference.

You need to read/write in a consistent manner. The simplest way to do this is to fread one variable at a time, just like you're writing:

fread(&recordNum, sizeof(recordNum), 1, file); 
fread(&charval, sizeof(charval), 1, file); 
fread(&time, sizeof(time), 1, file);

Also note the use of sizeof to calculate the size.

edited Jun 28 '17 at 19:37

answered Jun 28 '17 at 19:33

dbush

205,898
23
218
273

@EugeneSh. `fread(&time, sizeof(time), 1, file);` is better iff `fwrite(fwrite(&time, sizeof(time), 1, file);` is used. Else code should used `time = 0; fread(&time, sizeof(time), 1, file);` `fread/fwrite` has various concerns about type sizes. Suggest `(u)intn_t` types. – chux - Reinstate Monica Jun 28 '17 at 19:37
@chux Sure. Just lazy to stuff it into the comment. – Eugene Sh. Jun 28 '17 at 19:39
@dbush Thank you, I will try this and see how it goes. – Logan Jun 28 '17 at 19:51
@dbush So I tried your method, but I'm having some issues. Is this statement okay? uint32_t timestamp; fread(&timestamp, sizeof(timestamp), 1, file); – Logan Jun 28 '17 at 20:45
@Logan That should work, assuming you read the prior 3 bytes first. – dbush Jun 28 '17 at 20:50
@dbush so something weird is happening, the first byte is read fine. It is a 1. The next byte, should have the following value 0b11000000. However, it is actually a 0 when I print it.. Here is how I write it: unsigned int i = 0b11000000; unsigned char charval = i; fwrite(&charval, 1, 1, file); Sorry this formatting is terrible because I don't know how to make code snippets in comments. – Logan Jun 28 '17 at 21:00
1

@Logan You shouldn't be reading 1 byte for the first value. You should be reading 2, just like you're writing 2. – dbush Jun 28 '17 at 21:02
@dbush It's been a long exhausting day.. Thanks haha – Logan Jun 28 '17 at 21:09

little_birdie · Accepted Answer · 2017-07-04T17:08:41.667

You problem is probably right here:

uint32_t timestamp =(uint32_t)record[6] | (uint32_t)record[5] << 8 | (uint32_t)record[4] << 16 | (uint32_t)record[3] << 24;
printf("Testing timestamp = %d\n", timestamp);

You've used fwrite to write out a 32 bit integer.. in whatever order the processor stored it in memory.. and you don't actually know what byte ordering (endian-ness) the machine used. Maybe the first byte written out is the lowest byte of the integer, or maybe it's the highest byte of the integer.

If you're reading and writing the data on the same machine, or on different machines with the same architecture, you don't need to care about that.. it will work. But if the data is written on an architecture with one byte ordering, and potentially read in on an architecture with another byte ordering, it will be wrong: Your code needs to know what order the bytes should be in memory and what order they will be read/written on disk.

In this case, in your code, you are doing a mix of both: You write them out in whatever endian-ness the machine uses natively.. then when you read them in, you start shifting the bits around as if you know what order they were originally in.. but you don't, because you didn't pay attention to the order when you wrote them out.

So if you're writing and reading the file on the same machine, or identical machine (same processor, OS, compiler, etc), just write them out in the native order (without worrying about what that is) and then read them back in exactly as you wrote them out. If you write them and read them on the same machine, it'll work.

So if your timestamp is located at offset 3 through 6 of your record, just do this:

uint_32t timestamp;
memcpy(&timestamp, record+3, sizeof(timestamp);

Note that you cannot directly cast record+3 to a uint32_t pointer because it might violate the systems word alignment requirements.

Note also that you should probably be using time_t type to hold the timestamp, if you're on a unix-like system, that'll be the natural type supplied to hold epoch time values.

But if you are planning to move this file to another machine at any point and try to read it there, you could easily end up with your data on a system that has different endian-ness or different size for time_t. Simply writing bytes in and out of a file with no thought to the endian-ness or size of types on different operating systems is just fine for temporary files or for files which are meant to be used on one computer only and which will never be moved to other types of system.

Making data files that are portable between systems is a whole subject in itself. But the first thing you should do, if you care about that, is to look at functions htons(), ntonhs(), htonl(), ntonhl(), and their ilk.. which convert to and from the system native endian-ness to a known (big) endian-ness which is the standard for internet communications and generally used for interoperability (even though Intel processors are little-endian and dominate the market these days). These function do something similar to what you were doing with your bit-shifting but since someone else wrote it, you don't have to. It's a lot easier to use the library functions for this!

For example:

#include <stdio.h>
#include <arpa/inet.h>

int main() {

    uint32_t x = 1234, y, z;

    // open a file for writing, convert x from native to big endian, write it.
    FILE *file = fopen("foo.txt", "w");
    z = htonl(x);
    fwrite(&z, sizeof(z), 1, file);
    fclose(file);

    file = fopen("foo.txt", "r");
    fread(&z, sizeof(z), 1, file);
    x = ntohl(z);
    fclose(file);        

    printf("%d\n", x);

}

NOTE I am NOT CHECKING FOR ERRORS in this code, it is just an example.. do not use functions like fopen, fread etc without checking for errors.

By using these functions both when writing the data out to disk and when reading it back, you guarantee that the data on disk is always big-endian.. eg htonl() when on a big-endian platform does nothing, when on a little-endian platform it does the conversion from bit to little endian. And ntohl() does the opposite. So your data on disk will always be read in correctly.

Thanks for your reply. Yes I would use time_t if I could, but unfortunately I only have 4 bytes for the timestamp, and I believe that time_t is much bigger than that. So I had to convert it to seconds. I will try what you said and get back to you. Thanks! — Logan, Jun 28 '17 at 20:24
Also, I have another question. Can you specify what endianness you want to use? — Logan, Jun 28 '17 at 20:24
No, you can't specify endian-ness, it is based on the type of processor. If you want to make sure your data is always the same endian-ness on disk, use (for a 32 bit int) `htonl()` before writing the value to disk, and `ntohl()` when reading it back from the disk to memory. And that way, your data will always be big-endian on disk and the functions will do any conversion necessary bringing it in and out of memory (if you are on a big-endian machine, the functions do nothing, if you are on a little-endian machine, they will reverse the order). — little_birdie, Jun 28 '17 at 20:33
And yea, I do believe that time_t is 64 bits on most operating systems now. The old 32 bit epoch time was doomed to run out of space in 2038.. so it's been slowly moving to 64 bits. It's likely that humanity won't exist anymore when that runs out.... — little_birdie, Jun 28 '17 at 20:37
So suppose I wanted to use these htonl() and ntohl() functions.. Before I write anything to a file I would first need to call one of these functions on that variable? And secondly, what about uint8_t variables? I imagine you wouldn't need to call any of those functions on it? — Logan, Jun 28 '17 at 20:51
No, uint8_t being a single byte under any circumstances you would likely encounter, does not have endian-ness issues. However, these functions I am telling you about are at least 30 years old, maybe older, and therefore there's no 64 bit function I know of.. but someone must have added one by now. You'll have to do a bit of looking around if you need to save 64 bit integers. — little_birdie, Jun 28 '17 at 21:45

Writing a byte to a file and then reading the same byte are not the same

2 Answers2