
I am doing several exercises to understand the difference between writing text and binary files in C, and when I look at the results with a hexdump utility I get unexpected output. Can you please help me understand why?

In particular, I am using the following code to write a text file:

#include <stdio.h>

int main() {
    FILE *ptr_myfile;
    char c = 'a';
    int numero = 12345;

    ptr_myfile = fopen("test.txt","w");

    if (!ptr_myfile){
        printf("Unable to open file!");
        return 1;
    }

    fwrite(&c, sizeof(char), 1, ptr_myfile);
    fwrite(&numero, sizeof(int), 1, ptr_myfile);

    fclose(ptr_myfile);

    return 0;
}

When I run "cat test.txt", the contents of the file are:

cat test.txt

a90

I cannot understand how 12345 became 90.

Moreover, if I run

hexdump test.txt

0000000 3961 0030 0000
0000005

In that case, the first byte appears to have the value 39. Why? The second value (61) matches the ASCII value of 'a' (61 hex = 97 dec = 'a' ASCII code), but I cannot find a logical explanation for the rest of the bits.

If I change the writing mode to binary, replacing the line

ptr_myfile = fopen("test.txt","w")  with  ptr_myfile = fopen("test.txt","wb")

I do not see any change in the written contents of the file.

Question edited by Marco; asked by AndresG
  • With `fwrite` you write the raw binary data of the values, not their text representations. And for an `int` that's usually four bytes of data. – Some programmer dude Jan 08 '23 at 12:04
  • Hint, 12345 in hex is 3039. – n. m. could be an AI Jan 08 '23 at 12:06
  • Use `hexdump -C`, it should be less confusing – Mat Jan 08 '23 at 12:06
  • @Someprogrammerdude, so with fwrite I am always writing binary, no matter which mode I used to open the file? Are you saying that I am treating the file as binary instead of text? Why is the "fopen" mode ignored? – AndresG Jan 08 '23 at 12:14
  • Yes that's correct. The decimal value `12345` will be written as the four bytes `0x00003039`. If you want to write text, use e.g. `fprintf` like `fprintf(ptr_myfile, "%c%d", c, numero)` – Some programmer dude Jan 08 '23 at 12:16
  • "Why is the "fopen" mode ignored?" Read about (don't guess, read) what the fopen mode actually means. – n. m. could be an AI Jan 08 '23 at 13:04
  • @n.m. thanks. From https://stackoverflow.com/questions/43777913/the-difference-in-file-access-mode-w-and-wb I understand that the only difference is how a few characters (i.e. end of line) are converted, either to \r\n in text mode or kept as \n in binary mode, but it has no effect on whether raw bytes or text are written... (I was misunderstanding that). – AndresG Jan 08 '23 at 13:43

2 Answers

The contents of the file test.txt are:

$ hexdump -C test.txt

00000000  61 39 30 00 00                                    |a90..|
00000005

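The first byte, 61, is 'a', and the bytes after it are the little-endian representation of 12345. (Plain hexdump without -C groups the input into 16-bit words and prints each word in host byte order, which is why it showed 3961 0030 0000 instead of 61 39 30 00 00.)

39 30 00 00 are four bytes, which is the typical size of an int.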

Note that this number is not 0x39300000 but 0x00003039.
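The same arithmetic explains the a90 that cat printed. As the hint in the comments says, 12345 is 0x3039, and its two low-order bytes happen to be printable ASCII codes:

```latex
12345 = \texttt{0x3039} = \texttt{0x30} \cdot 256 + \texttt{0x39} = 48 \cdot 256 + 57 = 12288 + 57
```

In ASCII, 0x39 is '9' and 0x30 is '0', so the stored byte sequence 39 30 00 00 shows up as "90" followed by two NUL bytes, which a terminal usually does not display.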

The byte order of the number written is dependent on the endianness of your system.

You can observe this yourself by using htonl (declared in <arpa/inet.h> on POSIX systems) to convert from host byte order to big-endian (network byte order):

#include <arpa/inet.h>  /* htonl (POSIX) */
#include <stdint.h>
#include <stdio.h>

int main() {
    FILE *ptr_myfile;
    char c = 'a';
    int numero = 12345;
    ptr_myfile = fopen("test.txt","w");

    if (!ptr_myfile) {
        printf("Unable to open file!");
        return 1;
    }

    // convert from host byte order to network byte order (big-endian)
    uint32_t numero_big_endian = htonl((uint32_t)numero);

    fwrite(&c, sizeof(char), 1, ptr_myfile);
    fwrite(&numero_big_endian, sizeof(numero_big_endian), 1, ptr_myfile);
    fclose(ptr_myfile);

    return 0;
}

This will yield:

$ hexdump -C test.txt

00000000  61 00 00 30 39                                    |a..09|
00000005

As you can see, the byte order is now reversed.

Differences in endianness between systems are one reason you might not want to write raw binary data directly to disk.

A big-endian system reading the bytes 39 30 00 00 back would interpret them as 0x39300000, which is 959447040, not 12345.

As others have mentioned, fwrite does not write values in their string representation.

If you want that, you can use snprintf to convert your number to a string first and then write the string to the file (or use fprintf to do both in one step):

#include <stdio.h>
#include <string.h>

int main() {
    FILE *ptr_myfile;
    char c = 'a';
    int numero = 12345;
    ptr_myfile = fopen("test.txt","w");

    if (!ptr_myfile) {
        printf("Unable to open file!");
        return 1;
    }

    // convert numero to a string
    char numero_str[64];
    // check result of snprintf, omitted for readability
    snprintf(numero_str, sizeof(numero_str), "%d", numero);

    fwrite(&c, sizeof(char), 1, ptr_myfile);
    fwrite(numero_str, strlen(numero_str), 1, ptr_myfile);
    fclose(ptr_myfile);

    return 0;
}
$ cat test.txt

a12345
Answered by Marco

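When you use fwrite, it copies the requested number of raw bytes from memory to the file; it does not convert values to text. The mode you passed to fopen does not change that: text mode only controls newline translation, which does nothing on POSIX systems and on Windows turns every 0x0A byte written (even one that is part of an int) into 0x0D 0x0A.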

Let's consider the following example:

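#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *fout = fopen("test.txt", "w");
    if (!fout) return 1;
    /** A character buffer. */
    const char *ascii_buf = "ABCD";
    /** A buffer which contains the ASCII codes of the letters A, B, C, D. */
    uint8_t binary_buf[4] = { 65, 66, 67, 68 };
    size_t written = fwrite(ascii_buf, 1, strlen(ascii_buf), fout);
    written += fwrite(binary_buf, 1, sizeof(binary_buf), fout);
    fclose(fout);
    return written == 8 ? 0 : 1;
}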

Each of the two fwrite calls above writes the same four bytes, "ABCD", to the output file, so the file ends up containing ABCDABCD.

The only difference lies in how the data is interpreted in the program: ascii_buf is declared as characters, while binary_buf is declared as unsigned 8-bit integers. Their content is the same; only the type through which the program views it differs.

You will usually want to use:

  • fprintf to output formatted strings to a file.
  • fwrite to output raw data to a file.
Answered by Jib