
I'm writing a program that reads an ASCII file and converts it to a binary file. It doesn't look like a hard task, but understanding what's happening behind the scenes is ...

As I understand it, an ASCII file is just human-readable text, so to create a new file full of ASCII characters, a simple loop with fputc() would be enough, and for a binary file fwrite() will do the job, right?

So my question is: once the ASCII-to-binary conversion is done, what should I see in my .bin file? Should it be filled with exactly the same symbols, <88><88><88><88><88>?

Code:

/*
*  From "Practical C Programming 2nd Edition"
*  Exercise 14-4: Write a program that reads an ASCII file containing a list of numbers
*  and writes a binary file containing the same list. Write a program that goes the
*  other way so that you can check your work.
*
*/

#include <stdio.h>
#include <stdlib.h>

const char *in_filename = "bigfile.txt";
const char *out_filename = "out_file.bin";

int main()
{

    int ch = 0;

    /* ASCII */
    FILE *in_file = NULL;

    in_file = fopen(in_filename, "r");

    if(!in_file)
    {
         fprintf(stderr, "ERROR: Could not open file %s ... ", in_filename);
         exit(EXIT_FAILURE);
    }

    /* Binary */
    FILE *out_file = NULL;

    out_file = fopen(out_filename, "w+b");

    if(!out_file)
    {
         fprintf(stderr, "ERROR: New file %s, could not be created ... ", out_filename);
         exit(EXIT_FAILURE);

    }

    while(1)
    {
        ch = fgetc(in_file);
            if(ch == EOF)
                break;
            else
               fwrite(in_file, sizeof(char), 1, out_file);
    }

        fclose(in_file);
        fclose(out_file);

    return 0;

}

I'm generating the input file with this shell script:

tr -dc "0-9" < /dev/urandom | fold -w100|head -n 100000 > bigfile.txt

Any help would be much appreciated.

Thanks.

Paul S-Pou
  • Maybe I'm misinterpreting your assignment, but as I read it, you want to read your ASCII test file as *numbers*, perhaps using `fscanf(in_file, "%d", &ch)`. If you do it that way, an input file containing "18 52 86 120" would result in a 4-byte binary output file containing the four bytes `0x12`, `0x34`, `0x56`, and `0x78`. – Steve Summit Jun 26 '21 at 00:14
  • And if you can get it to work that way, if you give it the input `72 101 108 108 111 44 32 119 111 114 108 100 33 10`, you should end up with a "binary" output file that's actually also a text file, after all... – Steve Summit Jun 26 '21 at 00:16
  • Your question here "what should I see in the output, should it be same/different to input?" is the first thing I'd like to ask you and the first thing I'd be asking my instructor. It's not an English course, where the aim is to understand what the question is getting at - it's a programming one. As written, the question feels poor. I'd expect a sample of the input and output as a minimum... – enhzflep Jun 26 '21 at 00:17
  • @enhzflep Thanks for the suggestion, I just changed the title, and sorry for the poor writing, English isn't my first language. – Paul S-Pou Jun 26 '21 at 00:27
  • @SteveSummit thanks for the comment, I would like to ask you about a good resource to better understand these topics , since the book that I currently read I believe is very limited in certain aspects... – Paul S-Pou Jun 26 '21 at 00:30
  • @PaulS-Pou - oooh. I didn't realize how my comment could be ambiguous. Your question here is of much higher standard than many and was refreshing to read. :thumbs-up: My complaint lay with the person that originally wrote Exercise 14-4. Sorry for the confusion! – enhzflep Jun 26 '21 at 00:43
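
The reading suggested in the comments above (treat the text file as a list of numbers and store each one as a single byte) can be sketched as follows. This is only an illustration under stated assumptions: the helper name numbers_to_bytes is made up, every number is assumed to fit in one byte (0 to 255), and the numbers are assumed to be separated by whitespace. The 100-digit lines produced by the shell script in the question would not fit in an int, so they are outside what this sketch handles.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated decimal numbers from in
 * and writes each one to out as a single byte.
 * Returns the number of bytes written, or -1 if a value does not fit in
 * one byte or a write fails. */
long numbers_to_bytes(FILE *in, FILE *out)
{
    long count = 0;
    int number;

    assert(in != NULL && out != NULL);

    while (fscanf(in, "%d", &number) == 1)
    {
        unsigned char byte;

        if (number < 0 || number > 255)
            return -1;               /* assumption: one byte per number */

        byte = (unsigned char)number;
        if (fwrite(&byte, sizeof(byte), 1, out) != 1)
            return -1;
        count++;
    }
    return count;
}
```

With the input "18 52 86 120", the output file holds the four bytes 0x12, 0x34, 0x56 and 0x78.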

2 Answers

fwrite(in_file, sizeof(char), 1, out_file);

is wrong because in_file is the FILE * of the input stream, not the data that was read. fwrite then writes the first byte of the FILE object itself, which would explain the repeated <88> in the output. The character returned by fgetc is in ch.

You can use fputc to write one byte like

fputc(ch, out_file);

If you still want to use fwrite for some reason, prepare the data to write and write that like

{
    unsigned char byte = ch;
    fwrite(&byte, sizeof(byte), 1, out_file);
}

Now the contents of the output file will be the same as the input file. Some systems convert newline characters when a file is opened in text mode, which may make the contents differ, because the input file is opened in text mode here.
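
As a rough sketch of the whole copy loop, assuming both streams are already open, something like the following would do. The helper name copy_bytes is made up for illustration and is not part of the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies every byte from in to out.
 * Returns the number of bytes copied, or -1 on a write error. */
long copy_bytes(FILE *in, FILE *out)
{
    long count = 0;
    int ch;                      /* int, not char, so EOF stays distinguishable */

    assert(in != NULL && out != NULL);

    while ((ch = fgetc(in)) != EOF)
    {
        if (fputc(ch, out) == EOF)
            return -1;
        count++;
    }
    return count;
}
```

Calling copy_bytes(in_file, out_file) in place of the while(1) loop should leave out_file.bin with the same bytes as bigfile.txt, apart from any newline translation done by text mode.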

MikeCAT
  • Thanks for the feedback, I've already tried to replace the `fwrite()` with `fgetc()`, but that makes me doubt more, because now I have the same input and output, even when the output should be different, because ASCII and binary files should not look the same right? – Paul S-Pou Jun 26 '21 at 00:11
  • @PaulS-Pou They should look the same if no conversion is performed. It is just matter of how to interpret the data (bytes). – MikeCAT Jun 26 '21 at 00:13

Opening a file in text mode or binary mode has nothing to do with ASCII/binary conversion. It has to do with how the operating system deals with some special characters (such as new line characters), line size limit or file extensions.

In the fopen Linux man page:

The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)

For more information about opening a file in text or binary mode, see https://stackoverflow.com/a/20863975/6874310

Now, back to the ASCII conversion:

All the data in a computer is stored in bits so in the end everything is binary.

A text file containing ASCII characters is also a binary file, except its contents can be mapped to the ASCII table characters in a meaningful way.

Have a look at the ASCII table. The ASCII character for the digit zero ('0') has the value 0x30 (48 in decimal). This means that the zero you see in a text file is actually stored as the byte 0x30 in memory and on disk.
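
To see those byte values for yourself, a small sketch like the one below formats each character of a string as hex. The helper name bytes_as_hex is hypothetical, and the caller is assumed to provide a large enough output buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the hex value of each byte of text into out,
 * separated by spaces. out must hold at least 3 * strlen(text) + 1 bytes. */
void bytes_as_hex(const char *text, char *out)
{
    size_t i, len;

    assert(text != NULL && out != NULL);

    len = strlen(text);
    out[0] = '\0';
    for (i = 0; i < len; i++)
    {
        /* Cast through unsigned char so bytes above 0x7F print correctly. */
        sprintf(out + 3 * i, "%02X%s", (unsigned)(unsigned char)text[i],
                i + 1 < len ? " " : "");
    }
}
```

For the text "0123" this produces "30 31 32 33": each digit character is stored as 0x30 plus its numeric value.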

Your program is reading data from a file and writing to another file without performing any ASCII/binary conversion.

Also, there is a small error here:

fwrite(in_file, sizeof(char), 1, out_file);

It probably should be:

fwrite(&ch, sizeof(char), 1, out_file);

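This writes the byte in variable ch to out_file. Because ch is declared as int in your program, &ch only points at the character's byte on little-endian machines; fputc(ch, out_file), or copying ch into an unsigned char first, avoids that. With this fix, the program basically reads data from the file bigfile.txt and writes the very same data to the file out_file.bin without any conversion.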

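To convert a single-digit ASCII number to binary, read the character from your input file with fgetc (keeping the result in an int so that EOF can be detected) and subtract 0x30 from it:

int ch = fgetc(in_file);   /* int, so EOF can be told apart from a valid byte */

if(ch == EOF)
{
    break;
}
else if (isdigit(ch))
{
   unsigned char value = ch - 0x30;   /* '0'..'9' becomes 0..9 */
   fwrite(&value, sizeof(value), 1, out_file);
}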

Now, your output file will be actually binary. Use isdigit to ensure the byte is an ASCII digit. Add #include <ctype.h> at the beginning of your file to use it.
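
Wrapped into a complete function, the conversion might look like the sketch below. The name digits_to_binary is made up for illustration, and skipping every non-digit character (including the newline at the end of each line) is an assumption about what the output should contain.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: for every ASCII digit read from in, writes one byte
 * holding its numeric value (0..9) to out.
 * Returns the number of bytes written, or -1 on a write error. */
long digits_to_binary(FILE *in, FILE *out)
{
    long count = 0;
    int ch;

    assert(in != NULL && out != NULL);

    while ((ch = fgetc(in)) != EOF)
    {
        if (isdigit(ch))
        {
            unsigned char value = (unsigned char)(ch - '0');

            if (fwrite(&value, sizeof(value), 1, out) != 1)
                return -1;
            count++;
        }
        /* Anything else, such as the newline after each line, is skipped. */
    }
    return count;
}
```

Because the newlines are dropped, the output no longer records where each line ended; a real file format would need to store that separately if it matters.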

So, for a small input file with the following text:

123

Its binary representation will be:

0x313233

And, after the ASCII numbers are converted to binary, the binary contents will be:

0x010203

To convert it back to ASCII, simply reverse the conversion. That is, add 0x30 to each byte of the binary file.
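
A minimal sketch of that reverse direction, assuming the binary file contains only byte values 0 to 9, could be written as follows. The helper name binary_to_digits is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: turns each byte 0..9 read from in back into the
 * ASCII digit '0'..'9' and writes it to out.
 * Returns the number of digits written, or -1 if a byte is out of range
 * or a write fails. */
long binary_to_digits(FILE *in, FILE *out)
{
    long count = 0;
    int byte;

    assert(in != NULL && out != NULL);

    while ((byte = fgetc(in)) != EOF)
    {
        if (byte > 9)
            return -1;           /* not a value a single digit can produce */
        if (fputc(byte + '0', out) == EOF)
            return -1;
        count++;
    }
    return count;
}
```

When using it on real files, opening the binary side with "rb" keeps non-POSIX systems from translating bytes that happen to look like newline characters.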

If you're using a Unix-like system, you can use command line tools such as xxd to check binary files. On Windows, any Hex Editor program will do the job.

Jardel Lucca
  • Thanks for the detailed explanation, it solved every doubt that I had in the beginning, I don't know why I imagined that using the b flag on the `fopen()` would do the magic. but this solves all my doubts, really appreciate this. – Paul S-Pou Jun 27 '21 at 14:29
  • Cool! I remember I was confused when I was learning this for the first time. Some tutorials are pretty vague when explaining these flags. – Jardel Lucca Jun 27 '21 at 17:15