C: Generate a header file from a binary file

Question

I'm trying to build a simple encryption tool. What I want to do is read an executable's contents, encrypt them, and generate a header file containing a variable with the encrypted bytes, which another program will then decrypt. So the question is: how can I export the encrypted data to a header file? For example, to print the hex representation of a byte you can use

printf("%x", byte);

But I don't think you can use that format as-is to store the bytes in an unsigned char array, since the usual format is

unsigned char bytes[] = {0x10, 0x38, 0x40, etc...}

In Python I can do it, but I can't figure out how to do it directly in C.

If you have recommendations of sources, please share them.

I'm focusing on Windows executables at the moment. Most likely I'll try to execute the binary code from virtually allocated memory; I've seen some code that does it, so I want to try doing it myself.

To re-phrase and simplify it: You have binary data in your C program which you want to write out in the format of a C header file? – reichhart, Jul 04 '20 at 16:28
I read binary data from an executable file with standard I/O (fopen, fread) into an unsigned char buffer. I want to encrypt it, then write that buffer out in C header format, so that I can access it from another C program and decrypt it. – Селятин Исмет, Jul 04 '20 at 16:41
OK, I assume reading the data is clear, so it is just about writing e.g. `{ 0xFF, 0xFF, ... }`? – reichhart, Jul 04 '20 at 16:45

vmt · Answer 1 · 2020-07-04T16:42:44.957

Quick and dirty, unsafe and untested. Reads the file defined in INPUT_FILE and writes it to OUTPUT_FILE in the format unsigned char var[] = { 0xXX, 0xXX ... }; The name of the variable is controlled by VARIABLE_NAME. You should add your own sanity checks, e.g. check the return values of fopen() and the like.

#include <stdio.h>
#include <stdlib.h>

#define INPUT_FILE "file.exe"
#define OUTPUT_FILE "out.txt"
#define VARIABLE_NAME "bytes"

int main(int argc, char *argv[]) {
    FILE *fp = fopen(INPUT_FILE, "rb");

    // Get file size
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    // Alloc, read
    unsigned char *buf = malloc(size);
    fread(buf, size, 1, fp);
    fclose(fp);

    // Write the data out
    fp = fopen(OUTPUT_FILE, "wb");
    fprintf(fp, "unsigned char %s[] = { ", VARIABLE_NAME);
    for (long i = 0; i < size; i++) {
        fprintf(fp, "0x%02x%s", buf[i], (i == size-1) ? " };" : ", ");
    }
    fclose(fp);
    free(buf);
    return 0;
}
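
The sanity checks that the answer leaves to the reader could look roughly like the sketch below. It rests on assumptions: the helper name `write_c_array` and the wrapping at 12 values per line are hypothetical choices, not part of the answer. It checks every write and rejects an empty buffer, for which the loop above would never emit the closing `};`.

```c
#include <stdio.h>

/* Hypothetical helper (not from the answer): writes buf[0..n-1] to out as
 *   unsigned char <name>[] = { 0x.., ... };
 * Returns 0 on success, -1 on a failed write or an empty buffer. */
int write_c_array(FILE *out, const char *name,
                  const unsigned char *buf, size_t n)
{
    /* A zero-length array is not valid standard C, so refuse empty input. */
    if (n == 0)
        return -1;

    if (fprintf(out, "unsigned char %s[] = {", name) < 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* Start a new indented line every 12 values (arbitrary choice). */
        const char *lead = (i % 12 == 0) ? "\n    " : " ";
        /* No comma after the last value. */
        const char *tail = (i + 1 < n) ? "," : "";
        if (fprintf(out, "%s0x%02x%s", lead, buf[i], tail) < 0)
            return -1;
    }

    if (fprintf(out, "\n};\n") < 0)
        return -1;
    return 0;
}
```

A caller would still open and read the input as in the program above, check the results of fopen(), malloc() and fread() itself, and then call `write_c_array(fp, VARIABLE_NAME, buf, (size_t)size)` on the output stream.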

bruno · Accepted Answer · 2020-07-04T17:03:55.920

Do you want something like this:

#include <stdio.h>

int encode(int c)
{
  return (unsigned char) (c ^ 0xf);
}

int main(int argc, char ** argv)
{
  if (argc != 3) {
    fprintf(stderr, "usage: %s <file in> <file out>\n", *argv);
  }
  else {
    FILE * fpin;
    FILE * fpout;
    
    if ((fpin = fopen(argv[1], "rb")) == NULL) /* under Windows 'b' is necessary to read binary */
      perror("cannot open input file");
    else if ((fpout = fopen(argv[2], "w")) == NULL) {
      perror("cannot open output file");
      fclose(fpin); /* do not leak the input stream */
    }
    else {
      const char * sep = "unsigned char bytes[] = {";
      int c;
     
      while ((c = fgetc(fpin)) != EOF) {
        fprintf(fpout, "%s0x%x", sep, encode(c));
        sep = ", ";
      }
      
      fputs("};\n", fpout);
      fclose(fpin);
      fclose(fpout);
    }
  }
  
  return 0;
}

Of course, modify encode to suit your needs.

Compilation and execution:

pi@raspberrypi:/tmp $ gcc -Wall e.c
pi@raspberrypi:/tmp $ ./a.out ./a.out h
pi@raspberrypi:/tmp $ cat h
unsigned char bytes[] = {0x70, 0x4a, 0x43, 0x49, 0xe, 0xe, 0xe, 0xf ... 0xf, 0xf, 0xf, 0xf, 0xf};
pi@raspberrypi:/tmp $ ls -l h
-rw-r--r-- 1 pi pi 43677 juil.  4 18:44 h

(I cut the output of cat h to show only its beginning and end)
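
On the other side, the program that uses the generated file includes it and undoes the encoding. A minimal sketch, assuming the generated file h would normally be pulled in with `#include "h"`; here a short hand-written array stands in for it so that the example is self-contained, and `decode_in_place` is a hypothetical name:

```c
#include <stddef.h>

/* Stand-in for the generated header: the four bytes 0x7f 'E' 'L' 'F'
 * XORed with 0xf, matching the start of the output shown above. */
unsigned char bytes[] = {0x70, 0x4a, 0x43, 0x49};

/* Hypothetical helper: XOR with 0xf a second time restores the
 * original bytes, because encode() used the same XOR. */
void decode_in_place(unsigned char *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] ^= 0xf;
}
```

Calling `decode_in_place(bytes, sizeof bytes)` turns the array back into 0x7f 'E' 'L' 'F', the ELF magic number at the start of the Raspberry Pi executable used above. `sizeof bytes` gives the length only because the array definition is visible in the same file.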

edited Jul 04 '20 at 17:03

answered Jul 04 '20 at 16:40


Yes, but I didn't understand how return c ^ 0xf works. Can you elaborate or point to some documentation, please? For the encryption I think I can do something simple like incrementing each byte by 1, or use some algorithm, then decrypt it. But I can't understand how return c ^ 0xf works; I want to understand the code and how it works. – Селятин Исмет Jul 04 '20 at 16:49
@СелятинИсмет `c ^ 0xf` is just the xor between the byte and 15. I use it as an example, since I do not know what kind of encoding you want. xor is its own inverse: to decode, apply `value ^ 0xf` again – bruno Jul 04 '20 at 16:50
@СелятинИсмет but if the goal is to keep something secret, do not use that xor ^^ – bruno Jul 04 '20 at 16:51
Beautify header: Use `%02x`. – reichhart Jul 04 '20 at 16:56
@reichhart I am not sure the goal is to read the content. I wanted to show the result for the small program `int main() { return 0;}`, but even for that the executable built with *gcc* is 7908 bytes, so the generated header has 41433 characters! ^^ – bruno Jul 04 '20 at 16:59
So based on what I've found, XOR is a bitwise operator which returns 1 if a and b differ and 0 if they don't, and 0xf is 15 in hexadecimal. But why do we XOR the byte with 15 specifically? Also, when using fread in binary mode it reads the data literally in binary format, meaning 1001, 101100100 etc., and that's why we can't print it out directly, right? Then when we cast the int to unsigned char, are we directly getting its hex representation? For example 0xf = 15, so if int c is 15, cast to unsigned char it'll be "f". If I'm wrong, please correct me or point me to a source – Селятин Исмет Jul 04 '20 at 17:14
@СелятинИсмет no, it is a bitwise xor: it works bit by bit. For instance 3^5 is 011 xor 101, so 110, which is 6. I chose 15 (0xf) arbitrarily; using 0xff would invert all the bits. And no, 15 is not the character f in ASCII: the code for space is 32 (0x20), the code for f is 102 (0x66), etc. – bruno Jul 04 '20 at 17:19
@bruno So it's an operator that executes an arithmetic operation (I found that it's (a-b)^2 on binary data, src: https://stackoverflow.com/questions/21293278/mathematical-arithmetic-representation-of-xor), so basically if a = 0x11 (17) and b = 0xf (15), then a^b = (a - b)^2 = 0x4 (4). Is that how it works? And if I don't want to use an encoding algorithm, then I can just do this, right: fprintf(fpout, "%s0x%x", sep, (unsigned char) c); – Селятин Исмет Jul 04 '20 at 17:33
@СелятинИсмет 0x11 (17) is 00010001 and 0xf (15) is 00001111, so the xor is 00011110, which is 0x1e, i.e. 30. I supposed you wanted to encode because you said *encryption* in the question; if you do not need that, `fprintf(fpout, "%s0x%x", sep, (unsigned char) c);` is the right way – bruno Jul 04 '20 at 17:40
@bruno In this case you should of course avoid beautifying; we don't want to increase the bloat even more. ;-) (I actually never thought about how much the size would be multiplied by simply "asciifying" data. :-O) – reichhart Jul 04 '20 at 17:40
I don't understand the calculation. If you have any resources I can read, I would really appreciate it if you shared them. I really appreciate your help guys, thanks for being patient and for the explanations! – Селятин Исмет Jul 04 '20 at 20:54
@СелятинИсмет for each pair of bits at the same position the rule is 0^0=0, 1^0=1, 0^1=1, 1^1=0. The formula `(a-b)^2`, i.e. `(a-b)*(a-b)`, is only true for **1** bit, not for several, so not for two numbers like 15 and 17 – bruno Jul 04 '20 at 21:09
@СелятинИсмет in fact `(a-b)*(a-b)` acts like `abs(a-b)`, I mean a way to replace -1 by 1. If *a* and *b* are each 0 or 1, `(a-b)*(a-b)` equals `abs(a-b)` equals `a xor b`. But again this holds only for the numbers 0 and 1; there is no simple formula using only the operators `+ - * /` that computes `xor` for arbitrary numbers – bruno Jul 04 '20 at 21:18
@bruno For example when I do printf("%x", 0x4 ^ 0xf) I get "b". Why is that? What happens behind the scenes that gives me specifically "b"? Why do I get a different result if I change one of the operands? That's what I'm trying to understand: the principle behind the value returned – Селятин Исмет Jul 05 '20 at 01:23
@bruno And for decoding, if I'm not wrong, you said that you can just do value ^ encodeValue, right? – Селятин Исмет Jul 05 '20 at 01:25
@СелятинИсмет 0x4^0xf in binary is 0100^1111 = 1011, so 0xb in hexadecimal. Maybe you did not know that "%x" asks to print in hexadecimal, rather than in decimal as with "%d"? – bruno Jul 05 '20 at 06:55
@СелятинИсмет yes for decoding, because xor is commutative and associative, so `(v1 ^ v2) ^ v1 == v1 ^ v1 ^ v2 == (v1 ^ v1) ^ v2`, and by definition `v1 ^ v1 == 0` whatever v1, and `0 ^ v2 == v2` whatever v2, so finally `v1 ^ v2 ^ v1 == v2`. That means that after encoding v2 by xoring it with v1, it is enough to xor the result with v1 again to get back v2 – bruno Jul 05 '20 at 06:58
Thanks, I wrote a simple program to understand it and now I get it: you always get the original value back as long as you have the value that you used to encode – Селятин Исмет Jul 05 '20 at 10:25
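
The XOR facts from the comments can be checked with a few lines of C. This is only an illustrative sketch: `xor_bitwise` is a hypothetical helper that applies the one-bit rule `(a-b)*(a-b)` to each bit position separately, which is why it agrees with `^` while the same formula applied to whole numbers does not.

```c
/* Hypothetical helper: build a ^ b for 8-bit values one bit at a time,
 * using (x - y) * (x - y) on single bits as described in the comments. */
unsigned xor_bitwise(unsigned a, unsigned b)
{
    unsigned result = 0;
    for (int bit = 0; bit < 8; bit++) {
        int x = (a >> bit) & 1;   /* bit of a at this position */
        int y = (b >> bit) & 1;   /* bit of b at this position */
        /* (x - y) * (x - y) is 1 when the bits differ, 0 otherwise. */
        result |= (unsigned)((x - y) * (x - y)) << bit;
    }
    return result;
}
```

For example, `xor_bitwise(0x11, 0xf)` gives 0x1e (30), the same as `0x11 ^ 0xf`, whereas `(0x11 - 0xf) * (0x11 - 0xf)` on the whole numbers gives 4. Likewise `(v ^ 0xf) ^ 0xf == v` holds for every byte value v, which is the decoding property discussed above.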
