
I am working with a base64 library that I found on GitHub (b64.c). It works fine when I encode ASCII strings, but when I try to encode a binary file such as an image, the output is wrong. Below is the code I am using to read in the file.

hello.txt

héllo

hello.txt contains only one special character. The encoding works fine if that character is replaced with a regular ASCII character.

main.c

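#include <stdio.h>
#include <stdlib.h>

char *b64_encode(const unsigned char* src, size_t len); // defined in encode.c

int main()
{
    FILE *fp=NULL;
    char *buf=NULL, *str1="héllo", *str2="hello";
    int i=0;
    size_t fsize=0, bytes_read=0;

    // open in binary mode so the bytes are read exactly as stored
    fp=fopen("hello.txt", "rb");
    if( fp==NULL ) { perror("hello.txt"); exit(-1); }
    // determine the file size, then go back to the start
    fseek(fp, 0, SEEK_END);
    fsize=ftell(fp);
    rewind(fp);
    buf=(char*)malloc(sizeof(char)*(fsize));
    if( buf==NULL ) { perror("buf"); exit(-1); }
    //buf[fsize]='\0';
    bytes_read=fread(buf, 1, fsize, fp);
    if( bytes_read!=fsize ) exit(-1);
    fclose(fp);
    printf("encoded=%s\n", b64_encode((const unsigned char*)buf, fsize));

    getchar();
    return 0;
}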

encode.c // contains the function b64_encode

char *b64_encode(const unsigned char* src, size_t len)
{
    int i=0, j=0;
    char *enc=NULL;
    size_t size=0;
    unsigned char buf[4], tmp[3];

    // alloc
    enc=(char*)malloc(0);
    if( enc==NULL )
    {
        perror("enc");

        return NULL;
    }

    while( len-- )
    {
        tmp[i++]=*(src++);
        if( i==3 )
        {
            buf[0]=( tmp[0]&0xfc )>>2;
            buf[1]=( ( tmp[0]&0x03 )<<4 )+( ( tmp[1]&0xf0 )>>4 );
            buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[1]&0xc0 )>>6 );
            buf[3]=tmp[2]&0x3f;

            /*
             * alloc 4 bytes for 'enc' and then translate
             * each encoded buffer part by index from
             * the base64 table into 'enc' unsigned char array
            */
            enc=(char*)realloc(enc, size+4);
            for( i=0; i<4; ++i )
            {
                enc[size++]=b64_table[buf[i]];
            }

            // reset index
            i=0;
        }
    }

    if( i>0 )
    {
        // fill 'tmp' with '\0' at most 3 times
        for( j=i; j<3; ++j )
        {
            tmp[j]='\0';
        }

        // perform same codes as above
        buf[0]=( tmp[0]&0xfc )>>2;
        buf[1]=( ( tmp[0]&0x03 )<<4 )+( ( tmp[1]&0xf0 )>>4 );
        buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[1]&0xc0 )>>6 );
        buf[3]=tmp[2]&0x3f;

        // same write to enc with new allocation
        for( j=0; j<i+1; ++j )
        {
            enc=(char*)realloc(enc, size+1);
            enc[size++]=b64_table[buf[j]];
        }

        while( ( i++ )<3 )
        {
            enc=(char*)realloc(enc, size+1);
            enc[size++]='=';
        }
    }

    enc=(char*)realloc(enc, size+1);
    enc[size]='\0';

    return enc;
}

output by program

aOnsbG9=

output after saving as UTF-8

aMPpbGxv

expected output

aMOpbGxv

PS. I read the file in as binary since it has special characters and because later on I would like to read in binary data such as images or videos.

Hawk
  • Be careful with `malloc(0)`. It's implementation defined if it will return a null pointer or a non-null pointer. – Some programmer dude Dec 21 '16 at 14:20
  • Can you elaborate the "it doesn't work"-part? – Ctx Dec 21 '16 at 14:21
  • the output that is given doesn't match the expected output.. – Hawk Dec 21 '16 at 14:21
  • I think, it is just an encoding problem... try to save the source file as utf-8 and see, if the problem vanishes – Ctx Dec 21 '16 at 14:23
  • Have you tried stepping through the code line by line in a debugger, while monitoring variables and their values? Also, please don't post images of text. Copy-paste the actual text as text. – Some programmer dude Dec 21 '16 at 14:24
  • `echo "aOnsbG9=" | recode base64..data | recode latin1..utf8` results in `héìlo` – Ctx Dec 21 '16 at 14:24
  • please explain how I can save the source file as utf-8 on vs2010 professional cause i'm unsure how to do it? – Hawk Dec 21 '16 at 14:25
  • Hm, wait a minute, does the data come from the file `hello.txt`? Then you will have to save _that_ as utf-8. I assumed, that the source is `*str1` before... – Ctx Dec 21 '16 at 14:28
  • @Ctx, when i use an online base64 decoder, it results in `hlo` – Hawk Dec 21 '16 at 14:28
  • Re: VS source file encoding: http://stackoverflow.com/questions/840065/how-to-change-source-file-encoding-in-csharp-project-visual-studio-msbuild-ma – pattivacek Dec 21 '16 at 14:28
  • @Hawk I would recommend Notepad++ for this task. In VS2010 you should be able to call "File->Save File as" and there click on the little arrow on the "Save"-Button and select "Save with Encoding" – Simon Kraemer Dec 21 '16 at 14:28
  • @SimonKraemer, i saved the `.txt` file using notepad++.. – Hawk Dec 21 '16 at 14:30
  • With the correct encoding? You can change it in the "Encoding" menu – Simon Kraemer Dec 21 '16 at 14:33
  • @Hawk But did you make sure that it is saved as utf-8? – Ctx Dec 21 '16 at 14:33
  • @Ctx there seems to be an issue there, _héìlo_, UTF-8 encoded, is `aMOpw6xsbw==` in base64. – Bart van Nierop Dec 21 '16 at 14:33
  • @BartvanNierop Yes it is... I am not sure if I understand what you want to express? – Ctx Dec 21 '16 at 14:34
  • @Ctx, saved it as `Unicode(utf-8 without signature) -Codepage 65001`, all files just saved as that..got a new output. – Hawk Dec 21 '16 at 14:34
  • the output still differs from expected output..and now when i decode online, i get `hllo` so only the special char missing – Hawk Dec 21 '16 at 14:36
  • @Hawk My suspicion is still, that the source file is not properly utf-8 encoded... the `é` should be encoded as `c3 a9`, while it seems to be encoded as `c3 e9`. Please make sure, that `hello.txt` is properly encoded (for example with a hexeditor) – Ctx Dec 21 '16 at 14:42
  • @Ctx, even when i try to hardcode the non-ascii string into the program, it still doesn't output the correct encoding. When i use a different encoder and the same file, it works.. – Hawk Dec 21 '16 at 14:44
  • Can you upload the file hello.txt somewhere? – Ctx Dec 21 '16 at 14:45
  • http://m.uploadedit.com/ba3s/1482332062143.txt link to uploaded file – Hawk Dec 21 '16 at 14:55
  • `aMPpbGxv` is *NOT* UTF-8. – Sam Varshavchik Dec 21 '16 at 15:09
  • @Hawk The file seems to be valid utf-8... I honestly do not know where this breaks then, assumed, that the file you uploaded is guaranteed unmodified the file you use. One last guess: try `fopen("hello.txt", "rbB");`and see if that changes something – Ctx Dec 21 '16 at 15:15
  • i get Invalid FileOpen mode, some runtime error from vs.. – Hawk Dec 21 '16 at 15:22
  • @SamVarshavchik, that is the encoded format of the file i am encoding. It doesn't encode the special characters located within – Hawk Dec 21 '16 at 15:23
  • Well, whatever encoded format that is, it's not UTF-8. `héllo` in UTF-8 and base64-encoded is `aMOpbGxv`. There's no issue with the shown C++ code. You have an issue with properly encoding the file, which is outside the scope of this question. This has nothing to do with C++. – Sam Varshavchik Dec 21 '16 at 15:26
  • @SamVarshavchik, when i use a different C library and the same file, it prints the expected output so I highly doubt the issue is in the file.. – Hawk Dec 21 '16 at 15:27
  • Another guess: Maybe a typo in the `b64_table`? Can you show this table, too? – Ctx Dec 21 '16 at 15:32

1 Answer


The problem lies in the function b64_encode:

buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[1]&0xc0 )>>6 );

should be

buf[2]=( ( tmp[1]&0x0f )<<2 )+( ( tmp[2]&0xc0 )>>6 );

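The third base64 character is built from the low four bits of the second input byte and the top two bits of the third input byte, so the last term has to read tmp[2], not tmp[1]. Be sure to fix this at both occurrences: inside the while loop and in the block that handles the trailing bytes.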
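For reference, here is a minimal sketch of the same packing with the corrected index. It is not the library's code: the name b64_encode_fixed and the table b64_alphabet are hypothetical, and the table is assumed to be the standard base64 alphabet, since the question does not show b64_table. It also sizes the output once up front, which sidesteps the malloc(0) concern raised in the comments.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: the standard base64 alphabet (the question's b64_table is not shown). */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical replacement for b64_encode: returns a NUL-terminated,
 * malloc'ed string that the caller must free, or NULL on allocation failure. */
char *b64_encode_fixed(const unsigned char *src, size_t len)
{
    size_t out_len = 4 * ((len + 2) / 3);   /* 4 output chars per 3 input bytes */
    char *enc = malloc(out_len + 1);        /* +1 for the terminator, never 0 */
    size_t i = 0, o = 0;

    if (enc == NULL)
        return NULL;

    /* Full 3-byte groups: 24 bits split into four 6-bit indices. */
    for (i = 0; i + 2 < len; i += 3) {
        enc[o++] = b64_alphabet[src[i] >> 2];
        enc[o++] = b64_alphabet[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        /* low 4 bits of byte 1 plus top 2 bits of byte 2 (the line the answer fixes) */
        enc[o++] = b64_alphabet[((src[i + 1] & 0x0f) << 2) | (src[i + 2] >> 6)];
        enc[o++] = b64_alphabet[src[i + 2] & 0x3f];
    }

    /* One or two trailing bytes: pad missing input with zero bits, output with '='. */
    if (i < len) {
        int have_second = (i + 1 < len);
        unsigned char b0 = src[i];
        unsigned char b1 = have_second ? src[i + 1] : 0;

        enc[o++] = b64_alphabet[b0 >> 2];
        enc[o++] = b64_alphabet[((b0 & 0x03) << 4) | (b1 >> 4)];
        enc[o++] = have_second ? b64_alphabet[(b1 & 0x0f) << 2] : '=';
        enc[o++] = '=';
    }

    enc[o] = '\0';
    return enc;
}
```

Called on the six bytes 68 c3 a9 6c 6c 6f (héllo saved as UTF-8), this should return aMOpbGxv, the expected output from the question.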

Ctx
  • damn..thanks a lot..finally..all this time i was stressing out..appreciate the time you took to look through it.. – Hawk Dec 21 '16 at 16:45