1

There are many other threads talking about binary file corruption, yet they all seem unrelated to my problem somehow.

I have a C program that downloads/reads files.

Because I do not write all the files I get into a file, the function that I use with curl rather stores data into a string. I may be free to write this string into a file, or not, later.

I have a binary file. I put it on a FTP.

If I download it via a ftp client like filezilla, it contains the right things (that is, the same characters I get when I cat my compiled binary) If I use a curl command line to download the file, it contains the right things as well.

If I use my program to download such a file, it will only contain a string like "ELF" followed by 3 unwritable/unreadable characters.

It's important to note that this only happens with binary files. Text files were transferred/read just file. It's also important to know that it seems to be the data that is passed to my function from curl that's already false: if I put a printf of the data in my write function, I see the same ELF + 3 unreadable chars string, therefore the method I use later on to write it into a file isn't in question.

When I use verbose, curl says it's in binary mode yet the binary isn't properly transferred.

Here is what I have so far, works fine with any file that's not binary, will always be garbage otherwise. Thanks in advance:

struct string 
{
  char *ptr;
  size_t len;
};

char *usr_psswd(char *user, char *psswd)
{
    char *usrpsswd;

    usrpsswd = (char *)malloc(strlen(user) + strlen(psswd) + 2);
    int i = 0;
    int j = 0;

    while (user[i])
    {
        usrpsswd[i] = user[i];
        ++i;
    }
    usrpsswd[i++] = ':';
    while (psswd[j])
    {
        usrpsswd[i] = psswd[j];
        ++i;
        ++j;
    }
    usrpsswd[i] = 0;
    return usrpsswd;
}

void init_string(struct string *s) 
{
  s->len = 0;
  s->ptr = malloc(s->len+1);
  if (s->ptr == NULL) 
  {
    fprintf(stderr, "malloc() failed\n");
    exit(EXIT_FAILURE);
  }
  s->ptr[0] = '\0';
}

size_t writefunc(void *ptr, size_t size, size_t nmemb, struct string *s)
{
    size_t new_len = s->len + size*nmemb;
    s->ptr = realloc(s->ptr, new_len+1);
    if (s->ptr == NULL) 
    {
        fprintf(stderr, "realloc() failed\n");
        exit(EXIT_FAILURE);
    }
    memcpy(s->ptr+s->len, ptr, size*nmemb);
    s->ptr[new_len] = '\0';
    s->len = new_len;
    return size*nmemb;
}

char *curl_get(char *addr, t_data *data)
{
  CURL *curl;
  CURLcode res;
  char *rtrn;
  curl = curl_easy_init();
  if(curl) 
  {
    struct string s;
    init_string(&s);
    curl_easy_setopt(curl, CURLOPT_URL, addr);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writefunc);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &s);
    curl_easy_setopt(curl, CURLOPT_PORT, 21);
    curl_easy_setopt(curl, CURLOPT_USERPWD, usr_psswd(data->login, data->password));
    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
    {
        printf("curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        free(s.ptr);
        curl_easy_cleanup(curl);
        return NULL;
    }
    rtrn = strdup(s.ptr);
    free(s.ptr);
    curl_easy_cleanup(curl);
  }
  return rtrn;
}
Tohkai
  • 93
  • 1
  • 9

2 Answers2

3

Your problem is that you treat binary data as a string.

The strdup function works like any other string functions: It looks for the string terminator to find the end of the source string. And the string terminator '\0' is the byte value 0. So if the binary data contains any zero bytes (very very likely) then that will be seen as the end of the "string".

Easy solution? Just do return s.ptr; But do note that there's no way to find out the length of the data using the returned pointer. So a better solution might be to return s itself (as it contains a pointer to the data as well as the size of it).

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • Thanks for your interest, but, as I said, this happens before strdup: when I display the pointer passed by curl in my write function, it's already "elf+unreadable chars" – Tohkai Nov 13 '19 at 13:40
  • 3
    @Tohkai And how do you *print* the data? Using `printf` and the `%s` format? Then you have the same problem with binary data that can contain a zero byte anywhere inside itself, which `printf` will think is the string terminator. You just can't treat arbitrary binary data as strings, *at all*. – Some programmer dude Nov 13 '19 at 13:42
  • But here's the thing: I thought it was the problem at first as well, but the initial binary file does not contain any ELF+unreadable chars at the start of it, so I don't even think it stops at 0 because these initial first characters are, well, nowhere to be found in the first place EDIT: nvm it does, I must have opened the wrong file – Tohkai Nov 13 '19 at 13:44
  • Alright... I also thought printf didn't stop at '\0' because, in the past, when I wanted to cut strings by putting a 0 where I wanted to cut, it would sometimes display the entire string anyway. – Tohkai Nov 13 '19 at 13:53
  • "0" or '0' do not equal the binary value `0` – ryyker Nov 13 '19 at 13:54
  • @ryyker 0 is '\0'. If you do str[0] = 0 or str[0] = '\0', it does the same thing as stated in the ascii table. I don't think I've said otherwise anywhere – Tohkai Nov 13 '19 at 13:56
  • @Tohkai - Yes, I agree. I was just clarifying your comment, 2 comments ago, _by putting a 0 where I wanted to cut_ sounded like you were inserting a `0` into a string eg. _this is 0 a string_. – ryyker Nov 13 '19 at 13:59
  • Alright, understood, sorry for lacking clarity on this one, I did mean 0 as in '\0' – Tohkai Nov 13 '19 at 14:01
  • 3
    _"I also thought printf didn't stop at '\0' because, in the past, when I wanted to cut strings by putting a 0 where I wanted to cut, it would sometimes display the entire string anyway"_: I don't believe you. – Jabberwocky Nov 13 '19 at 14:04
1

Most of the issues you are seeing are due to using techniques designed for working with strings, but are being applied to binary files.

When writing code that must at some point be used with binary data and file content, it is best to follow a couple of rules

1) Variables used to contain binary data should prefer unsigned char over char. For example:

char *usr_psswd(char *user, char *psswd){...  

Should be written as

unsigned char *usr_psswd(unsigned char *user, size_t lenUser, unsigned char *psswd, size_t lenPsswd){...  

Note: Reason for including array lengths is covered below.

More on the rational of using unsigned char with binary data.

2) Avoid use of string functions such as strdup(), strlen(), etc. They are all written to look for the terminating null byte to indicated the end of a C string. For example: >

usrpsswd = (char *)malloc(strlen(user) + strlen(psswd) + 2);

should be written as:

 usrpsswd = malloc(lenUser + lenPasswd + 1);//No need for null terminator. (+1 for delimiter, per comments)
                                        //usrpasswrd should be unsigned char *
                                        //Casting return of malloc not recommended. in C.  

More on reliable ways to get array lengths in C.

ryyker
  • 22,849
  • 3
  • 43
  • 87
  • 1
    usr_psswd also includes a separator, though. So there would definitely have to be at least + 1 to allocate for ':' – Tohkai Nov 13 '19 at 15:10
  • @Tohkai - That makes sense. Added `+1` to malloc statement. Thanks. – ryyker Nov 13 '19 at 15:18