
I need to create binary data files named after Object Identifiers. An Object Identifier is a variable-length binary buffer (a void*) of up to 64 bytes that can contain any byte value, including non-printable characters, so I can't use it directly as a file name. Any suggestions for creating a unique file name? How could a UUID be derived or used in this case?

user503403
  • See http://en.wikipedia.org/wiki/Base64 and http://stackoverflow.com/questions/342409/how-do-i-base64-encode-decode-in-c – Jim Balter Jun 27 '13 at 02:28

1 Answer


You can convert the bytes into a hexadecimal string.

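#include <assert.h>

/* Map the low 4 bits of x to a lowercase hex digit. */
#define tohex(x) ("0123456789abcdef"[(x) & 0x0f])

/* Assumes objid is a const unsigned char * and objid_len its length in bytes. */
char buf[129];                          /* 2 hex digits per byte + '\0' */
assert(objid_len <= 64);
for (int i = 0; i < objid_len; ++i) {
    buf[2*i] = tohex(objid[i] >> 4);    /* high nibble */
    buf[2*i+1] = tohex(objid[i]);       /* low nibble */
}
buf[2*objid_len] = '\0';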
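As a minimal sketch of the grouping idea raised in the comments, the hex digits can be split into groups so that long names are easier to read. The function name `objid_to_grouped_hex`, the `group` parameter (counted in bytes, so 2 to 4 gives groups of 4 to 8 hex digits), and the choice of `-` as the separator are assumptions made for illustration; they are not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: hex-encodes objid and inserts '-' after every
 * `group` bytes. `out` must hold 2*objid_len + (objid_len-1)/group + 1
 * characters (144 is enough for a 64-byte id with group >= 4).
 * Returns the length of the resulting string. */
size_t objid_to_grouped_hex(const unsigned char *objid, size_t objid_len,
                            size_t group, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i, j = 0;

    assert(group > 0);
    for (i = 0; i < objid_len; ++i) {
        if (i > 0 && i % group == 0)
            out[j++] = '-';                     /* separator between groups */
        out[j++] = digits[objid[i] >> 4];       /* high nibble */
        out[j++] = digits[objid[i] & 0x0f];     /* low nibble */
    }
    out[j] = '\0';
    return j;
}
```

With `group` set to 2, the five bytes `de ad be ef 01` become `dead-beef-01`. Because the separators always fall at the same positions, two different IDs still produce two different names.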

You can give the filenames a uniform length by padding with a character that is outside the alphabet used to represent the object ID. If a shorter filename is desired, a higher base can be used. For example, Base64.

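#include <string.h>

const char * const base64str =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
#define tob64(x) base64str[(x) & 0x3f]

/* buf must hold at least 89 bytes: 88 characters (the Base64 length of a
   64-byte id) plus the terminating '\0'. */
void objid_to_filename (const unsigned char *objid, int objid_len,
                        char *buf) {
    memset(buf, '-', 88);                   /* pre-fill with the padding character */
    buf[88] = '\0';
    int i = 0, j = 0;
    int buflen = 4 * ((objid_len + 2)/3);   /* encoded length before '-' padding */
    while (i < objid_len) {
        /* pack up to 3 input bytes into 24 bits */
        unsigned x = 0;
        x |= (i < objid_len) ? objid[i++] << 16 : 0;
        x |= (i < objid_len) ? objid[i++] <<  8 : 0;
        x |= (i < objid_len) ? objid[i++] <<  0 : 0;
        /* emit them as four 6-bit digits */
        buf[j++] = tob64(x >> 18);
        buf[j++] = tob64(x >> 12);
        buf[j++] = tob64(x >>  6);
        buf[j++] = tob64(x >>  0);
    }
    /* overwrite the digits that carry no input bits with '=' */
    int pad = (3 - (objid_len % 3)) % 3;
    for (i = 0; i < pad; ++i) buf[buflen - 1 - i] = '=';
}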
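The same fixed-length padding can be applied to the hexadecimal encoding. The following is a sketch only: the function name `objid_to_padded_hex`, the two macros, and the choice of trailing `-` padding are assumptions, not something the answer prescribes.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJID_LEN 64                    /* limit stated in the question */
#define HEX_NAME_LEN  (2 * MAX_OBJID_LEN)   /* 128 characters, excluding '\0' */

/* Hypothetical helper: always writes exactly HEX_NAME_LEN characters
 * followed by '\0'. Unused positions are filled with '-'. */
void objid_to_padded_hex(const unsigned char *objid, size_t objid_len,
                         char out[HEX_NAME_LEN + 1])
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(objid_len <= MAX_OBJID_LEN);
    memset(out, '-', HEX_NAME_LEN);         /* '-' is not a hex digit */
    out[HEX_NAME_LEN] = '\0';
    for (i = 0; i < objid_len; ++i) {
        out[2 * i]     = digits[objid[i] >> 4];     /* high nibble */
        out[2 * i + 1] = digits[objid[i] & 0x0f];   /* low nibble */
    }
}
```

For the three bytes `00 1f ab` this gives `001fab` followed by 122 `-` characters. Since `-` never appears among the hex digits, IDs of different lengths cannot collide after padding.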
jxh
  • And you might decide to use some grouping every 4-8 hex digits to help people with the name. – Jonathan Leffler Jun 27 '13 at 02:19
  • With the bytes-to-hex-string conversion I will never get a unique file name. My Object ID is a variable-length buffer, but for the file name I need to define a fixed size. As an example, suppose the file name length is 8; in that case two different Object IDs can result in the same file name. – user503403 Jun 27 '13 at 03:32
  • @user503403: I don't understand your issue. The ID is the file name and the length of the name is twice the length of the ID. Why doesn't this work for you? – jxh Jun 27 '13 at 04:59
  • My ID is variable length, which will result in a variable-length file name, and that is not a good way to represent files on disk. The other issue is that the length can go up to 64 bytes, which means a file name of 128 characters; that is not a good option. – user503403 Jun 27 '13 at 06:02
  • @user503403: You can pad the name with leading `0`s or `-`s or some other printable non-hex character to fill out 128 bytes. If you don't like long file names, you can use a larger base than 16 to represent the name. – jxh Jun 27 '13 at 06:09
  • @user503403: The Base64 encoding uses `/` as one of the encoding characters, which is not a good choice if you are using it for a file name. You can change it to something more filename friendly though (perhaps an underscore character, `_`). – jxh Jun 27 '13 at 16:46
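The last comment's suggestion can be sketched as follows: the Base64 routine from the answer with `/` swapped for `_` in the alphabet. The function name `objid_to_safe_filename` and the macros are hypothetical, and the result is no longer standard Base64, so a decoder would need the same modified alphabet.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJID_LEN 64
#define B64_NAME_LEN  88    /* 4 * ceil(64 / 3) characters, excluding '\0' */

/* Assumed alphabet: standard Base64 with '/' replaced by '_'. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+_";

/* Hypothetical helper: writes exactly B64_NAME_LEN characters plus '\0'. */
void objid_to_safe_filename(const unsigned char *objid, size_t objid_len,
                            char out[B64_NAME_LEN + 1])
{
    size_t i = 0, j = 0, k;
    size_t enc_len = 4 * ((objid_len + 2) / 3);  /* length before '-' fill */
    size_t pad = (3 - objid_len % 3) % 3;        /* digits with no input bits */

    assert(objid_len <= MAX_OBJID_LEN);
    memset(out, '-', B64_NAME_LEN);              /* fill to a fixed width */
    out[B64_NAME_LEN] = '\0';

    while (i < objid_len) {
        unsigned long x = 0;
        for (k = 0; k < 3; ++k) {                /* pack up to 3 bytes */
            x <<= 8;
            if (i < objid_len) x |= objid[i++];
        }
        out[j++] = b64_alphabet[(x >> 18) & 0x3f];
        out[j++] = b64_alphabet[(x >> 12) & 0x3f];
        out[j++] = b64_alphabet[(x >> 6) & 0x3f];
        out[j++] = b64_alphabet[x & 0x3f];
    }
    for (i = 0; i < pad; ++i) out[enc_len - 1 - i] = '=';
}
```

A 64-byte ID yields an 88-character name, compared with 128 characters for hex, and no output character is a path separator.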