4

I have data stored in an SQLite database as BINARY(16), the value of which is determined by PHP's hex2bin function on a 32-character hexadecimal string.

As an example, the string 434e405b823445c09cb6c359fb1b7918 returns CN@[4EÀ¶ÃYûy.

The data stored in this database needs to be manipulated by JavaScript, and to do so I've used the following function (adapted from Andris's answer here):

// Convert hexadecimal to binary string
String.prototype.hex2bin = function ()
{

    // Define the variables
    var i = 0, l = this.length - 1, bytes = []

    // Iterate over the nibbles and convert to binary string
    for (i; i < l; i += 2)
    {
        bytes.push(parseInt(this.substr(i, 2), 16))
    }

    // Return the binary string
    return String.fromCharCode.apply(String, bytes)

}

This works as expected, returning CN@[4EÀ¶ÃYûy from 434e405b823445c09cb6c359fb1b7918.

The problem I have, however, is that when dealing directly with the data returned by PHP's hex2bin function I am given the string CN@[�4E����Y�y rather than CN@[4EÀ¶ÃYûy. This is making it impossible for me to work between the two (for context, JavaScript is being used to power an offline iPad app that works with data retrieved from a PHP web app) as I need to be able to use JavaScript to generate a 32-character hexadecimal string, convert it to a binary string, and have it work with PHP's hex2bin function (and SQLite's HEX function).

This issue, I believe, is that JavaScript uses UTF-16 whereas the binary string is stored as utf8_unicode_ci. My initial thought, then, was that I need to convert the string to UTF-8. Using a Google search led me to here and searching StackOverflow led me to bobince's answer here, both of which recommend using unescape(encodeURIComponent(str)). However, this does return what I need (CN@[�4E����Y�y):

// CN@[Â4EöÃYûy
unescape(encodeURIComponent('434e405b823445c09cb6c359fb1b7918'.hex2bin()))

My question, then, is:

How can I use JavaScript to convert a hexadecimal string into a UTF-8 binary string?

Community
  • 1
  • 1
Michael
  • 11,912
  • 6
  • 49
  • 64
  • "*when dealing directly with the data returned by PHP's `hex2bin` function I am given the string `CN@[�4E����Y�y` rather than `CN@[4EÀ¶ÃYûy`*" - that sounds more like an encoding problem. How do you pass the data from PHP to JS? – Bergi Mar 18 '13 at 16:27
  • It's pulled out of an SQLite database. – Michael Mar 18 '13 at 16:38
  • If that data is representing characters, why is it `BINARY(16)`? If the data is not representing characters, why are you trying to convert it to characters? – Esailija Mar 18 '13 at 18:40
  • I'm storing a UUID as `BINARY(16)` in the database. This needs to be generated, retrieved, and used by both `PHP` and `JavaScript`. I'm using `SQLite`'s `HEX` function to get the hexadecimal string when selecting the `UUID`. The problem is when using `HEX(uuid)` on a UUID generated by the `hex2bin` `JavaScript` function above it differs from the actual hexadecimal string that was generated (whereas `PHP`'s `hex2bin` function works fine). – Michael Mar 18 '13 at 18:49
  • @MichaelRushton It would be much simpler to use `CHAR(32)` rather than `BINARY(16)`, you would already have hex digits everywhere and wouldn't have to worry about different conversions – Esailija Mar 18 '13 at 18:52
  • http://stackoverflow.com/a/10951183/215384 – Michael Mar 18 '13 at 18:54
  • Still, why cannot you just convert to binary just-in-time when you are inserting and convert to hex as soon as you are querying? In fact, that's exactly what is advised in the answer you linked... – Esailija Mar 18 '13 at 18:59
  • I'm converting to hex when retrieving (using `HEX`). Unfortunately, `SQLite` doesn't have an `UNHEX` function to do the reverse (unlike `MySQL`, which does). – Michael Mar 18 '13 at 19:02
  • 1
    @MichaelRushton but it supports blob literals.. so all you need is `X` in front of a quoted hex string. Replace `UNHEX('434e405b823445c09cb6c359fb1b7918')` with `X'434e405b823445c09cb6c359fb1b7918'` – Esailija Mar 18 '13 at 19:05
  • Huh. That is exactly what I need. Means I don't have to use `JavaScript` to generate the binary string at all. Thank you very much. – Michael Mar 18 '13 at 19:15

3 Answers3

2

Given a hex-encoded UTF-8 string, `hex',

hex.replace(/../g, '%$&')

will produce a URI-encoded UTF-8 string.

decodeURIComponent converts URI-encoded UTF-8 sequences into JavaScript UTF-16 encoded strings, so

decodeURIComponent(hex.replace(/../g, '%$&'))

should decode a properly hex-encoded UTF-8 string.

You can see that it works by applying it to the example from the hex2bin documentation.

alert(decodeURIComponent('6578616d706c65206865782064617461'.replace(/../g, '%$&')));
// alerts "example hex data"

The string you gave is not UTF-8 encoded though. Specifically,

434e405b823445c09cb6c359fb1b7918
        ^

82 must follow a byte with at least the first two bits set, and 5b is not such a byte.

RFC 2279 explains:

The table below summarizes the format of these different octet types. The letter x indicates bits available for encoding bits of the UCS-4 character value.

UCS-4 range (hex.)           UTF-8 octet sequence (binary)
0000 0000-0000 007F   0xxxxxxx
0000 0080-0000 07FF   110xxxxx 10xxxxxx
0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx
Mike Samuel
  • 118,113
  • 30
  • 216
  • 245
  • The problem is in *generating* that UTF-8 byte string (in `JavaScript`) rather than *retrieving* it (from `PHP`/`SQLite`). Sorry if that wasn't clear. – Michael Mar 18 '13 at 17:18
  • @MichaelRushton, I'm still unclear. Given a regular (UTF-16 encoded) JS string, you want to get the hex-encoded string, or get a UTF-8 octet string? – Mike Samuel Mar 18 '13 at 17:56
  • I need to store a hexadecimal string as `BINARY`. This binary string needs to be generated by both `PHP` and `JavaScript`. The problem is that the binary string returned by `PHP`'s `hex2bin` function is different to the binary string returned by the custom `JavaScript` `hex2bin` function above, despite the hexadecimal string being the same. I know that the `JavaScript` function is correct as the visual characters it returns are the same as those in the database -- it's just that when pulled out of the database the browser shows *different* characters (presumably as its a different encoding). – Michael Mar 18 '13 at 18:37
  • @MichaelRushton, please see my rewrite. – Mike Samuel Mar 18 '13 at 18:51
  • The hexadecimal string was generated by the code in this answer: http://stackoverflow.com/a/2117523/215384 – Michael Mar 18 '13 at 18:58
  • @MichaelRushton, that answer produces random byte-sequences. Where does UTF-8 enter into this? – Mike Samuel Mar 18 '13 at 20:04
  • It was my *assumption* that `JavaScript` using `UTF-16` encoding and my `PHP` app using `UTF-8` encoding was the reason behind the inconsistency between the output of the `hex2bin` functions on the same hexadecimal string. I thought that if I could get `JavaScript` to use `UTF-8` then the binary string results would match. – Michael Mar 18 '13 at 20:46
1

Your applications don't have to handle binary at any point. Insertion is latest possible point and that's where you convert to binary at last. Selection is earliest possible point and that's where you convert to hex, and use hex-strings in application throughout.

When inserting, you can replace UNHEX with blob literals:

INSERT INTO table (id)
VALUES (X'434e405b823445c09cb6c359fb1b7918')

When selection, you can HEX:

SELECT HEX(id) FROM table
Esailija
  • 138,174
  • 23
  • 272
  • 326
  • Perfect. +1 and accepted. It might not answer the question as worded but it does solve the problem as explained. – Michael Mar 18 '13 at 19:25
  • @MichaelRushton it's not really a small task to explain the difference between PHP "strings" and strings in pretty much any other language like Javascript so I am glad to have avoided it :P – Esailija Mar 18 '13 at 19:28
  • One follow up question: can this work with prepared statements? I've tried `(x?)` but that results in an error (as I was half expecting). – Michael Mar 19 '13 at 09:27
  • Also tried concatenating `(x || ?)` and that doesn't work either. – Michael Mar 19 '13 at 09:42
  • @MichaelRushton You have to concatenate it directly into the query, it doesn't seem PDO or mysqli support binary data directly in prepared statement binding. – Esailija Mar 19 '13 at 10:43
  • Ah, OK, although it's SQLite using Titanium Studio's Database module -- but I guess it's the same (the PHP web app is fine as I can bind the result from `hex2bin`). – Michael Mar 19 '13 at 11:26
0

Expanding on Mike's answer, here's some code for encoding and decoding.

Note that the escape/unescape() functions are deprecated. If you need polyfills for them, you can check out the more comprehensive UTF-8 encoding example found here: http://jsfiddle.net/47zwb41o

// UTF-8 to hex
var utf8ToHex = function( s ){
    s = unescape( encodeURIComponent( s ) );
    var chr, i = 0, l = s.length, out = '';
    for( ; i < l; i++ ){
        chr = s.charCodeAt( i ).toString( 16 );
        out += ( chr.length % 2 == 0 ) ? chr : '0' + chr;
    }
    return out;
};

// Hex to UTF-8
var hexToUtf8 = function( s ){
    return decodeURIComponent( s.replace( /../g, '%$&' ) );
};
Beejor
  • 8,606
  • 1
  • 41
  • 31