
When trying to convert binary to hexadecimal, I get wrong results in JavaScript and C++.

This is my PHP code:

$f = bin2hex("l¬");
echo $f;

The output is

6cc2ac

In JavaScript, I use this function:

function bin2hex(s) {
    var i, f = s.length, a = [];
    for (i = 0; i < f; i++) {
        a[i] = s.charCodeAt(i).toString(16);
    }
    return a.join('');
}

The output is

6cac

And this is the C++ code:

std::string bin2hex(const std::string& s)
{
  const static char bin2hex_lookup[] = "0123456789abcdef";
  unsigned int t=0,i=0,leng=s.length();
  std::stringstream r;
  for(i=0; i<leng; i++)
  {
    r << bin2hex_lookup[ s[i] >> 4 ];
    r << bin2hex_lookup[ s[i] & 0x0f ];
  }
  return r.str();
}

Calling the function with

cout << bin2hex("l¬") << endl;

prints

6c c

What is the problem with the JavaScript and the C++ version? Why do they yield different results?

cadaniluk
    In the C++ version the surprising result is because `s[i] >> 4` doesn't do what you expect when `s[i]` is greater than 127. You should have used `(unsigned char)(s[i]) >> 4`. The rest of the difference seems to be whether that second character is 8 bits wide or 16. I don't know how you created that character, so I can't say which of PHP or JavaScript is wrong. – JSF Oct 19 '15 at 12:54
  • @JSF Regarding "whether that second character is 8 bits wide or 16. I don't know how you created that character": I read it from a WAV file. – Максим Зубков Oct 19 '15 at 13:09
  • @JSF After using your cast, C++ now gives me 6cac, the same as JavaScript. – Максим Зубков Oct 19 '15 at 13:11

2 Answers


The hex value will depend on the encoding of said string. PHP is assuming it's UTF-8. ES defines strings as UTF-16:

primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integer values

NOTE A String value is a member of the String type. Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text. However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.

So it can work with UTF-16 (UCS-2 is also 16 bit but doesn't allow the use of surrogates to access the other planes).

Anyway, the UTF-16 representation of ¬ is 00AC, which is why you get ac. I don't know about C++, but my guess would be that each character is also treated as UTF-16 (or UCS-2).

MinusFour
  • So what is the solution in JavaScript? ¬ should come out as c2ac, not ac. – Максим Зубков Oct 19 '15 at 13:44
  • If C++ and JS are getting you UTF-16, wouldn't it be better if PHP treated it as UTF-16 instead of UTF-8? – MinusFour Oct 19 '15 at 13:50
  • I have a PHP project for audio waveforms and it works well; treating the data as UTF-16 might give me a bad waveform. I have just started using Node.js and I want to rewrite it in Node.js, or in C++ called from Node.js. – Максим Зубков Oct 19 '15 at 13:54
  • Well then, you'll need some sort of conversion algorithm that turns UTF-16 into UTF-8. – MinusFour Oct 19 '15 at 14:01
  • LOL, I found this function ^__^ `function toHex(str, hex) { try { hex = unescape(encodeURIComponent(str)).split('').map(function (v) { return v.charCodeAt(0).toString(16); }).join(''); } catch (e) { hex = str; console.log('invalid text input: ' + str); } return hex; }` on this question: http://stackoverflow.com/questions/21647928/javascript-unicode-string-to-hex – Максим Зубков Oct 19 '15 at 14:15

This is a converter FROM a hexadecimal string to an integer that I wrote. To go the other way, from an integer to a hexadecimal string, you really just do the opposite, more or less. I can write an integer-to-hex-string converter too, if you want :)

long HexaDigitToDecimalDigit(char ch)
{
    switch (ch) {
        case '0': return 0;   case '1': return 1;
        case '2': return 2;   case '3': return 3;
        case '4': return 4;   case '5': return 5;
        case '6': return 6;   case '7': return 7;
        case '8': return 8;   case '9': return 9;
        case 'A': return 10;  case 'B': return 11;
        case 'C': return 12;  case 'D': return 13;
        case 'E': return 14;  case 'F': return 15;
        default:  return 0;
    }
}

// Hex strings are normal '\0'-terminated strings, most significant digit first
long HexToDec(char* pchHexStr)
{
    long lTemp = 0;
    int i = 0;

    while (pchHexStr[i] != '\0')
    {
        // Accumulate left to right: each new digit shifts the
        // previous value up by one hex place.
        lTemp = lTemp * 16 + HexaDigitToDecimalDigit(pchHexStr[i]);
        i++;
    }

    return lTemp;
}
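The opposite direction the answer alludes to could be sketched like this; the name `DecToHex` is mine, not part of the answer's code. It emits uppercase digits to match what `HexaDigitToDecimalDigit` accepts:

```cpp
#include <string>

// Sketch of an integer-to-hex-string converter: peel off the
// least-significant nibble each iteration and prepend its digit.
std::string DecToHex(unsigned long n)
{
    static const char digits[] = "0123456789ABCDEF";
    if (n == 0) return "0";
    std::string out;
    while (n > 0) {
        out.insert(out.begin(), digits[n % 16]);
        n /= 16;
    }
    return out;
}
```

For example, `DecToHex(255)` yields `"FF"` and `DecToHex(27820)` yields `"6CAC"`.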
phazer