0

I have a function in Javascript that produces hashes from strings, I have the same in PHP and produces the same value, however in C it doesn't. It has something to do with type I guess but cannot figure out how to solve this. I spend hours on this.


The javascript function is simple, (I think copied from Java):

function getHashCode(s)
{
 var hash=0,c=(typeof s === 'string')?s.length:0,i=0;

 while(i<c) 
  { hash = ((hash<<5)-hash)+s.charCodeAt(i++); }

 return ( hash < 0 )?((hash*-1)+0xFFFFFFFF):hash; /* convert to unsigned int */
}

The C version (mess with some types already however can't get it right):

uint64_t getHashCode( const char* s )
{
 int16_t  iLen  = strlen( s );
 int16_t  i     = 0;
 int32_t  hash  = 0;

 while( i < iLen ) 
  { 
     // hash<<5 = multiply by 32
    hash = ((hash<<5)-hash)+(uint8_t)s[i++]; 
  }

 return (( hash < 0 )?((hash*-1)+0xFFFFFFFF):hash); //convert to unsigned 
}

Example string I have used:

"M:/Mijn Muziek/Various Artists/Revs & ElBee - Tell It To My Heart.mp3"

After adding some debug output after a new hash value is applied (in the while loop), I get this table:

C/C++               JS
---------------         -----------------
77                  77
2445                2445
75842               75842
2351179             2351179
72886654            72886654
-2035480916         -2035480916
1324601154          1324601154
-1887037154         -1887037154
1631390447          1631390447
-966503578          -966503578
103160276           103160276
-1096998635         -1096998635
352780784           352780784
-1948697477         -1948697477
-280079596          4014887700      <- DIFFERENT!
-92532798           -4387500094     | (and all after this)
1426450655          5721417951      |
1270297459          -7319637133     |
724515670            9314450262     \/
985149401           -7604785191
474860476            9064795068
1835772983          -11049128905
1074387657           9664322249
-1053720936         -9643655528
1694389466           10284324058
986466010           -11898435878
515675343            13400577231
-1193933436          -14078835324
1642769264           14527671152
-613760253          -13498662141
-1846698612          15333170572
-1413082042          -14297983930
-855870241           16323998943
-762173577           -17942042761
2142423004           19322292188
1990603716          -19484232764
1579173090           18759042274
1709725566          -19765110914
1461885063           18641754247
-1926203195         -19106072379
417243165            17597112349
49636328        -17130232856
1538726269      18718595453
455874115       -16723995069
1247195722      18427064906
8361750             -17171507434
259214334       17439083518
-554290137      -17734159321
-3124955        17176744229
-96873497       -17276742681
1291888921      18471758105
1393850960      -20080985520
259706916       21734543396
-539020164      -22013856644
470244184       21945080664
1692667927      -24077135849
933098217       22407934697
-1138726268     -22613562748
-940775819      20534060661
900720715       -20574115765
-2142428835     19332407645
-1990784344     -19170653528
-1584772423     19890064057
-1883304743     -19063173927
1747095227      18926964411
-1674622765     -18854491949
-373698054      16806171130
1300262326      -15879606858
1653426493      14538328381

What can I do to get it right in C? Tried to 'increase' the type however gave very different results. Does JS a trick to change it's type depending on the value or something? Can anybody explain what's going on and why it is so different at a certain point (see DIFFERENT! marking in result table above)?

Somebody any idea how to solve this?

Codebeat
  • 6,501
  • 6
  • 57
  • 99

2 Answers2

1

You're getting a strange result in JS and PHP because you are working with IEEE floating point numbers, not integers. The >> operator treats its operand as if it is a signed 32-bit integer, but the subtraction and addition do not. That means that when the value is between 2^31 and 2^32 - 1 on any iteration you get different output, since JavaScript interprets it as an unsigned number instead of a signed number.

You can fix your JavaScript and make it output the same result as C/C++, by using a rightshift of 0 to treat the result as a signed 32 bit integer again:

const str = "M:/Mijn Muziek/Various Artists/Revs & ElBee - Tell It To My Heart.mp3"
const hashCode = getHashCode( str );
console.log( hashCode );

function getHashCode(s)
{
     var hash=0,c=(typeof s === 'string')?s.length:0,i=0;
    
     while(i<c) 
      { hash = (((hash<<5)-hash)+s.charCodeAt(i++)>>>0); } // Added >>> 0
    
     return ( hash < 0 )?((hash*-1)+0xFFFFFFFF):hash; /* convert to unsigned int */
}

To instead get C to act the way Javascript and PHP do you will need to use a larger datatype for hash (int64_t) , but make sure to treat it as an int32_t for just the leftshift operation:

uint64_t getHashCode( const char* s )
{
 int16_t  iLen  = strlen( s );
 int16_t  i     = 0;
 int64_t  hash  = 0;

 while( i < iLen ) 
  { 
     // hash<<5 = multiply by 32
    hash = (((int32_t)hash<<5)-hash)+(uint8_t)s[i++]; 
  }

 return (( hash < 0 )?((hash*-1)+0xFFFFFFFF):hash); //convert to unsigned 
}
Paul
  • 139,544
  • 27
  • 275
  • 264
  • Thanks for the answer, take a look at it later, however the question is about to achieve the same behaviour in C instead of Javapascript. I have the same in PHP, both produce the same so there must be some trick to get it working. – Codebeat Jan 31 '18 at 06:08
  • 1
    @Codebeat I updated my answer with C code that behaves the way the JS and PHP do. – Paul Jan 31 '18 at 06:40
  • Hi, tried this all, the Javascript version works however the C version don't, still the same results. It's a real problem when I need to change the javascript and php versions because it is used everwhere in projects. – Codebeat Jan 31 '18 at 23:25
  • There is also a typo in the javascript version, >>> must be >> – Codebeat Jan 31 '18 at 23:27
  • @Codebeat `>>>` does the same thing as `>>` when its second operand is 0. – Paul Jan 31 '18 at 23:41
  • @Codebeat The C version works for me ( final value of `hash` is `14538328381` ). Did you make both changes to the code? If you are printing it with printf you need to use `%lli`. – Paul Jan 31 '18 at 23:43
  • Okay, your right! Thanks man, really great! Didn't copy paste before but now after copy paste it is working fine. Thanks! – Codebeat Feb 01 '18 at 00:02
0

You have to convert the result of ((hash << 5) - hash) + s.charCodeAt(i++) back to a 32 bit integer. Easiest way to do that is with a bitwise operation, by virtue of the fact that all bitwise operators in JavaScript return a 32 bit integer result:

function getHashCode(s) {
  var hash = 0,
    c = (typeof s === 'string') ? s.length : 0,
    i = 0;

  while (i < c) {
    hash = ((hash << 5) - hash) + s.charCodeAt(i++);
    hash = hash >> 0; // convert to 32 bit integer

    console.log(hash);
  }

  return (hash < 0) ? ((hash * -1) + 0xFFFFFFFF) : hash;
}

getHashCode("M:/Mijn Muziek/Various Artists/Revs & ElBee - Tell It To My Heart.mp3");
Robby Cornelissen
  • 91,784
  • 22
  • 134
  • 156
  • Thanks for the answer, take a look at it later, however the question is about to achieve the same behaviour in C instead of Javapascript. I have the same in PHP, both produce the same so there must be some trick to get it working. – Codebeat Jan 31 '18 at 06:08