2

I would like to know if there is a multi-language library or something that permits to give me the following result:

  • I have a String = "Abcde12345" in Java
  • We will suppose its hashcode in Java is "78911"
  • I Have a String = "Abcde12345" in a C program

What i'd like to know is: how can i easily get the hashcode 78911 in my C program? Since each language can provide its own hash algorithm for a String, how can i handle that?


I'm asking this in the context of using Distributed Hash Tables (datagrids, distributed caches, NoSQL...). I'm planning to create something similar to a very simple client in C for a Java proprietary datagrid.

This is my usecase for now, but for my project, i will need a hash algorithm compatible with multiple languages: - Java hash algorithm in Ruby - C# hash algorithm in Java - C++ hash algorithm in Java - Java hash algorithm in C++ - Java hash algorithm in Erlang In any case, the hash of both algorithms in both languages will need to produce the exact same hash value.

And if possible, i'd like to extend the concept to primitive types and "simple structures" and not just for String


Does anyone know any tool to handle my usecase?


Edit: for Jim Balter

My usecase is:

I have a proprietary partitioning/datagrid technology called GemFire, written in Java. It acts as a distributed hashmap. The number of buckets in the hashmap is fixed. For each map key, it computes its hashcode, and apply a modulo, so that it knows for each key to each bucket it belongs to.

For exemple, if i have 113 bucket (which is the default number of buckets in gemfire), and my map key is the String "Key"

"Key".hashCode() % 113 = 69

Thus GemFire knows "Key" belongs to the 69nth bucket.

Now i have a C application:

  • This application is already aware of the number of buckets used by Gemfire (113).
  • This application needs to be able to compute, for any random key, the bucket number in which GemFire would put that random key.
  • This application needs to be able to compute it fastly, we can't use a webservice.
  • This application should be easy to deploy, and i don't any bridge technology between C/Java - that would require a JVM to be installed to run the C application

So if you know how to do that without having to write/use a Java hashcode port in C, please tell me.

Edit: to avoid confusion: i'm not looking for a anything else, but Jim Balter you suggested i do not need what i claim to need so tell me if you see any other solution, except using like you said a custom or popular hash algorithm.

And in the future i may need to do the same for an Erlang partitionning application with a C# client application, and other languages!


Edit: I would like to avoid using a non-java hash algo (as someone suggested using md5/sha1 or any faster non-security-oriented hash algo). This is because my solution aims to be deployed on legacy distributed systems oftenly written in Java, which already contain a lot of data, and any change in the hash algorithm would require a heavy migration process of the data. However i keep this solution in mind since it could be a sweet second option for people starting a new distributed system from scratch or ready to do their data migration.


So in the end, what i am looking for is not some people to tell me to implement the Java String hash algorithm in C, i already know i can do that thanks! I want to know if someone already did it, and not only for implementing all primitive java algorithms in C, but also in other languages, and from other languages!!! I'm looking for a multi-languages library that provides for each other language, a port of the hash algorithms.

Thus if there would be only 3 languages in earth (C, Java and Python), my question is: is there any polyglot library that provides:

  • A port of Java hash in C
  • A port of Java hash in Python
  • A port of C hash in Java
  • A port of C hash in Python
  • A port of Python hash in Java
  • A port of Python hash in C

For all primitive types available, and eventually basic structures. If for a given language there is no "default hash algorithm" then the most widely used can be considered as the language algorithm.

You see what i mean? I want to know if there is a LIBRARY! i know i can look in the JDK or specification and implement it on my own, but as i'm targeting a large number of languages and i don't know how to code in every languages, i'd like someone to have did it for me and made available in an opensource, free to use project!

Sebastien Lorber
  • 89,644
  • 67
  • 288
  • 419
  • See http://stackoverflow.com/questions/785091/consistency-of-hashcode-on-a-java-string – Jim Balter Jun 19 '12 at 17:24
  • Wouln't it be easier to just build you own using any strong hash? Otherwise you'll forever be guessing what those "other guys" ever did, or forever inspecting their code. – Ira Baxter Jun 19 '12 at 17:32
  • "how to do that without having to write/use a Java hashcode port in C" -- That makes no sense whatsoever. The way to do that is precisely to implement the trivial Java String hash algorithm in C ... why in the world would you not want to? – Jim Balter Jun 20 '12 at 15:56
  • P.S. If you really want to implement this in multiple languages and want to be able to calculate the hash codes in C, then simply don't use String#HashCode, use some other hash function of your own design and specification, and calculate myHashCode("Key") instead of "Key".hashCode(). You would only need the Java hashCode if storing into Java containers, but it's your own container. – Jim Balter Jun 20 '12 at 16:01
  • @JimBalter i never said i don't want to write/use a java hashcode port in C, it's exactly what i am looking for! an already existing library that provides in C hashcode implementation in Java, and by chance in other languages too! I said that because you suggested i don't exactly need what i claim to need and i wanted you to tell me what else do you have in mind if not a port of java hash algorithm in C... – Sebastien Lorber Jun 20 '12 at 19:51
  • Your comments are very confused/confusing. Good luck. – Jim Balter Jun 21 '12 at 01:37
  • what do you mean, what is confusing in "i want a library". There's nothing confusing. Unless you give me a library name your answer is out of scope – Sebastien Lorber Jun 21 '12 at 08:30
  • You bold text and multiple exclamation marks are "out of scope" and make you look silly. – Jim Balter Jun 23 '12 at 00:03

2 Answers2

1

I would add that you can browse via the source code of OpenJDK and see the hashCode implementation. However, bare in mind that as suggested at the comment suggested by Jim Garrison, different classes may override hashCode, so you will have to follow the implementation. I would suggest for performing hashing of Strings to use well known hash functions, such as sha-1 , or maybe md5 - you can find implementations both at Java , C/C++ and other programming languages.

Yair Zaslavsky
  • 4,091
  • 4
  • 20
  • 27
  • String is final; its methods cannot be overridden. And there is no need to browse the code because the algorithm is *part of the specification*. – Jim Balter Jun 19 '12 at 17:26
  • If security is not an issue, then cryptographic hashes, like SHA-1 or MD5 are overkill, and slow. Something like the FNV hash (http://isthe.com/chongo/tech/comp/fnv/) will be a lot faster and can easily be implemented in whatever languages you require. – rossum Jun 20 '12 at 12:16
  • Thanks, i did not think about these common security algorithms. It's definitly something like that i need, because there should be existing implementations in almost any languages. But i need the algorithm to be fast – Sebastien Lorber Jun 20 '12 at 14:30
  • Have a look at the FNV website, there are a number of different implementations given there. The algorithm itself is not difficult to program from scratch. – rossum Jun 20 '12 at 16:03
  • @Sebastien There is absolutely no need for existing implementations of something as trivial to implement as FNV ... just write your own. – Jim Balter Jun 20 '12 at 16:05
0

The algorithm for calculating the hash code of a Java string is quite simple and is documented as part of the public specification: http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#hashCode()

The hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

Note also that String is a final class so its methods cannot be overridden; thus, you are guaranteed that the given algorithm is correct for any Java String.

For languages other than Java, if the language does not specify the hash algorithm (and Java is unusual in doing so), then you cannot be sure that the hash algorithm won't change, even if you can ascertain it. I suspect that you do not actually need what you claim you need, but you would have to say more about your requirements (as opposed to what you think would address them).

Jim Balter
  • 16,163
  • 3
  • 43
  • 66
  • I know i can reimplement it but i would have liked a port of Java hash algorithms in other languages, not only c. – Sebastien Lorber Jun 20 '12 at 14:07
  • @Sebastien Your comment makes no sense. Implementation of the algorithm in another language = "port" of the algorithm to another language. – Jim Balter Jun 20 '12 at 15:53
  • A map can take keys that are not only Strings. @SebastienLorber gave String as an example. – Yair Zaslavsky Jun 20 '12 at 18:10
  • Er, yes, I know that a map can take keys that are not strings, but that has no bearing on anything here, and String is not merely an example. – Jim Balter Jun 20 '12 at 18:12