0

I have a String variable that I want to convert to a long variable.

The problem is that the String will never contain a number, so simply calling Long.parseLong(myString) will throw a NumberFormatException.

To clarify my intentions:

I have a method that returns a long from a String in-parameter. I want the method to generate an ID based on the String variable, to later be able to group the long values.

I might solve this using a regular expression, but my question is whether there's any straightforward way to get a long value from a String?

Marcus
  • 2
    You need a hashing algorithm. – James M Apr 05 '15 at 19:36
  • check this http://stackoverflow.com/questions/2624192/good-hash-function-for-strings – Mzf Apr 05 '15 at 19:37
  • I see. You wouldn't happen to sit on one of those, would you? Nothing fancy required :-) @JamesMcLaughlin – Marcus Apr 05 '15 at 19:38
  • 3
    @Marcus `myString.hashCode()` – Ismail Badawi Apr 05 '15 at 19:39
  • Simple as that, I feel stupid now :-) @IsmailBadawi – Marcus Apr 05 '15 at 19:40
  • Although, `hashCode()` will not return the same long for the same String variable. I.e., if I send in `abc` as a parameter, I will get different `long` values each time. I need them to be the same. Is that possible? @IsmailBadawi – Marcus Apr 05 '15 at 19:43
  • 1
    It should be the same for all strings that have the same value (see [this question](http://stackoverflow.com/questions/785091/consistency-of-hashcode-on-a-java-string)). – Ismail Badawi Apr 05 '15 at 19:46
  • You are correct, I made an error while testing. It works. @IsmailBadawi – Marcus Apr 05 '15 at 19:48
  • 3
    `hashCode()` returns `int`, not `long`. Though it is assignable, you'll only get a fraction of the values you could get by writing your own hash algorithm, and you'll get many more collisions (strings that end up with the same value). – RealSkeptic Apr 05 '15 at 19:58
  • 2
    hashCode won't work because of collisions (different strings can return the same hash code) – Alex Salauyou Apr 05 '15 at 20:03
  • I hope that you don't expect to get a unique `long` value for each `String` content, because that's not possible - there are more possible strings than there are `long` values, so there will always be different strings that would have the same hash code - see the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle). – Jesper Apr 05 '15 at 20:28
  • @SashaSalauyou Show a way of producing unique longs from arbitrary strings longer than 4 characters. Except by keeping them in a map. – RealSkeptic Apr 05 '15 at 21:41
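
The two points made in the comments above (equal strings always give the same `hashCode()`, and different strings can collide) can be checked with a small sketch. The class and method names here are hypothetical, chosen only for illustration:

```java
// Illustrative sketch only; HashCodeDemo and idFromHashCode are made-up names.
class HashCodeDemo {
    // Widens String.hashCode() (an int) to a long.
    // At most 2^32 distinct results are possible this way.
    static long idFromHashCode(String s) {
        return s.hashCode();
    }

    public static void main(String[] args) {
        // Same content gives the same value, even for distinct String objects.
        String a = "abc";
        String b = new String("abc");
        System.out.println(idFromHashCode(a) == idFromHashCode(b)); // prints true

        // Different content can collide: both of these hash to 2112.
        System.out.println(idFromHashCode("Aa")); // prints 2112
        System.out.println(idFromHashCode("BB")); // prints 2112
    }
}
```

So `hashCode()` is stable for a given string value, but it cannot serve as a unique ID.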

2 Answers

2

You say you want a long value. The built-in hashCode() returns an int, not a long. If you really do need a long, then you need to use a hashing method that returns a long. There are a number of possibilities, though I usually suggest the FNV hash for non-cryptographic purposes. It is very easy to code and comes in a wide range of sizes, 64-bit included.

ETA: Code for the FNV hash is on the FNV website that I linked to. Things to be careful of are 1) unsigned v. signed 64-bit numbers and 2) character encodings.

import java.nio.charset.StandardCharsets;

long FNV64Hash(String inString) {
    // FNV-64 constants.
    long FNVprime = 1099511628211L;

    // The offset basis is the unsigned 64-bit value 14695981039346656037,
    // which does not fit in a signed long, so it is written in hex here.
    long FNVbasis = 0xcbf29ce484222325L;
    // Equivalent signed decimal: -3750763034362895579L

    // Convert string to bytes, specifying a character encoding.
    byte[] bytes = inString.getBytes(StandardCharsets.UTF_8);

    long hash = FNVbasis;
    for (byte aByte : bytes) {
        // Mask to treat the byte as unsigned (0-255); a plain XOR would
        // sign-extend bytes >= 0x80 and flip the upper bits of the hash.
        hash ^= (aByte & 0xff);
        hash *= FNVprime;
    }
    return hash;
} // end FNV64Hash()
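
As a usage illustration for the grouping goal in the question, here is a minimal self-contained sketch. It assumes the XOR-then-multiply (FNV-1a) variant over UTF-8 bytes; the class name, method names, and sample inputs are hypothetical and not taken from the FNV website:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names here are made up for the example.
class Fnv1aGroupingDemo {
    static final long OFFSET_BASIS = 0xcbf29ce484222325L; // 14695981039346656037 unsigned
    static final long PRIME = 0x100000001b3L;             // 1099511628211

    // 64-bit FNV-1a over the UTF-8 bytes of the string.
    static long fnv1a64(String s) {
        long hash = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff); // treat each byte as unsigned
            hash *= PRIME;      // overflow wraps modulo 2^64, as intended
        }
        return hash;
    }

    // Groups the inputs by their hash-derived ID.
    static Map<Long, List<String>> groupById(List<String> inputs) {
        Map<Long, List<String>> groups = new HashMap<>();
        for (String s : inputs) {
            groups.computeIfAbsent(fnv1a64(s), k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("apple", "pear", "apple", "plum", "pear");
        groupById(inputs).forEach((id, members) ->
                System.out.println(Long.toHexString(id) + " -> " + members));
    }
}
```

Equal strings always land in the same group. Distinct strings could in principle share an ID, so this is suitable for grouping only when a rare collision is acceptable.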
rossum
-1

If you want a simple and easy way, you can use hashCode() in Java. Here is an example:

public class StringHashing {
    public static void main(String[] args) {
        String str = "HELLO WORLD !!";
        // hashCode() returns an int computed from the string's contents.
        System.out.println("Hashcode for str: " + str.hashCode());
    }
}

Or you can implement your own hash function.

Moataz Shawky
  • Only problem is that `String.hashCode()` returns an **int**, not a **long**. Of course an `int` is assignable to a `long`. But you only get half the values that you should, and you have more collisions (same result for different strings). – RealSkeptic Apr 05 '15 at 19:56
  • hash code doesn't solve ID problem because of collisions – Alex Salauyou Apr 05 '15 at 20:01
  • @RealSkeptic You don't even get half the possible values. 2^32 = sqrt(2^64), not (2^42/2). – nanofarad Apr 05 '15 at 21:31
  • 1
    @SashaSalauyou Collisions are guaranteed for any string longer than 64 bits. There's no way to prevent them, with *any* algorithm that returns a long. – nanofarad Apr 05 '15 at 21:32
  • @hexafraction Right, in my comment to the OP I was more accurate: you get a fraction of the values. But then, what's that 42 doing in your comment? :-) – RealSkeptic Apr 05 '15 at 21:38
  • Apparently I cannot type, it seems. Should be 2^64 – nanofarad Apr 05 '15 at 21:38
  • @hexafraction not exactly right. Collisions are guaranteed for any algorithm only if overall population exceeds 2^64. Naive approach: Map where you put new incremented value when a new key arrives. This is very bad and resource consuming approach, but the only one that guarantees that one string corresponds with single long and vice versa. – Alex Salauyou Apr 06 '15 at 08:16
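
The map-based approach described in the last comment could be sketched roughly as follows. The class and method names are hypothetical, and this is only one way to do it:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; StringIdRegistry and idFor are made-up names.
class StringIdRegistry {
    private final Map<String, Long> ids = new HashMap<>();
    private long nextId = 0;

    // Returns the ID already assigned to this string, or assigns the next free one.
    long idFor(String key) {
        Long existing = ids.get(key);
        if (existing == null) {
            existing = nextId++;
            ids.put(key, existing);
        }
        return existing;
    }

    public static void main(String[] args) {
        StringIdRegistry registry = new StringIdRegistry();
        System.out.println(registry.idFor("abc")); // prints 0
        System.out.println(registry.idFor("xyz")); // prints 1
        System.out.println(registry.idFor("abc")); // prints 0
    }
}
```

This gives a one-to-one mapping with no collisions, at the cost of keeping every string in memory. The IDs also depend on arrival order, so they are not reproducible across runs unless the map is persisted.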