6

How can I convert a non-numeric String to an Integer?

I got for instance:

String unique = "FUBAR";

What's a good way to represent the String as an Integer with no collisions e.g. "FUBAR" should always be represented as the same number and shan't collide with any other String. For instance, String a = "A"; should be represented as the Integer 1 and so on, but what is a method that does this (preferrably for all unicode strings, but in my case ASCII values could be sufficient).

Niklas Rosencrantz
  • 25,640
  • 75
  • 229
  • 424
  • 1
    er. this is what character encodings do. Get the bytes of a String, you have a number. – tom Nov 01 '13 at 10:22
  • 1
    What's the goal here? There are any number of ways to convert a string to a number and maintain uniqueness. Since any data is, after all, stored as a series of bits, it's more of a reinterpretation than a conversion. But if you want the result for any string of any length to fit in a single Java `int` value, then you are looking for a hash function, of which there are many. However, there can never be a perfect one guaranteeing no collisions, since there are more possible strings than ints (pigeonhole principle). – Mark Reed Nov 01 '13 at 10:24
  • 1
    I cannot think of a way that would work for *all* unicode strings, no matter how long, and convert them to a single `int`. But if you find a reliable way, come back and name your price: data compression companies are going to love you ;-) – Sergey Kalinichenko Nov 01 '13 at 10:25
  • @MarkReed My goal is to generate unique IDs based on a concatenation of class name and ID, since I got IDs that are unique but only within the type. So if I concatenate the typename and the ID as strings and then convert that to integer I will get an application-wide unique id. – Niklas Rosencrantz Nov 01 '13 at 10:26
  • 2
    Are you looking for http://stackoverflow.com/questions/2624192/good-hash-function-for-strings? – LeeNeverGup Nov 01 '13 at 10:27
  • @909Niklas You won't be able to fit that into an integer. Or long. You have to figure out a better way. Hashes suggested by others will probably work. – Torben Nov 01 '13 at 10:42
  • 1
    By "integer" do you mean a java `int` or do you mean "a whole number of arbitrary length"? – Bohemian Nov 01 '13 at 11:46

6 Answers6

9

This is impossible. Think about it, an Integer can only be 32 bits. So, by the pigeonhole principle, there must exist at least two strings that have the same Integer value no matter what technique you use for conversion. In reality, there are infinite with the same values...

If you're just looking for an efficient mapping, then I suggest that you just use the int returned by hashCode(), which for reference is actually 31 bits.

Steve P.
  • 14,489
  • 8
  • 42
  • 72
  • OK, I test this: `new Integer(Integer.parseInt(""+this.getClass().getName().hashCode()+id))` – Niklas Rosencrantz Nov 01 '13 at 10:32
  • 5
    Downvoted because it is possible. Hexadecimal numbers contain characters and they can be easily converted to 10 base without any collisions. – Torben Nov 01 '13 at 10:34
  • 3
    @909Niklas what?? `int idValue = (this.getClass().getName() + id).hashCode()` – Mark Reed Nov 01 '13 at 10:34
  • 1
    @Torben the question specifies "with no collisions". That's impossible. – Mark Reed Nov 01 '13 at 10:35
  • 2
    @Torben There is no possible way to guarantee no collisions. If you find a way, please tell me (and no one else). – Steve P. Nov 01 '13 at 10:35
  • 1
    There will be no collisions. F in hexadecimal is the only number that maps to 15 in decimal. – Torben Nov 01 '13 at 10:38
  • 1
    @Torben yes, but then you can't fit the value of any string into a Java Integer. OP specified both conditions, which combine into an impossibility. – Mark Reed Nov 01 '13 at 10:39
  • 1
    @MarkReed OP did not require fitting the whole universe to a 32 bit integer. For crying out loud! – Torben Nov 01 '13 at 10:40
  • 1
    @Torben Dude, I think you're misunderstanding the problem. Of course we can uniquely map a 26*2 (for lowercase and caps) letter alphabet uniquely. He's asking how to map `Strings` to unique `Integers`. Think about it. How can you uniquely map infinite possibilities to a restricted space? – Steve P. Nov 01 '13 at 10:41
  • 1
    @Torben 10 base (or any other base) has nothing to do with the initial question. The question was about Integer, which really have 32 bit, while String "FUBAR" has 96 bits. Then, topic starter did not restricted the length of enumerated strings - so the number of possible strings far exceeds the number of possible integers, thus collisions are unavoidable. – Alexei Kaigorodov Nov 01 '13 at 10:41
  • 1
    Sorry, I read questions literally. If you know what OP is asking, edit the question then. – Torben Nov 01 '13 at 10:44
  • 2
    BTW Object.hashCode() is 31-bit. – Peter Lawrey Nov 01 '13 at 10:44
  • 1
    @SteveP. I believe it is used for biased locking (or another purpose in any case) It means that once you get the Object.hashCode() of an object it cannot be used for biased locking. – Peter Lawrey Nov 01 '13 at 10:48
  • What if pigeonhole principle didn't bother you? What if you want to approach different strings with the same integer representations as a group? – GeorgiG May 23 '18 at 12:20
3

You can map Strings to unique IDs using table. There is not way to do this generically.

final Map<String, Integer> map = new HashMap<>();
public int idFor(String s) {
    Integer id = map.get(s);
    if (id == null)
       map.put(s, id = map.size());
    return id;
}

Note: having unique id's doesn't guarantee no collisions in a hash collection.

http://vanillajava.blogspot.co.uk/2013/10/unique-hashcodes-is-not-enough-to-avoid.html

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
2

If you know the character set used in your strings, then you can think of the string as number with base other than 10. For example, hexadecimal numbers contain letters from A to F.

Therefore, if you know that your strings only contain letters from an 8-bit character set, you can treat the string as a 256-base number. In pseudo code this would be:

number n;
for each letter in string
    n = 256 * n + (letter's position in character set)

If your character set contains 65535 characters, then just multiply 'n' with that number on each step. But beware, the 32 bits of an integer will be easily overflown. You probably need to use a type that can hold a larger number.

Torben
  • 3,805
  • 26
  • 31
1
private BigDecimal createBigDecimalFromString(String data)
{
    BigDecimal value = BigDecimal.ZERO;

    try
    {
        byte[] tmp = data.getBytes("UTF-8");
        int numBytes = tmp.length;
        for(int i = numBytes - 1; i >= 0; i--)
        {
            BigDecimal exponent = new BigDecimal(256).pow(i);
            value = value.add(exponent.multiply(new BigDecimal(tmp[i])));
        }
    }
    catch (UnsupportedEncodingException e)
    {
    }
    return value;
}
Romain Hippeau
  • 24,113
  • 5
  • 60
  • 79
1

Maybe a little bit late, but I'm going to give my 10 cents to simplify it (internally is similar to BigDecimal suggested by @Romain Hippeau)

public static BigInteger getNumberId(final String value) {
    return new BigInteger(value.getBytes(Charset.availableCharsets().get("UTF-8")));
}
Community
  • 1
  • 1
Wellington Souza
  • 2,200
  • 2
  • 22
  • 33
1

Regardless of the accepted answer, it is possible to represent any String as an Integer by computing that String's Gödelnumber, which is a unique product of prime numbers for every possible String. With that being said it's quite impractical and slow to implement, also for most Strings you would need a BigInteger rather than a normal Integer and to decode a Gödelnumber into its corresponding String you need to have a defined Charset.

Andreas Hartmann
  • 1,825
  • 4
  • 23
  • 36