1

I'm trying to find a way to convert a long string ID like "T2hR8VAR4tNULoglmIbpAbyvdRi1y02rBX" to a numerical id.

I thought about getting the ASCII value of each character and then adding them up, but I don't think this is a good way, as different strings can have the same result. For example, "ABC" and "BAC" will have the same result:

A = 10, B = 20, C = 50,

ABC = 10 + 20 + 50 = 80

BAC = 20 + 10 + 50 = 80

I also thought about getting each letter's ASCII code and then setting the numbers next to each other, for example "ABC"

so ABC = 102050

This method won't work either, as a 20-letter String will result in a huge number. So how can I solve this problem? Thank you in advance.

MoTahir

2 Answers

1

You can use the hashCode() function: "id".hashCode(). All objects implement a variant of this function.

From the documentation:

open fun hashCode(): Int

Returns a hash code value for the object. The general contract of hashCode is:

Whenever it is invoked on the same object more than once, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.

If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.

Every platform object implements it by default. Because the result is only a 32-bit Int, there is always a possibility of duplicates (collisions) if you have lots of ids.

If you use a JVM-based Kotlin environment, the hash will be produced by the JVM's String.hashCode() function.
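As a rough illustration of the suggestion above, here is a minimal sketch assuming Kotlin on the JVM. The helper name `notificationIdFor` is made up for this example and is not part of any library.

```kotlin
// Hypothetical helper (name invented for illustration): maps a
// server-generated string ID to an Int via String.hashCode().
fun notificationIdFor(serverId: String): Int = serverId.hashCode()

// Unlike summing character codes, the JVM's String.hashCode() depends on
// character order: "ABC".hashCode() is 64578 and "BAC".hashCode() is 65508.
// Collisions still exist, though: "Aa" and "BB" both hash to 2112.
fun collides(a: String, b: String): Boolean =
    a != b && notificationIdFor(a) == notificationIdFor(b)
```

Calling `notificationIdFor` twice with the same string always gives the same Int, which is what matters for stacking notifications, but `collides("Aa", "BB")` returns true, so two different ids can still end up merged.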

Lionel Briand
  • It doesn't take very many ids to have a high likelihood of collisions. – Tenfour04 Dec 09 '19 at 22:02
  • From what I've read here https://stackoverflow.com/questions/10102337/can-javas-hashcode-produce-same-value-for-different-strings there is a chance of collision, but the chances are 1 in 2^32. For my case I think that's acceptable: as I mentioned, I need this id for the notification system, and I don't think the chance of collision will be high in this situation. What do you think? Or am I missing something? – MoTahir Dec 09 '19 at 22:19
  • When you use hashes there is always a chance of collision; you can't be 100% sure. It's not 1 in 2^32, though: [this article](http://sigpwned.com/2018/08/10/string-hashcode-is-plenty-unique/) explains it a bit. I don't know the behavior of your app, but the user will probably not receive 100k notifications at the same time. – Lionel Briand Dec 09 '19 at 22:25
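To put rough numbers on the collision discussion in these comments, the standard birthday-problem approximation can be sketched as below. It assumes the hash behaves like a uniformly random 32-bit value, which String.hashCode() only approximates, so treat the results as ballpark figures. The function name is made up for this example.

```kotlin
import kotlin.math.exp

// Birthday-problem approximation: the probability that at least two of n ids
// share a hash, assuming (as an idealization) that hashes are uniformly
// distributed over `buckets` possible values. 2^32 = 4294967296 for an Int.
fun collisionProbability(n: Int, buckets: Double = 4294967296.0): Double =
    1.0 - exp(-n.toDouble() * (n - 1) / (2.0 * buckets))
```

Under that assumption the chance of any collision is about one in a million for 100 ids and reaches roughly 50% at around 77,000 ids, which is why the risk depends on how many ids are live at once rather than being a flat 1 in 2^32.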
0

If you need to be 100% confident that there are no possible duplicates, and the input Strings can be up to 20 characters long, then you cannot store the IDs in a 64-bit Long. You will have to use BigInteger:

import java.math.BigInteger

val id = BigInteger(stringId.toByteArray())

At that point, I question whether there is any point in converting the ID to a numerical format. The String itself can be the ID.
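For what it's worth, a small sketch of that conversion and its inverse shows why it is collision-free but not compact. The function names are invented for this example, and the round trip is only guaranteed here under the assumption that the id is plain ASCII and does not start with a NUL character (otherwise BigInteger's signed, minimal byte representation would not reproduce the original bytes).

```kotlin
import java.math.BigInteger

// Hypothetical helpers (names invented for illustration).
// Interprets the UTF-8 bytes of the id as one big-endian signed integer.
fun toNumericId(stringId: String): BigInteger =
    BigInteger(stringId.toByteArray(Charsets.UTF_8))

// Inverse mapping; assumes an ASCII id that does not start with '\u0000'.
fun fromNumericId(id: BigInteger): String =
    String(id.toByteArray(), Charsets.UTF_8)
```

For the 34-character id from the question, `toNumericId(...).bitLength()` is far above 64, so the value fits in neither an Int nor a Long, and `fromNumericId` recovers the original string, which is another way of saying the number carries exactly the same information as the String.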

Tenfour04
  • Isn't there a way to shorten the big integer and make it fit into an Int? If I can find out the maximum length of the number generated from converting my String ID, then dividing it by a given number could result in a unique Integer. For example, 102050 and 201050 divided by 200 will result in 510 and 1005, which is much shorter, but this will reduce my accuracy... But I don't think it will work, as the big int is way too big.... – MoTahir Dec 09 '19 at 21:39
  • Any string with more than two UTF-16 code units cannot fit in a 32-bit integer. Dividing by a number doesn't help, because when you do integer math, remainders are thrown out. You would not have unique IDs. Since you can't store any string with more than two letters as an Int, but you are expecting long strings, I don't think it makes sense to complicate it by creating a special case for a minuscule portion of the possible values. Unless the creation of IDs is very controlled so you heavily favor ones in a range that fits in 32 bits. – Tenfour04 Dec 09 '19 at 22:00
  • The best I think you can do to minimize space for IDs that fit in an Int is to have a wrapper class that holds properties for an `Int?` and a `BigInteger?` and leaves one of them null. – Tenfour04 Dec 09 '19 at 22:01
  • The IDs are auto-generated by the server and I cannot manipulate them. Those IDs need to be converted to an int so that when a notification is sent for that ID I can stack them together and not show them as separate notifications. The notification manager takes IDs as int only. – MoTahir Dec 09 '19 at 22:07
  • With those constraints, I suppose you have to risk some collisions and improperly merged notifications. If there's some sort of pattern to the IDs you receive (like they are mostly small integers or mostly of a specific length), you could try to find a hashing function that is less likely to create collisions for the more common IDs. – Tenfour04 Dec 10 '19 at 22:16
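
One possible reading of "find a different hashing function", offered only as a sketch and not as what the commenter had in mind: take a cryptographic digest of the id and keep its first four bytes. The function name `sha256ToInt` is invented here; `MessageDigest` and `ByteBuffer` are standard JDK classes. The output is still only 32 bits, so collisions remain possible; the digest just spreads similar-looking ids more evenly than String.hashCode() might.

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest

// Hypothetical helper (name invented for illustration): hashes the id with
// SHA-256 and keeps the first 4 bytes of the digest as a big-endian Int.
fun sha256ToInt(serverId: String): Int {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(serverId.toByteArray(Charsets.UTF_8))
    return ByteBuffer.wrap(digest).getInt()
}
```

The same id always maps to the same Int, so notifications for one id would still stack, at the cost of computing a digest per lookup.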