3

I want to take a string and generate a number from 0-9. What number I get doesn't have to be predictable, but the same string has to consistently generate the same number.

My initial idea was to just do string.GetHashCode() and take the last digit from the code.

If I did this, would I (a) be guaranteed to always end up with the same number for the same string, and would I (b) end up with a reasonably even distribution of numbers between 0-9?

Alternatively, is there a better way of achieving what I want?

Zissou
  • http://stackoverflow.com/questions/16999361/sha-256-hash-in-c-sharp – Eser Sep 04 '15 at 14:26
  • 1
    so you want something like a checksum generator? – user1666620 Sep 04 '15 at 14:27
  • you could do gethashcode, and then take the last digit, and convert it to ascii and the last digit of that – johnny 5 Sep 04 '15 at 14:30
  • 4
    [You can't rely on `GetHashCode` being consistent](http://stackoverflow.com/questions/53086/can-i-depend-on-the-values-of-gethashcode-to-be-consistent) – theB Sep 04 '15 at 14:35
  • As @johnny5 has said, you could use `string.GetHashCode`, but there is not much sense in such a sophisticated operation. You just want a hash in the 0-9 range, which will give you a huge number of collisions anyway. So, for example, you could just take the first byte of the string and use the first digit of its decimal representation, or do the same with the string length. – Ilia Maskov Sep 04 '15 at 14:35
  • The most common random number is 47. You could just hard-code that. j/k! – Jerry Nixon Sep 04 '15 at 14:59
  • 3
    To make it clear, certainly `GetHashCode` is "consistent" (i.e. will return the same value every time for every `string` instance with that string value) during the life-time of your application ***process***. So while a particular instance of your application runs, the hash for a given string value is fixed. _If that were not the case, `GetHashCode` would be useless._ However: If you exit your process, update the .NET Framework to a newer version, and then start your application again, then that string value may have a different hash under the new BCL version. – Jeppe Stig Nielsen Sep 04 '15 at 15:03
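
To illustrate the point made in the comments above: a hash that must stay the same across processes and framework versions should not come from `GetHashCode`. One option, in the spirit of the SHA-256 link posted by Eser, is to derive the digit from a cryptographic hash. This is only a minimal sketch; the class and method names (`StableDigit`, `FromString`) are hypothetical, and UTF-8 is an assumed encoding choice.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableDigit
{
    // Hypothetical helper: maps a string to a digit 0-9 via SHA-256.
    // The result depends only on the string's UTF-8 bytes, not on the
    // runtime version or the current process.
    public static int FromString(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(s));

            // Combine the first four bytes explicitly (big-endian) so the
            // value does not depend on the machine's byte order.
            uint value = ((uint)hash[0] << 24)
                       | ((uint)hash[1] << 16)
                       | ((uint)hash[2] << 8)
                       | hash[3];

            // 2^32 is not a multiple of 10, so there is a tiny bias,
            // negligible for bucketing purposes.
            return (int)(value % 10);
        }
    }
}
```

Calling `StableDigit.FromString("some text")` should return the same digit on every run. SHA-256 is far heavier than this task needs, so it is mainly worth it when the distribution matters more than speed.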

4 Answers

8

This should do the trick - I use this for deterministic mocking:

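// Requires: using System; using System.Linq;
public static long GetDeterministicId(string m)
{
    // Combine each character's value with its position, then sum the terms.
    return (long) m.ToCharArray().Select((c, i) => Math.Pow(i, c%5)*Math.Max(Math.Sqrt(c), i)).Sum();
}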

EDIT

If you only want a number from 0-9, take the result modulo 10:

public static long GetDeterministicId(string m)
{
    return (long) m.ToCharArray().Select((c, i) => Math.Pow(i, c%5)*Math.Max(Math.Sqrt(c), i)).Sum() % 10;
}

I ran this on the 1000 most commonly used words in English (https://gist.github.com/deekayen/4148741#file-1-1000-txt) and the distribution of 0-9 is:

0 -> 156
1 -> 163
2 -> 128
3 -> 114
4 -> 99
5 -> 89
6 -> 72
7 -> 79
8 -> 45
9 -> 55

which is not perfect, but is OK.

EDIT 2

Further testing shows that changing the first modulus from 5 to 8 (i.e. Math.Pow(i, c%8) instead of Math.Pow(i, c%5)) produces a more even distribution:

0 -> 95
1 -> 113
2 -> 148
3 -> 91
4 -> 68
5 -> 92
6 -> 119
7 -> 79
8 -> 99
9 -> 96

EDIT 3

OK, the winner is

return (int)m.ToCharArray().Select((c, i) => Math.Pow(i+2, c % 8) * Math.Max(Math.Sqrt(c), i+2)).Sum() % 10;

and the distribution of 0 - 9 is

0 -> 90
1 -> 96
2 -> 100
3 -> 99
4 -> 97
5 -> 106
6 -> 110
7 -> 90
8 -> 103
9 -> 109

which is close enough to an even distribution!
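
The measurements above can be reproduced with a small counting harness. The following is a sketch, not the harness rbm actually used; `DistributionCheck` and `CountBuckets` are hypothetical names, and the word list is whatever collection of strings you supply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistributionCheck
{
    // The formula from EDIT 3, copied from the answer above.
    public static int GetDeterministicDigit(string m)
    {
        return (int)m.ToCharArray()
            .Select((c, i) => Math.Pow(i + 2, c % 8) * Math.Max(Math.Sqrt(c), i + 2))
            .Sum() % 10;
    }

    // Counts how many inputs land in each bucket 0-9 for a given digit function.
    public static int[] CountBuckets(IEnumerable<string> words, Func<string, int> digitOf)
    {
        int[] counts = new int[10];
        foreach (string word in words)
        {
            counts[digitOf(word)]++;
        }
        return counts;
    }
}
```

For example, `DistributionCheck.CountBuckets(words, DistributionCheck.GetDeterministicDigit)` returns ten counts that can be printed as `digit -> count`. Passing a different function as `digitOf` makes it easy to compare candidates on the same input.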

rbm
2

For a very "low-tech" method, which is less impressive than rbm's answer, you could do this:

string strEntry = "lol"; // Your string here (must not be empty)
int intNum = strEntry[strEntry.Length - 1]; // Numeric value of the last character (indexer usage per Jeppe Stig Nielsen's suggestion)
intNum = intNum % 10; // Keep only the last decimal digit of that value

The number you get will definitely be from 0-9, and will always be the same for the same string. Note that it depends only on the last character. Plus, it is easy to understand what the code is doing.

Alternatively, you can use a slightly fancier method that sums the numeric values of all the characters in your string, then returns the last digit of that sum:

string strEntry = "lol";
List<int> intList = new List<int>();
foreach (char c in strEntry)
{
   intList.Add((int)c);
}
int intNum = intList.Sum();
intNum = int.Parse(intNum.ToString().Substring(intNum.ToString().Length - 1));

If you don't want to just use the last digit provided in the second option above... you could do this:

string strEntry = "lol";
List<int> intList = new List<int>();
foreach (char c in strEntry)
{
   intList.Add((int)c);
}
int intNum = intList.Sum();
while (intNum.ToString().Length != 1)
{
   intList.Clear();
   foreach (char c in intNum.ToString())
   {
       intList.Add(int.Parse(c.ToString()));
   }
   intNum = intList.Sum();
}
// intNum now holds a single digit. Note: for any positive sum this is 1-9, never 0.
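
The repeated digit sum in the last snippet is the digital root of the character sum. For a positive sum n it equals 1 + (n - 1) % 9, which is why 0 never appears. As a worked example, "lol" gives 108 + 111 + 108 = 327, then 3 + 2 + 7 = 12, then 1 + 2 = 3. The sketch below computes the same value directly; `DigitalRoot` and `OfString` are hypothetical names.

```csharp
using System.Linq;

public static class DigitalRoot
{
    // Same result as summing the decimal digits repeatedly until one remains.
    public static int OfString(string s)
    {
        int sum = s.Sum(c => (int)c);   // sum of the UTF-16 code unit values

        // Digital root: 0 stays 0, any positive n maps to 1..9.
        return sum == 0 ? 0 : 1 + (sum - 1) % 9;
    }
}
```

If all ten digits 0-9 are needed, taking the last digit of the sum (the second option) is the better fit.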
Kaitlyn
  • I guess, but when you convert a character to a number, it gives you what would be the ASCII decimal to that character. That's what is being used here. I'm not exactly converting strings directly either, I'm breaking them up into char. – Kaitlyn Sep 04 '15 at 15:00
  • 1
    No, it doesn't. Converting to int produces the Unicode code point. For ASCII / Basic Latin characters, this is identical to the ASCII code point, but Unicode is much more than ASCII. – Sebastian Negraszus Sep 04 '15 at 15:06
  • Okay, maybe I was wrong about it being ASCII, but at least it does the job. I will remove any references to ASCII. – Kaitlyn Sep 04 '15 at 15:07
  • 1
    To get the last `char` (i.e. the last UTF-16 **code unit**) of string `strEntry`, it is easier to use the indexer, so `int intNum = strEntry[strEntry.Length - 1];`. – Jeppe Stig Nielsen Sep 04 '15 at 15:08
0

There are many ways to achieve this. For example, you can take the remainder of dividing the sum of all character values by 10.

public static int HashString(string str)
{
   if(string.IsNullOrEmpty(str)) return 0;
   return str.ToCharArray().Sum(c => (int)c) % 10;
}
S. Gmiden
0

would I (a) be guaranteed to always end up with the same number for the same string [?]

No. As theB mentioned in a comment, the value of GetHashCode is an implementation detail and not necessarily consistent e.g. across different versions of .NET. You are probably better off writing your own function.

How about a simple checksum?

public static int CheckSum(string s)
{
    int sum = 0;
    foreach (char c in s)
    {
        sum = (sum + c)%10;
    }
    return sum;
}
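
One property of a plain sum is that it ignores character order, so "ab" and "ba" always map to the same digit. If that matters, an order-sensitive hash can be used instead. The sketch below is a swapped-in technique, not part of the answer above: a 32-bit FNV-1a style hash applied to UTF-16 code units (standard FNV-1a works on bytes), with the hypothetical names `Fnv1aDigit` and `FromString`.

```csharp
public static class Fnv1aDigit
{
    // FNV-1a style hash over the string's chars, reduced to a digit 0-9.
    // Uses only fixed constants, so the result is stable across runs.
    public static int FromString(string s)
    {
        unchecked
        {
            uint hash = 2166136261;      // FNV offset basis (32-bit)
            foreach (char c in s)
            {
                hash ^= c;               // mix in the next code unit
                hash *= 16777619;        // FNV prime (32-bit), wraps on overflow
            }
            return (int)(hash % 10);
        }
    }
}
```

Because each step multiplies the running value before the next character is mixed in, reordering the characters generally changes the hash, although two different strings can still land on the same digit.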
Sebastian Negraszus