What's the value range of String.GetHashCode()?
For random strings of different lengths, is the value range of their hash codes different?
e.g.
There are two groups of random strings. Group 1 strings have length 5. Group 2 strings have length 10. Do the two groups have the same hash code value range?
Update 1
My problem scenario is:
I have a method that takes fixed-length random GUID strings as input. I need to pick a fixed (but not predefined) subset of them at a fixed percentage. I am considering dividing the string hash code value range into 10 segments and picking the strings whose hash value falls into the first segment. That way I would get a fixed 10% of all the input strings.
Update 2
The input GUID strings are not given as a list. They arrive one by one, and there can be duplicates. I will never know how many there are in total. I just need to keep the overall percentage correct, and if a string was picked before, it must always be picked again.
Below is my experiment:
    static void Main(string[] args)
    {
        double min = int.MaxValue / 100.0 * 15.0;
        double max = int.MaxValue / 100.0 * 25.0;
        double total = 0;
        double picked = 0;
        Console.WriteLine("range ratio: {0:f4}%", (max - min) / int.MaxValue * 100);
        for (int i = 0; i < 500000; i++)
        {
            string mcid = Guid.NewGuid().ToString();
            int hash = mcid.GetHashCode();
            total++;
            if (hash >= min && hash <= max)
            {
                picked++;
            }
            Console.Write("\rPicked: {0:f4}, Total {1:f4}, Ratio: {2:f4}%", picked, total, picked / total * 100.0);
        }
    }
I ran the code several times, and the output is a bit strange: the ratio of picked GUIDs is always about half of the range ratio. If that holds in general, I guess I could just use a range twice as wide.
For example:
    range ratio: 10.0000%
    Picked: 25028.0000, Total 500000.0000, Ratio: 5.0056%
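One possible reading of the halved ratio: GetHashCode() returns an Int32, so it can be anywhere from int.MinValue to int.MaxValue, negative values included. That is about 2^32 possible values, roughly twice int.MaxValue, so a band that is 10% of int.MaxValue wide covers only about 5% of the full range. Below is a minimal sketch of the same experiment measured against the full range. The class name HashSampling and the helper IsPicked are hypothetical names introduced here for illustration, and the sketch assumes the hash codes are spread roughly evenly over all Int32 values.

```csharp
using System;

static class HashSampling
{
    // Hypothetical helper (not part of the original experiment): returns true
    // when the hash falls in the lowest `percent` share of the full Int32 range.
    public static bool IsPicked(int hash, double percent)
    {
        // Shift [int.MinValue, int.MaxValue] onto [0, 2^32 - 1]
        // so that negative hash codes are counted as well.
        long shifted = (long)hash - int.MinValue;

        // 4294967296 = 2^32, the number of distinct Int32 values.
        double threshold = 4294967296.0 * percent / 100.0;
        return shifted < threshold;
    }

    static void Main()
    {
        int total = 500000;
        int picked = 0;
        for (int i = 0; i < total; i++)
        {
            string mcid = Guid.NewGuid().ToString();
            if (IsPicked(mcid.GetHashCode(), 10.0))
            {
                picked++;
            }
        }
        // Print once at the end instead of on every iteration.
        Console.WriteLine("Picked: {0}, Total: {1}, Ratio: {2:f4}%",
            picked, total, picked * 100.0 / total);
    }
}
```

If the even-spread assumption holds, the printed ratio should sit near the requested percentage rather than half of it. One caveat for the "always picked again" requirement: on .NET Core and later, string hash codes are randomized per process by default, so the same GUID string is only guaranteed to get the same hash code within a single run.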