
I am trying to generate three distinct strings, A, B, and C, such that their hash values are all equal using the default hash function provided by the programming language. Specifically, I need to ensure that A is not equal to B, B is not equal to C, and A is not equal to C.

I have tried several approaches but haven't been successful in finding a solution yet. I am seeking assistance to implement a method or algorithm that can fulfill these requirements. It's crucial that the hash values of all three strings are the same.

Here is my implementation. It is still incomplete: two of the three resulting strings share a hash code, but the third one does not.

var dictionary = new Dictionary<int, string>();

int collisionCounter = 0, stringCounter = 0;
string myString;
int hash = 0;

List<string> myList = new List<string>();

while (true)
{
  stringCounter++;
  myString = stringCounter.ToString();

  try
  {
    hash = myString.GetHashCode();
    dictionary.Add(hash, myString);
  }
  catch (Exception)
  {
    if (dictionary.ContainsKey(hash))
    {
      myList.Add(myString);
      collisionCounter++;
      if (collisionCounter == 2)
      {
        break;
      }
    }
    continue;
  }
}

var A = myList[0];
var B = myList[1];
var C = dictionary[hash];

Console.WriteLine($"{A.GetHashCode()} {B.GetHashCode()} {C.GetHashCode()}");

And here is the result of running it:

374545419 1954295680 1954295680

I would appreciate any guidance or insights on how to achieve this task effectively. Thank you!

  • So this is something you need to be very careful about. There is a very good reason that HashCodes are randomized in .NET. This article gives a pretty good overview of HashFlooding and the dangers it poses as well as an implementation of a deterministic hash code if you have a use case, such as generating a key for an INTERNAL cache store: https://andrewlock.net/why-is-string-gethashcode-different-each-time-i-run-my-program-in-net-core/ – ye-olde-dev May 27 '23 at 22:32
  • Because of the randomization that is enabled by default in current runtimes, any answers A, B and C will be different per process (i.e. between runs), so: I can say `"Fred"`, `"Barney"` and `"Wilma"` and *you can't say that I'm wrong* – Marc Gravell May 27 '23 at 22:34
  • @ye-olde-dev Thank you for your response and the valuable information about HashFlooding and the rationale behind randomized HashCodes in .NET. I appreciate the link to the article you shared as well. – Ahmadou Kassoum May 27 '23 at 22:45
  • @MarcGravell is it possible to create a program to have the collision? – Ahmadou Kassoum May 27 '23 at 23:20
  • The chance that two strings produce the same hash is very small, but it can happen. That is why the IEqualityComparer interface exposes both GetHashCode and Equals: the hash is tested first and, when the hashes match, the actual values are tested as well. See: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.iequalitycomparer-1?view=net-7.0 – jdweng May 28 '23 at 07:47

1 Answer


String hashcodes in .NET are not stable, meaning that a specific string has a different hashcode each time you run a program. Hashcodes are stable only during a single execution of a program. This .NET feature probably undermines what you are trying to do, but let's assume that string hashcodes in .NET were stable, and try to find an answer to your question under this assumption.

You might be able to find 3 different strings having the same hashcode mathematically, by knowing the algorithm that produces the hashcode and reverse-engineering it. This might not be unrealistic because hashcodes are not meant to be cryptographically secure, so reverse-engineering them might be feasible. But I can't help you in this direction because I am not a mathematician.
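To illustrate what such a mathematical construction can look like, here is a minimal sketch. It assumes a hypothetical deterministic polynomial hash, `PolyHash` (h = 31 * h + c), which is NOT the algorithm that .NET uses for strings. For this toy hash, "Aa" and "BB" collide, and any strings built by concatenating those two equal-length blocks collide as well.

```csharp
using System;
using System.Linq;

// Hypothetical deterministic hash for illustration only.
// This is NOT the algorithm behind string.GetHashCode() in .NET.
static int PolyHash(string s)
{
    unchecked
    {
        int h = 0;
        foreach (char c in s) h = 31 * h + c;
        return h;
    }
}

// "Aa" and "BB" collide under PolyHash: 65 * 31 + 97 == 66 * 31 + 66 == 2112.
// Because PolyHash(x + y) == PolyHash(x) * 31^len(y) + PolyHash(y),
// swapping one colliding block for another of equal length keeps the hash.
string[] blocks = { "Aa", "BB" };
string[] colliding =
    (from x in blocks from y in blocks select x + y).Take(3).ToArray();

// Prints the three strings AaAa, AaBB, BBAa with one shared hash value.
foreach (string s in colliding)
    Console.WriteLine($"{s} -> {PolyHash(s)}");
```

The same trick does not carry over to the randomized hash of current .NET runtimes; it only shows why a known, simple algorithm can be attacked on paper.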

I'll suggest a brute-force probabilistic approach for solving this problem. .NET hashcodes are 32-bit numbers, so by the pigeonhole principle it's guaranteed that you'll get at least one collision if you have a set of 2 ^ 32 + 1 (4,294,967,297) distinct strings. You will need a generator of strings that can produce more unique strings than this number. A good candidate seems to be a generator of all 8-character strings of lower-case Latin letters, with a population space of 26 ^ 8 = 208,827,064,576 strings. On average ~48 strings will share the same hashcode (208,827,064,576 / 2 ^ 32 ≈ 48.6), so you will be very unlucky if you randomly pick a string that doesn't collide with 2 others. The algorithm to find the 3 strings goes like this:

  1. Add the first generated string to a list a, and store its hashcode in a variable b.
  2. Start a loop where in each iteration you generate the next string and compare its hashcode with b. If the values are equal, add the generated string to the list a.
  3. Exit the loop when you have 3 strings in the list a. These strings are different, and they share the same hashcode.

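I would expect you to have your result after about 8 billion iterations of the loop. Assuming the hashcodes are uniformly distributed, each generated string matches b with probability 1 / 2 ^ 32, and you need two matches, so the expected count is about 2 × 2 ^ 32 ≈ 8.6 billion.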
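The three steps above could be sketched roughly as follows. The names `LowerCaseStrings`, `FindCollisions` and `DemoHash` are my own, not part of .NET. To keep the demo fast and repeatable, it assumes a hypothetical deterministic hash truncated to 16 bits instead of the real `string.GetHashCode()`.

```csharp
using System;
using System.Collections.Generic;

// Enumerates every string of the given length over 'a'..'z', odometer style.
static IEnumerable<string> LowerCaseStrings(int length)
{
    var chars = new char[length];
    Array.Fill(chars, 'a');
    while (true)
    {
        yield return new string(chars);
        int i = length - 1;
        while (i >= 0 && chars[i] == 'z') chars[i--] = 'a'; // carry to the left
        if (i < 0) yield break;                              // all strings produced
        chars[i]++;
    }
}

// Steps 1 to 3: remember the hash of the first string, then collect matches.
static List<string> FindCollisions(
    IEnumerable<string> candidates, Func<string, int> hash, int count)
{
    var found = new List<string>(); // the list "a"
    int target = 0;                 // the variable "b"
    foreach (string s in candidates)
    {
        if (found.Count == 0) { found.Add(s); target = hash(s); }
        else if (hash(s) == target) found.Add(s);
        if (found.Count == count) break;
    }
    return found; // may hold fewer than count strings if candidates run out
}

// Demo assumption: a toy deterministic hash cut down to 16 bits so that
// the search ends after thousands of iterations rather than billions.
static int DemoHash(string s)
{
    unchecked
    {
        int h = 17;
        foreach (char c in s) h = h * 31 + c;
        return h & 0xFFFF;
    }
}

List<string> result = FindCollisions(LowerCaseStrings(8), DemoHash, 3);
Console.WriteLine(string.Join(", ", result));
```

To search against the runtime's own hash, pass `s => s.GetHashCode()` instead of `DemoHash`. In that case expect the billions of iterations estimated above, and remember that the strings found only collide within the process that found them.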

Theodor Zoulias