17

An existing system written in Java uses the hashcode of a string as its routing strategy for load balancing.

Now, I cannot modify the system but need to generate strings that share the same hashcode to test the worst condition.

I provide those strings from commandline and hope the system will route all these strings into the same destination.

Is it possible to generate a large numbers of strings that share the same hashcode?

To make this question clear:

String[] getStringsInSameHashCode(int number){
    //return an array in length "number"
    //Every element of the array share the same hashcode. 
    //The element should be different from each other
}

Remarks: Any hashCode value is acceptable. There is no constraint on what the string is. But they should be different from each other.

EDIT: Override method of String class is not acceptable because I feed those string from command line.

Instrumentation is also not acceptable because that will make some impacts on the system.

StarPinkER
  • 14,081
  • 7
  • 55
  • 81

6 Answers6

35

see a test method, basically, so long as you match, a1*31+b1 = a2*31 +b2, which means (a1-a2)*31=b2-b1

public void testHash()
{
    System.out.println("A:" + ((int)'A'));
    System.out.println("B:" + ((int)'B'));
    System.out.println("a:" + ((int)'a'));

    System.out.println(hash("Aa".hashCode()));
    System.out.println(hash("BB".hashCode()));
    System.out.println(hash("Aa".hashCode()));
    System.out.println(hash("BB".hashCode()));


    System.out.println(hash("AaAa".hashCode()));
    System.out.println(hash("BBBB".hashCode()));
    System.out.println(hash("AaBB".hashCode()));
    System.out.println(hash("BBAa".hashCode()));

}

you will get

A:65
B:66
a:97
2260
2260
2260
2260
2019172
2019172
2019172
2019172

edit: someone said this is not straightforward enough. I added below part

    @Test
    public void testN() throws Exception {
        List<String> l = HashCUtil.generateN(3);
        for(int i = 0; i < l.size(); ++i){
            System.out.println(l.get(i) + "---" + l.get(i).hashCode());
        }
    }
AaAaAa---1952508096
AaAaBB---1952508096
AaBBAa---1952508096
AaBBBB---1952508096
BBAaAa---1952508096
BBAaBB---1952508096
BBBBAa---1952508096
BBBBBB---1952508096

below is the source code, it might be not efficient, but it work:

public class HashCUtil {

    private static String[] base = new String[] {"Aa", "BB"};

    public static List<String> generateN(int n)
    {
        if(n <= 0)
        {
            return null;
        }

        List<String> list = generateOne(null);
        for(int i = 1; i < n; ++i)
        {
            list = generateOne(list);
        }

        return list;
    }


    public static List<String> generateOne(List<String> strList)
    {   
        if((null == strList) || (0 == strList.size()))
        {
            strList = new ArrayList<String>();
            for(int i = 0; i < base.length; ++i)
            {
                strList.add(base[i]);
            }

            return strList;
        }

        List<String> result = new ArrayList<String>();

        for(int i = 0; i < base.length; ++i)
        {
            for(String str: strList)
            {   
                result.add(base[i]  + str);
            }
        }

        return result;      
    }
}

look at String.hashCode()

   public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }
hrk singh
  • 143
  • 12
hetaoblog
  • 1,990
  • 5
  • 26
  • 34
  • well, that's fine if this is SO's rule or culture to provide english only link... I just want to provide more to the author; while for the problem itself, i think i've explained enough using demo codes and some words here... – hetaoblog Oct 17 '12 at 02:55
  • 1) Yes it is. 2) The demo codes and words don't actually answer the question. The question is about how to **generate** collisions. An explanation of how/why the collisions occur is not relevant. – Stephen C Oct 17 '12 at 03:01
  • I think this is a very good answer, although the generated string is very long if N is very large. – StarPinkER Oct 17 '12 at 14:25
9

I think find a equal-hash string from a long string is too hard, it's easy when find equal-hash string of an short string (2 or 3). Look at the equation below. (sorry I cant post image cause me new member)

Notice that, "FB" and "Ea" have the same hashcode, and any two strings like s1+"FB"+s2 and s1+"Ea"+s2 will have the same hashcode. So, the easy solution is finding any 2-char substring of existing string and replace with a 2-char substring with the same hashcode

Exmaple, we have the string "helloworld" get 2-char substring "he", hashcode("he") = 'h'*31 + 'e' = ('h'*31 + 31) + ('e' - 31) = ('h'+1)*31 + 'F' = 'i' + 'F' = hashcode("iF") so the desire string is "iFlloworld" we have increased 'h' by 1, we can increase by 2, or 3 etc (but will be wrong if it overflow the char value)

The below code run well with small level, it will wrong if the level is big, make the char value overflow, I will fix it later if you want (this code change 2 first chars, but I will edit code to 2 last chars because 2 first chars are calc with largest value)

    public static String samehash(String s, int level) {
    if (s.length() < 2)
        return s;
    String sub2 = s.substring(0, 2);
    char c0 = sub2.charAt(0);
    char c1 = sub2.charAt(1);
    c0 = (char) (c0 + level);
    c1 = (char) (c1 - 31 * level);
    String newsub2 = new String(new char[] { c0, c1 });
    String re =  newsub2 + s.substring(2);
    return re;
}
yelliver
  • 5,648
  • 5
  • 34
  • 65
3

I was wondering if there was a "universal" solution; e.g. some constant string XYZ, such that

    s.hashCode() == (s + XYZ).hashCode() 

for any string s. Finding such a string involves solving a fairly complicated equation ... which was beyond my rusty mathematical ability. But then it dawned on me that h == 31*h + ch is always true when h and ch are both zero!

Based on that insight, the following method should create a different String with the same hashcode as its argument:

    public String collider(String s) { 
        return "\0" + s;
    }

If NUL characters are problematic for you, prepending any string whose hashcode is zero would work too ... albeit that the colliding strings would be longer than if you used zero.

user659104
  • 14
  • 2
Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
3

Given String X, then String Y = "\u0096\0\0ɪ\0ˬ" + X will share same hashcode with X.

Explanation:

  1. String.hashcode() returns Integer, and every Integer X in java has property that X = X + 2 * (Integer.MAX_VALUE + 1). Here, Integer.MAX_VALUE = 2 ^ 31 - 1;

  2. So we only need to find String M, which has the property that M's hashcode % (2 * (Integer.MAX_VALUE + 1)) = 0;

  3. I find "\u0096\0\0ɪ\0ˬ" : \u0096 's ascii code is 150,\0 's ascii code is 0, ɪ's ascii code is 618, ˬ's ascii code is 748, so its hashcode is 150 * 31 ^ 5 + 618 * 31 ^ 2 + 748 = 2 ^ 32 = 0;

It is up to you which string you would like, and I pick this one.

  • I like your logic and proof, but prepending "\0" in [this answer](https://stackoverflow.com/a/12926571/1108305) wins for simplicity. – M. Justin May 16 '23 at 16:22
1

You can instrument the java.lang.String class so its method hashCode() will always return the same number.

I suppose Javassist is the easiest way to do such an instrumentation.

In short:

  • obtain an instance of java.lang.instrument.Instrumentation by using a Java-agent (see package java.lang.instrument documentation for details)
  • redefine java.lang.String class by using Instrumentation.redefineClasses(ClassDefinition[]) method

The code will look like (roughly):

ClassPool classPool = new ClassPool(true);
CtClass stringClass = classPool.get("java.lang.String");
CtMethod hashCodeMethod = stringClass.getDeclaredMethod("hashCode", null);
hashCodeMethod.setBody("{return 0;}");
byte[] bytes = stringClass.toBytecode();
ClassDefinition[] classDefinitions = new ClassDefinition[] {new ClassDefinition(String.class, bytes);
instrumentation.redefineClasses(classDefinitions);// this instrumentation can be obtained via Java-agent

Also don't forget that agent manifest file must specify Can-Redefine-Classes: true to be able to use redefineClasses(ClassDefinition[]) method.

  • Thanks for your answer. Overriding the hashCode method is not acceptable since it will affect the system. The scenario is I need to test the system with those string literal. Modification on the system is definitely unacceptable. – StarPinkER Oct 17 '12 at 02:35
  • @Jermaine Xu, this is not overriding, but instrumentation. However yes you do need an ability to relaunch JVM with the "existing system written in Java" and add an agent to JVM via command-line arguments. So if you can't do this, my suggestion is unusable. In this case the "hetaoblog"'s answer should fit your situation:) – Valentin Kovalenko Oct 17 '12 at 02:43
  • Instrumentation is a good idea, but the objectives is testing, so I cannot modify redefine the hashCode method of String. Thanks for your instrumentation idea. – StarPinkER Oct 17 '12 at 02:46
0
String s = "Some String"
for (int i = 0; i < SOME_VERY_BIG_NUMBER; ++i) {
    String copy = new String(s);

    // Do something with copy.
}

Will this work for you? It just creates a lot of copies of the same String literal that you can then use in your testing.

Code-Apprentice
  • 81,660
  • 23
  • 145
  • 268
  • Sorry I didn't make it clear enough. Same string literal is not acceptable, because the string is the primary key in the database, I do need different string literals. – StarPinkER Oct 17 '12 at 02:06