Is Java's String.hashCode() completely independent of Locale? In other words, if someone fiddles with the default Locale, are we 100% sure this is not going to impact the hash code?

We know that such fiddling impacts toUpperCase() and toLowerCase().

Jérôme Verstrynge

4 Answers

The Locale does not (directly) affect the hashCode of the String. It is based solely on the chars stored in the String. The hashCode is computed by this loop:

// h starts at 0; val, off and len are the String's internal char array, start offset and length
int h = 0;
for (int i = 0; i < len; i++) {
    h = 31 * h + val[off++];
}

But the catch is how the String is produced. If it is, for example, the result of toUpperCase(), which depends on the Locale, then the resulting String depends on the Locale, and so does its hashCode.
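To make that indirect dependency concrete, here is a minimal sketch (the class and method names are made up for illustration) using the Turkish locale, where toUpperCase() maps 'i' to the dotted capital İ (U+0130) instead of 'I'. The two upper-cased strings are not equal, so their hash codes differ, even though String.hashCode() itself never looks at the Locale.

```java
import java.util.Locale;

public class UpperCaseLocaleDemo {

    // Upper-cases s using an explicit Locale instead of the JVM default.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        String root = upper("title", Locale.ROOT);                       // "TITLE"
        String turkish = upper("title", Locale.forLanguageTag("tr-TR")); // 'i' becomes U+0130 (dotted capital I)

        // Only ASCII text is printed, so the console encoding does not matter.
        System.out.println("ROOT:  equals \"TITLE\" = " + root.equals("TITLE")
                + ", hashCode = " + root.hashCode());
        System.out.println("tr-TR: equals \"TITLE\" = " + turkish.equals("TITLE")
                + ", hashCode = " + turkish.hashCode());
    }
}
```

With the standard formula the two hash codes come out as 79833656 ("TITLE") and 86715377 ("TİTLE"). Passing an explicit Locale (such as Locale.ROOT) to toUpperCase() is the usual way to keep such keys stable regardless of the default.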

user85421

Good question. I ran a quick test, and it seems that changing the default locale does not (thankfully) change the hash code:

import java.util.Locale;

public class HashCodeTester {

    public static void main(String[] args) {

        String test = "test";
        int hashCode = test.hashCode();

        System.out.println("hashcode [" + hashCode + "] - locale [" + Locale.getDefault() + "]");

        for (Locale locale : Locale.getAvailableLocales()) {
            Locale.setDefault(locale);
            System.out.println("hashcode [" + test.hashCode() + "] - locale [" + Locale.getDefault() + "]");
        }

    }
}

Output:

hashcode [3556498] - locale [en_IE]
hashcode [3556498] - locale [ja_JP]
hashcode [3556498] - locale [es_PE]
hashcode [3556498] - locale [en]
hashcode [3556498] - locale [ja_JP_JP]
hashcode [3556498] - locale [es_PA]
hashcode [3556498] - locale [sr_BA]
hashcode [3556498] - locale [mk]
hashcode [3556498] - locale [es_GT]
hashcode [3556498] - locale [ar_AE]
hashcode [3556498] - locale [no_NO]
hashcode [3556498] - locale [sq_AL]
hashcode [3556498] - locale [bg]
hashcode [3556498] - locale [ar_IQ]
hashcode [3556498] - locale [ar_YE]
hashcode [3556498] - locale [hu]
hashcode [3556498] - locale [pt_PT]
hashcode [3556498] - locale [el_CY]
hashcode [3556498] - locale [ar_QA]
hashcode [3556498] - locale [mk_MK]
hashcode [3556498] - locale [sv]
hashcode [3556498] - locale [de_CH]
hashcode [3556498] - locale [en_US]
hashcode [3556498] - locale [fi_FI]
hashcode [3556498] - locale [is]
hashcode [3556498] - locale [cs]
hashcode [3556498] - locale [en_MT]
hashcode [3556498] - locale [sl_SI]
hashcode [3556498] - locale [sk_SK]
hashcode [3556498] - locale [it]
hashcode [3556498] - locale [tr_TR]
hashcode [3556498] - locale [zh]
hashcode [3556498] - locale [th]
hashcode [3556498] - locale [ar_SA]
hashcode [3556498] - locale [no]
hashcode [3556498] - locale [en_GB]
hashcode [3556498] - locale [sr_CS]
hashcode [3556498] - locale [lt]
hashcode [3556498] - locale [ro]
hashcode [3556498] - locale [en_NZ]
hashcode [3556498] - locale [no_NO_NY]
hashcode [3556498] - locale [lt_LT]
hashcode [3556498] - locale [es_NI]
hashcode [3556498] - locale [nl]
hashcode [3556498] - locale [ga_IE]
hashcode [3556498] - locale [fr_BE]
hashcode [3556498] - locale [es_ES]
hashcode [3556498] - locale [ar_LB]
hashcode [3556498] - locale [ko]
hashcode [3556498] - locale [fr_CA]
hashcode [3556498] - locale [et_EE]
hashcode [3556498] - locale [ar_KW]
hashcode [3556498] - locale [sr_RS]
hashcode [3556498] - locale [es_US]
hashcode [3556498] - locale [es_MX]
hashcode [3556498] - locale [ar_SD]
hashcode [3556498] - locale [in_ID]
hashcode [3556498] - locale [ru]
hashcode [3556498] - locale [lv]
hashcode [3556498] - locale [es_UY]
hashcode [3556498] - locale [lv_LV]
hashcode [3556498] - locale [iw]
hashcode [3556498] - locale [pt_BR]
hashcode [3556498] - locale [ar_SY]
hashcode [3556498] - locale [hr]
hashcode [3556498] - locale [et]
hashcode [3556498] - locale [es_DO]
hashcode [3556498] - locale [fr_CH]
hashcode [3556498] - locale [hi_IN]
hashcode [3556498] - locale [es_VE]
hashcode [3556498] - locale [ar_BH]
hashcode [3556498] - locale [en_PH]
hashcode [3556498] - locale [ar_TN]
hashcode [3556498] - locale [fi]
hashcode [3556498] - locale [de_AT]
hashcode [3556498] - locale [es]
hashcode [3556498] - locale [nl_NL]
hashcode [3556498] - locale [es_EC]
hashcode [3556498] - locale [zh_TW]
hashcode [3556498] - locale [ar_JO]
hashcode [3556498] - locale [be]
hashcode [3556498] - locale [is_IS]
hashcode [3556498] - locale [es_CO]
hashcode [3556498] - locale [es_CR]
hashcode [3556498] - locale [es_CL]
hashcode [3556498] - locale [ar_EG]
hashcode [3556498] - locale [en_ZA]
hashcode [3556498] - locale [th_TH]
hashcode [3556498] - locale [el_GR]
hashcode [3556498] - locale [it_IT]
hashcode [3556498] - locale [ca]
hashcode [3556498] - locale [hu_HU]
hashcode [3556498] - locale [fr]
hashcode [3556498] - locale [en_IE]
hashcode [3556498] - locale [uk_UA]
hashcode [3556498] - locale [pl_PL]
hashcode [3556498] - locale [fr_LU]
hashcode [3556498] - locale [nl_BE]
hashcode [3556498] - locale [en_IN]
hashcode [3556498] - locale [ca_ES]
hashcode [3556498] - locale [ar_MA]
hashcode [3556498] - locale [es_BO]
hashcode [3556498] - locale [en_AU]
hashcode [3556498] - locale [sr]
hashcode [3556498] - locale [zh_SG]
hashcode [3556498] - locale [pt]
hashcode [3556498] - locale [uk]
hashcode [3556498] - locale [es_SV]
hashcode [3556498] - locale [ru_RU]
hashcode [3556498] - locale [ko_KR]
hashcode [3556498] - locale [vi]
hashcode [3556498] - locale [ar_DZ]
hashcode [3556498] - locale [vi_VN]
hashcode [3556498] - locale [sr_ME]
hashcode [3556498] - locale [sq]
hashcode [3556498] - locale [ar_LY]
hashcode [3556498] - locale [ar]
hashcode [3556498] - locale [zh_CN]
hashcode [3556498] - locale [be_BY]
hashcode [3556498] - locale [zh_HK]
hashcode [3556498] - locale [ja]
hashcode [3556498] - locale [iw_IL]
hashcode [3556498] - locale [bg_BG]
hashcode [3556498] - locale [in]
hashcode [3556498] - locale [mt_MT]
hashcode [3556498] - locale [es_PY]
hashcode [3556498] - locale [sl]
hashcode [3556498] - locale [fr_FR]
hashcode [3556498] - locale [cs_CZ]
hashcode [3556498] - locale [it_CH]
hashcode [3556498] - locale [ro_RO]
hashcode [3556498] - locale [es_PR]
hashcode [3556498] - locale [en_CA]
hashcode [3556498] - locale [de_DE]
hashcode [3556498] - locale [ga]
hashcode [3556498] - locale [de_LU]
hashcode [3556498] - locale [de]
hashcode [3556498] - locale [es_AR]
hashcode [3556498] - locale [sk]
hashcode [3556498] - locale [ms_MY]
hashcode [3556498] - locale [hr_HR]
hashcode [3556498] - locale [en_SG]
hashcode [3556498] - locale [da]
hashcode [3556498] - locale [mt]
hashcode [3556498] - locale [pl]
hashcode [3556498] - locale [ar_OM]
hashcode [3556498] - locale [tr]
hashcode [3556498] - locale [th_TH_TH]
hashcode [3556498] - locale [el]
hashcode [3556498] - locale [ms]
hashcode [3556498] - locale [sv_SE]
hashcode [3556498] - locale [da_DK]
hashcode [3556498] - locale [es_HN]
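A stricter variant of the test above, as a sketch (class and method names are hypothetical): String caches its hash code after the first call (see the `hash` field in the JDK source quoted in another answer), so calling test.hashCode() repeatedly on the same object may mostly re-read that cached value. Hashing a fresh copy under each locale avoids this, and try/finally restores the JVM-wide default Locale afterwards. The samples include Turkish i/I variants, whose case mapping is locale-sensitive.

```java
import java.util.Locale;

public class HashCodeLocaleCheck {

    // Returns true if a freshly built copy of s hashes identically under every available default Locale.
    static boolean hashStableAcrossLocales(String s) {
        Locale original = Locale.getDefault();
        // new String(char[]) creates a new object with no cached hash yet.
        int expected = new String(s.toCharArray()).hashCode();
        try {
            for (Locale locale : Locale.getAvailableLocales()) {
                Locale.setDefault(locale);
                if (new String(s.toCharArray()).hashCode() != expected) {
                    return false;
                }
            }
            return true;
        } finally {
            // Locale.setDefault affects the whole JVM, so put the original back.
            Locale.setDefault(original);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"test", "title", "TITLE", "i\u0130\u0131I"};
        for (String s : samples) {
            // Print the length rather than the string itself to stay console-encoding neutral.
            System.out.println(s.length() + " chars, hashCode " + s.hashCode()
                    + ", stable across locales: " + hashStableAcrossLocales(s));
        }
    }
}
```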

The hash code of a given String object does not depend on the locale. That should be obvious from the javadoc that you linked.

However, any transformation that produces different characters in the string will lead to a different (non-equal) string and, apart from rare hash collisions, a different hash code. For instance, translating a bunch of bytes to a String using different default character encoding can result in different characters.
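A minimal sketch of that effect (class and method names are hypothetical): the same two bytes decoded as UTF-8 and as ISO-8859-1. Explicit charsets stand in here for two different platform defaults, which is what new String(bytes) would silently pick up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetHashDemo {

    // Decodes the bytes with an explicit charset (standing in for a platform default).
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9 (e with acute accent)

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);        // 1 char: U+00E9
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1); // 2 chars: U+00C3, U+00A9

        System.out.println("UTF-8:      length " + asUtf8.length() + ", hashCode " + asUtf8.hashCode());
        System.out.println("ISO-8859-1: length " + asLatin1.length() + ", hashCode " + asLatin1.hashCode());
    }
}
```

UTF-8 yields the single character é (hash code 233), while ISO-8859-1 yields the two characters Ã© (hash code 31 * 195 + 169 = 6214).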

In summary: changing the Locale doesn't directly affect String hash codes, but it could cause your application to produce different String values, and THAT will affect their hash codes.

Stephen C
  • +1 for explaining that. Can you give an example of what you said here: 'translating a bunch of bytes to a String using different default character encoding can result in different characters'? –  Aug 29 '11 at 13:30
  • @eon - the primary issue is that if you pick the *wrong* encoding, the translation will either give you the "random weird characters" or it will replace untranslatable bytes with some character (e.g. '?') that indicates an unrecognised character. The actual behaviour for unrecognised input is "unspecified" if you are using (for instance) a String constructor to convert the bytes. – Stephen C Aug 29 '11 at 13:48
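Following up on that comment, a sketch (class and method names are illustrative) contrasting the two behaviors. The new String(byte[], Charset) constructor is documented to replace malformed input with the charset's replacement string (U+FFFD for UTF-8), whereas a CharsetDecoder configured with CodingErrorAction.REPORT throws instead, so bad input cannot silently change the string and its hash code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {

    // Lenient: malformed bytes are replaced with U+FFFD.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: a decoder set to REPORT throws on malformed or unmappable input.
    static String strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x74, (byte) 0xFF}; // 't' followed by 0xFF, a byte that never appears in valid UTF-8

        String replaced = lenient(bytes);
        System.out.println("lenient: length " + replaced.length()
                + ", last char is U+FFFD: " + (replaced.charAt(replaced.length() - 1) == '\uFFFD'));

        try {
            strict(bytes);
            System.out.println("strict: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: " + e.getClass().getSimpleName());
        }
    }
}
```

The lenient call prints a length of 2 with a trailing U+FFFD; the strict call reports a MalformedInputException.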

The equals method on String clearly states that two strings are equal only if they represent the same sequence of characters (that is, no conversions are involved).

While that alone does not guarantee that hashCode() ignores locale information (in general it might), the implementation in the Oracle JDK looks like this:

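public int hashCode() {
    int h = hash;            // cached hash; 0 means "not computed yet"
    int len = count;
    if (h == 0 && len > 0) {
        int off = offset;    // older JDKs let a String share a larger char[] via offset/count
        char val[] = value;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;            // cache the result for later calls
    }
    return h;
}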

This only uses the characters and no locale information.
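The code above is the older (Java 6 era) implementation with offset/count fields; later JDKs store strings differently internally, but the String.hashCode() javadoc itself specifies the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic. A minimal sketch (the class and method names are hypothetical) that recomputes that formula from the chars alone and checks it against the built-in method:

```java
public class ManualStringHash {

    // Recomputes the documented formula with int arithmetic (overflow simply wraps).
    // Only the chars (UTF-16 code units) are read; no Locale is involved.
    static int javaStyleHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "test", "title", "T\u0130TLE", "a fairly long string that overflows int"};
        for (String s : samples) {
            System.out.println(s.length() + " chars: matches String.hashCode() = "
                    + (javaStyleHash(s) == s.hashCode()));
        }
    }
}
```

For "test" both computations give 3556498, the value seen in the locale test in another answer.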

Mathias Schwarz