-1

I have an extremely big properties file with i18n resource bundle. It looks like:

button.browse=Browse
button.cancel=Cancel
button.clear=Clear

Please note that keys are sortable and have similar prefixes and I don't expect to have duplicates. The properties file then loaded via java.util.Properties and just queries with get() and never updated. Since Properties file extends an obsolete Hastable I want to use other Map class for a better performance. I'm fine to use more memory but I need to get values as fast as possible. By default we always using a HashMap but as far I understood if there is a lot of entries it anyway will use nodes as a tree map. So maybe I can just used the TreeMap instead?

Is any map implementation that is better for such kind of data? For example something that uses prefix tree from keys or something like that

Sergey Ponomarev
  • 2,947
  • 1
  • 33
  • 43
  • Similar question https://stackoverflow.com/questions/30054408/performance-aspect-of-java-util-property-vs-hashmap – Sergey Ponomarev Sep 09 '20 at 08:32
  • [This one](https://stackoverflow.com/questions/20487619/complexity-of-treemap-insertion-vs-hashmap-insertion) has a nice answer... – deHaar Sep 09 '20 at 08:37
  • 1
    If you're worried about performance, measure. Run a profiler. The optimization effort should be proportional to the time your program spends there... –  Sep 09 '20 at 08:39
  • You will probably not have a better performance with TreeMap than with Properties. HashMap should outperform Hashtable slightly. – assylias Sep 09 '20 at 08:40
  • 1
    It is unlikely that you will get a significant number of hash collisions with language items. Hash collisions are a problem if you have lots of items. Java uses 64 bit hashes, so even if you have 4 **billion** language items, the chance for a collision are only 40%. Degradation with that many values still isn't bad, since the buckets remain small. – Polygnome Sep 09 '20 at 08:40
  • 1
    hashCode() returns 32bit integer, but anyway I got your point: hashmap will perform better because it uses tree map for a bucket with nodes that have collision – Sergey Ponomarev Sep 09 '20 at 08:50
  • Where exactly did you get the idea that 'deprecated' corresponds to 'worse performance'? And where did you get the idea that `TreeMap` outperforms a hash map? You are 100% wrong on both counts. – user207421 Sep 09 '20 at 09:59
  • @Polygnome Java uses 32-bit hashes, and the chances of collision with 64-bit hashes and 4 billion items are many orders of magnitude less than 40%. – user207421 Sep 09 '20 at 10:00
  • @MarquisofLorne No, with 64 bits you actually have about a 40% chance of *one* collision with about 4 billion items. As a rule of thumb, you can expect a collision in sqrt(n) items, where n is the number of bits in a hash function. You are right about the bits, which means you'd expect a collision after about 65536 items. Which still poses no problem at all for hashmaps. Even with literally hundreds of thousands of items, the bucket size is small enough that no problem exists. It would actually be bad if each bucket only held one item.... – Polygnome Sep 09 '20 at 11:23

1 Answers1

0

HashMap uses tree only for a bucket with entries that have the same hashcode. It’s not expected to have a lot of them for such localized strings. So the hashmap is still better here

Sergey Ponomarev
  • 2,947
  • 1
  • 33
  • 43