
I'm going to be working with a relatively big data set (~10,000 entries) that is constantly read (most of the data is read every few seconds) and occasionally written to. Is there a performance advantage to splitting this information into chunks (by, say, a name, as names are not unique), or should I just throw everything into one map and read from it? Performance is really important. E.g.:

private HashMap<String, ObjectInformation> map = new HashMap<>();
public ObjectInformation imitateOperation(String query) {
   return map.get(query);
}

vs

private HashMap<String, HashMap<String, ObjectInformation>> map = new HashMap<>();
public ObjectInformation imitateOperation(String name, String query) {
   return map.get(name).get(query);
}

Thank you.

Ned Kriz
    Time complexity of accessing elements of a `HashMap` is `O(1)` - it doesn't matter how many elements there are, it will be constant. Splitting it into multiple chunks will worsen the performance, if anything. – Amongalen Jan 22 '20 at 10:20
  • What happened when you tried it? I'm also not quite sure what you're asking--you're asking if two map lookups are faster than one? You're `get`ting by `query` either way, no? – Dave Newton Jan 22 '20 at 10:20
  • 1
    10,000 entries is known as a small dataset. Also, when optimizing for performance, you need to know how to measure performance. Too many people think they can optimize for performance, when they don't even have a clue about the performance of their current code, then they come up with guesses about how things might improve. You can *guess* how well that works. – Kayaman Jan 22 '20 at 10:21
  • dividing your data makes sense as soon as you want to run parallel threads accessing it. For a single thread it's faster to keep it all in one single map – Duke Jan 22 '20 at 10:27
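The read-heavy, occasionally-written access pattern described in the question also maps naturally onto `ConcurrentHashMap`, which allows concurrent reads without locking the whole map. A minimal sketch (the `ObjectInformation` stand-in and `put` helper are hypothetical, mirroring the question's types):

```java
import java.util.concurrent.ConcurrentHashMap;

public class ObjectStore {
    // Hypothetical value type standing in for ObjectInformation from the question.
    static class ObjectInformation {
        final String payload;
        ObjectInformation(String payload) { this.payload = payload; }
    }

    // ConcurrentHashMap supports non-blocking reads and fine-grained locking on
    // writes, which suits constant reads with occasional writes.
    private final ConcurrentHashMap<String, ObjectInformation> map = new ConcurrentHashMap<>();

    public void put(String query, ObjectInformation info) { map.put(query, info); }

    public ObjectInformation imitateOperation(String query) { return map.get(query); }

    public static void main(String[] args) {
        ObjectStore store = new ObjectStore();
        store.put("alpha", new ObjectInformation("data"));
        System.out.println(store.imitateOperation("alpha").payload);
    }
}
```

With this, multiple reader threads can call `imitateOperation` safely without splitting the data at all.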

2 Answers


Reading from a HashMap is fast and does not really depend on the size of the HashMap as long as the keys hash well.
For a well-balanced HashMap, access time is always O(1) on average, unless all keys have the same hash code.

You could check with a debugger whether the hash map has many hash collisions, and if it does, wrap the keys in your own object with a custom hash code implementation. It will not be easy to write a good one, though, and it will require a lot of testing. You probably don't need it; just keep that single map.
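A minimal sketch of that wrapper idea, with a hypothetical `QueryKey` class (the hash-mixing step is just one illustrative choice; the crucial part is that `hashCode` stays consistent with `equals`, or lookups will silently fail):

```java
import java.util.HashMap;

public class KeyWrapperDemo {
    // Hypothetical wrapper that replaces the String key's hash with a custom one.
    static final class QueryKey {
        private final String query;
        private final int hash; // cached, computed once

        QueryKey(String query) {
            this.query = query;
            // Example custom hash: mix the high bits into the low bits
            // to spread out clustered keys.
            int h = query.hashCode();
            this.hash = h ^ (h >>> 16);
        }

        @Override public int hashCode() { return hash; }

        // equals must agree with hashCode, otherwise HashMap lookups break.
        @Override public boolean equals(Object o) {
            return o instanceof QueryKey && ((QueryKey) o).query.equals(query);
        }
    }

    public static void main(String[] args) {
        HashMap<QueryKey, String> map = new HashMap<>();
        map.put(new QueryKey("abc"), "value");
        System.out.println(map.get(new QueryKey("abc"))); // prints value
    }
}
```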

Also, 10,000 entries is not really big. How important is that performance, exactly? Is 1 ms a lot for you? Lookups will probably already be a few orders of magnitude faster than that.

And like others here said:
1. First check whether this code is actually a bottleneck for your performance goals. You can use a profiler for this, or some custom timing statistics.
2. Then create a benchmark to confirm it and measure exactly how long it takes.
3. Finally, test whether the solution you are considering actually improves the times in your benchmarks. You can again use a profiler to see which part of your code is slowest.
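A crude timing sketch of step 2 (JMH is the proper tool, since naive loops like this are distorted by JIT warm-up and dead-code elimination; this only illustrates the general shape of a measurement):

```java
import java.util.HashMap;

public class LookupTiming {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put("key" + i, i);
        }

        // Warm up so the JIT compiles the lookup path before measuring.
        long sink = 0;
        for (int i = 0; i < 100_000; i++) {
            sink += map.get("key" + (i % 10_000));
        }

        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            sink += map.get("key" + (i % 10_000));
        }
        long elapsed = System.nanoTime() - start;

        // Print sink so the JIT cannot eliminate the measured loop entirely.
        // 1,000,000 iterations, so elapsed / 1_000_000 is the average ns per lookup.
        System.out.println("~" + (elapsed / 1_000_000) + " ns per lookup, sink=" + sink);
    }
}
```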

GotoFinal

First of all, all performance questions should be verified with benchmarks. Please check JMH (there is a guide here).

If you don't change your map, then it is better to use a single huge map (one backing array) instead of multiple smaller ones. The idea is the same as in the ArrayList vs LinkedList comparison: fewer jumps are needed. And of course, if most queries have the same name, then you won't gain any advantage from a multilevel map.
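One way to keep the two-argument lookup from the question while still using a single flat map is a composite key. A sketch, where the `key` helper and separator choice are assumptions, not part of the original answer:

```java
import java.util.HashMap;

public class CompositeKeyDemo {
    // '\u0000' as separator avoids collisions like ("ab","c") vs ("a","bc"),
    // assuming names and queries never contain the NUL character.
    static String key(String name, String query) {
        return name + '\u0000' + query;
    }

    public static void main(String[] args) {
        // Single flat map keyed on name + query instead of nested maps:
        // one lookup, one hash computation, no intermediate map objects.
        HashMap<String, String> map = new HashMap<>();
        map.put(key("users", "findById"), "result");

        System.out.println(map.get(key("users", "findById"))); // prints result
    }
}
```

This also avoids the NullPointerException risk of `map.get(name).get(query)` when `name` is absent.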

On the other hand, smaller maps can give benefits when you:

  • Update the map. In this case the hash map is rehashed, so each rebuild is faster (although rebuilds may happen more often).
  • Recreate the map. In this case the JVM may allocate the internal array as a large object, which can cause performance degradation.

However, the most important point: do a benchmark first. Real timings can differ between JVM versions.

Manushin Igor