Without seeing your code, the most probable cause of these poor performance results is garbage collection activity. To demonstrate it, I wrote the following program:
import java.lang.management.ManagementFactory;
import java.util.*;
import java.util.concurrent.*;

public class TestMap {
  // we assume NB_ENTITIES is divisible by NB_TASKS
  static final int NB_ENTITIES = 20_000_000, NB_TASKS = 2;
  static Map<String, String> map = new ConcurrentHashMap<>();

  public static void main(String[] args) {
    try {
      System.out.printf("running with nb entities = %,d, nb tasks = %,d, VM args = %s%n",
        NB_ENTITIES, NB_TASKS, ManagementFactory.getRuntimeMXBean().getInputArguments());
      ExecutorService executor = Executors.newFixedThreadPool(NB_TASKS);
      int entitiesPerTask = NB_ENTITIES / NB_TASKS;
      List<Future<?>> futures = new ArrayList<>(NB_TASKS);
      long startTime = System.nanoTime();
      for (int i = 0; i < NB_TASKS; i++) {
        MyTask task = new MyTask(i * entitiesPerTask, (i + 1) * entitiesPerTask - 1);
        futures.add(executor.submit(task));
      }
      for (Future<?> f : futures) {
        f.get();
      }
      long elapsed = System.nanoTime() - startTime;
      executor.shutdownNow();
      System.gc();
      Runtime rt = Runtime.getRuntime();
      long usedMemory = rt.maxMemory() - rt.freeMemory();
      System.out.printf("processing completed in %,d ms, usedMemory after GC = %,d bytes%n",
        elapsed / 1_000_000L, usedMemory);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  static class MyTask implements Runnable {
    private final int startIdx, endIdx;

    public MyTask(final int startIdx, final int endIdx) {
      this.startIdx = startIdx;
      this.endIdx = endIdx;
    }

    @Override
    public void run() {
      long startTime = System.nanoTime();
      for (int i = startIdx; i <= endIdx; i++) {
        map.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
      }
      long elapsed = System.nanoTime() - startTime;
      System.out.printf("task[%,d - %,d], completed in %,d ms%n",
        startIdx, endIdx, elapsed / 1_000_000L);
    }
  }
}
At the end of the processing, this code computes an approximation of the used memory by calling System.gc()
immediately followed by Runtime.maxMemory() - Runtime.freeMemory()
. This shows that the map with 20 million entries takes approximately 2.2 GiB (about 2.38 billion bytes), which is considerable. I ran it with 1 and 2 threads, for various values of the -Xmx and -Xms JVM arguments; here are the resulting outputs (just to be clear: 2560m = 2.5g):
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 19,999,999], completed in 11,781 ms
processing completed in 11,782 ms, usedMemory after GC = 2,379,068,760 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 9,999,999], completed in 8,269 ms
task[10,000,000 - 19,999,999], completed in 12,385 ms
processing completed in 12,386 ms, usedMemory after GC = 2,379,069,480 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms3g, -Xmx3g]
task[0 - 19,999,999], completed in 12,525 ms
processing completed in 12,527 ms, usedMemory after GC = 2,398,339,944 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms3g, -Xmx3g]
task[0 - 9,999,999], completed in 12,220 ms
task[10,000,000 - 19,999,999], completed in 12,264 ms
processing completed in 12,265 ms, usedMemory after GC = 2,382,777,776 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms4g, -Xmx4g]
task[0 - 19,999,999], completed in 7,363 ms
processing completed in 7,364 ms, usedMemory after GC = 2,402,467,040 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms4g, -Xmx4g]
task[0 - 9,999,999], completed in 5,466 ms
task[10,000,000 - 19,999,999], completed in 5,511 ms
processing completed in 5,512 ms, usedMemory after GC = 2,381,821,576 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms8g, -Xmx8g]
task[0 - 19,999,999], completed in 7,778 ms
processing completed in 7,779 ms, usedMemory after GC = 2,438,159,312 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms8g, -Xmx8g]
task[0 - 9,999,999], completed in 5,739 ms
task[10,000,000 - 19,999,999], completed in 5,784 ms
processing completed in 5,785 ms, usedMemory after GC = 2,396,478,680 bytes
These results can be summarized in the following table:
--------------------------------
heap | exec time (ms) for:
size (gb) | 1 thread | 2 threads
--------------------------------
2.5 | 11782 | 12386
3.0 | 12527 | 12265
4.0 | 7364 | 5512
8.0 | 7779 | 5785
--------------------------------
I also observed that, for the 2.5g and 3g heap sizes, there was high CPU activity, with spikes at 100% throughout the processing time due to GC activity, whereas for 4g and 8g it was observed only at the end, due to the System.gc()
call.
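If you want to quantify that GC overhead without attaching a profiler, the JVM exposes per-collector statistics via GarbageCollectorMXBean. Here is a minimal sketch (the class name GcStats is my own); it sums the collection counts and cumulative collection times across all collectors, so you can snapshot them before and after a workload:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
  // Sums collection counts and cumulative collection times (in ms)
  // across all garbage collectors of the running JVM.
  static long[] totalGcCountAndTime() {
    long count = 0, timeMs = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      // Both methods may return -1 when the value is unavailable.
      count += Math.max(0, gc.getCollectionCount());
      timeMs += Math.max(0, gc.getCollectionTime());
    }
    return new long[] { count, timeMs };
  }

  public static void main(String[] args) {
    long[] before = totalGcCountAndTime();
    // ... run the workload to be measured here ...
    long[] after = totalGcCountAndTime();
    System.out.printf("GC runs = %,d, GC time = %,d ms%n",
      after[0] - before[0], after[1] - before[1]);
  }
}
```

Alternatively, running with -verbose:gc (or -Xlog:gc on JDK 9+) prints each collection as it happens.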
To conclude:

- if your heap is sized inappropriately, garbage collection will kill any performance gain you might hope to obtain. You should make it large enough to avoid the side effects of long GC pauses.

- you must also be aware that using a concurrent collection such as ConcurrentHashMap
has a significant performance overhead. To illustrate this, I slightly modified the code so that each task uses its own HashMap
, then at the end all the maps are aggregated (with Map.putAll()
) into the map of the first task. The processing time fell to around 3,200 ms.
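For reference, here is a sketch of that variant. The per-thread HashMap plus final Map.putAll() aggregation is the idea described above; the class name, the use of Callable to return each partial map, and the reduced entity count (so it runs in a default-sized heap) are my own choices:

```java
import java.util.*;
import java.util.concurrent.*;

public class TestLocalMaps {
  // Reduced from 20 million so this runs comfortably in a default-sized heap.
  static final int NB_ENTITIES = 1_000_000, NB_TASKS = 2;

  // Each task fills its own plain HashMap: no synchronization during insertion.
  static class MyTask implements Callable<Map<String, String>> {
    private final int startIdx, endIdx;

    MyTask(int startIdx, int endIdx) {
      this.startIdx = startIdx;
      this.endIdx = endIdx;
    }

    @Override
    public Map<String, String> call() {
      int size = endIdx - startIdx + 1;
      // Pre-size to avoid rehashing (default load factor is 0.75).
      Map<String, String> local = new HashMap<>(size * 4 / 3 + 1);
      for (int i = startIdx; i <= endIdx; i++) {
        local.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
      }
      return local;
    }
  }

  static Map<String, String> buildMerged() throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(NB_TASKS);
    int perTask = NB_ENTITIES / NB_TASKS;
    List<Future<Map<String, String>>> futures = new ArrayList<>(NB_TASKS);
    for (int i = 0; i < NB_TASKS; i++) {
      futures.add(executor.submit(new MyTask(i * perTask, (i + 1) * perTask - 1)));
    }
    // Aggregate all partial maps into the first task's map with Map.putAll().
    Map<String, String> merged = futures.get(0).get();
    for (int i = 1; i < NB_TASKS; i++) {
      merged.putAll(futures.get(i).get());
    }
    executor.shutdownNow();
    return merged;
  }

  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    Map<String, String> merged = buildMerged();
    long elapsed = System.nanoTime() - start;
    System.out.printf("merged %,d entries in %,d ms%n", merged.size(), elapsed / 1_000_000L);
  }
}
```

Note that this trades peak memory for speed: while merging, the partial maps and the merged map coexist, so the approach only pays off if the heap has room for both.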