2

I am testing HashIds for collisions. Here is the dependency:

    <!-- https://mvnrepository.com/artifact/org.hashids/hashids -->
    <dependency>
        <groupId>org.hashids</groupId>
        <artifactId>hashids</artifactId>
        <version>1.0.3</version>
    </dependency>

I am using the following code:

    Hashids hashids = new Hashids("xyz", 6, "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ");

    System.out.println("*******************************************************************");
    System.out.println("Started");

    Set<String> set = new HashSet();

    ExecutorService executor = Executors.newFixedThreadPool(4);

    AtomicLong count = new AtomicLong(0);
    final long max = 10_000_000;
    for (long i = 1; i <= max; i++) {

        executor.execute(() -> {
            set.add(hashids.encode(count.incrementAndGet()));

            // This is just to show me that there is some activity going on
            if (count.get() % 100_000 == 0) {
                System.out.println(count.get() + ": " + new Date());
            }
        });
    }

    // Wait till the executor service tasks are done 
    executor.shutdown();
    while (!executor.isTerminated()) {
        Thread.sleep(1000);
    }

    // Assert that the Set did not receive duplicates
    Assert.assertEquals(max, set.size());

    System.out.println("Ended");
    System.out.println("*******************************************************************");

So, I put an ExecutorService to make it a little bit faster but I am running into problems. Either

  1. The ExecutorService does not complete and it hangs there
  2. The Set contains duplicate values therefore the assertion fails
  3. Using the bare for loop is much faster than when using the ExecutorService

What could be wrong with this piece of code?

Kihats
  • 3,326
  • 5
  • 31
  • 46

2 Answers2

3
  1. You need call executor.awaitTermination right after executor.shutdown() ;
  2. Your Set is not thread safe, try wrap your set with Collections.synchronizedSet(set), or create set based on ConcurrentHashMap.newKeySet(), see this discussion about thread safe set.
Dai
  • 397
  • 4
  • 14
  • `ConcurrentHashMap.newKeySet()` is easier to use that `synchronizedSet` as you don't need to handle the synchronization manually. – assylias Jun 24 '20 at 05:43
  • I had forgotten about `Collections.synchronizedSet(set)`. It has worked now. Though point number `3` is still pending. Without the `ExecutorService`, it takes less time as compared to using the `ExecutorService` – Kihats Jun 24 '20 at 05:46
  • Yes, `ConcurrentHashMap.newKeySet()` may have better concurrent performance than `synchronizedSet`, I'll update the answer. – Dai Jun 24 '20 at 05:50
  • @Kihats point 3 is probably that the overhead of creating different Threads, starting and joining them is greater than the time gain you'd get from it – Lino Jun 24 '20 at 05:54
  • The `Collections.synchronizedSet(set)` just add synchronized block, so it may increase time on `set.add()`. Another way for profiling, you can try sum the time spent on `hashids.encode()` with another `AtomicLong` variable. – Dai Jun 24 '20 at 06:03
0

Queuing a lot of relative fast tasks might be a lot of overhead. You can instead only queue 4 tasks, which loop over a partition of the whole numbers. This will be much faster due to the reduced overhead of queueing and possibly code locality (caches). Besides it avoids concurrent access to counters and collections.

eckes
  • 10,103
  • 1
  • 59
  • 71