
For a study project, we are supposed to create 100,000 threads to get a feeling for how long it takes to create a lot of threads and why it is more efficient to use tasks instead.

However, we found out that the same "create and start 100,000 threads" code runs a lot slower on a modern AMD Ryzen system than on some older Intel systems (even a notebook). We did some benchmarking with different JDKs, all on Java 16 (older versions didn't make a difference).

import java.util.ArrayList;
import java.util.List;

public class ThreadCreator {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> startedThreads = new ArrayList<>();
        long startTime = System.currentTimeMillis();

        // Create and start 100,000 threads that do nothing.
        for (int i = 0; i < 100_000; i++) {
            Thread t = new Thread(() -> {});
            t.start();
            startedThreads.add(t);
        }

        // Wait until every thread has finished.
        for (Thread t : startedThreads) {
            t.join();
        }

        // Elapsed wall-clock time in milliseconds.
        System.out.println("Duration: " + (System.currentTimeMillis() - startTime));
    }
}

The benchmark results:

AMD Ryzen 7 3700X System (Java 16, Ubuntu 20.04):
Adopt OpenJDK (Hotspot): 13882ms
Adopt OpenJDK (OpenJ9): 7521ms

Intel i7-8550U System (Fedora 34, Java 16):
Adopt OpenJDK (Hotspot): 5321ms 
Adopt OpenJDK (OpenJ9): 3089ms

Intel i5-6600k System (Windows 10, Java 16):
Adopt OpenJDK (Hotspot): 29433ms (maybe related to the low memory of this system)
Adopt OpenJDK (OpenJ9): 5119ms

The OpenJ9 JVM cuts the time nearly in half on both the AMD system and the Intel notebook. However, the AMD system never reaches the times of the Intel systems. The AMD system only runs at 10% CPU utilisation during this test.

What might be the reason why creating threads is so much slower on AMD systems compared to Intel systems?
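For comparison, the task-based alternative mentioned at the top of the question could be sketched roughly as follows. This is only an illustration under assumptions: the class name `TaskSubmitter`, the method `runTasks`, and the choice of a fixed pool sized to the number of available processors are hypothetical and not part of the original benchmark, so its timings are not directly comparable to the numbers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaskSubmitter {
    // Submits taskCount empty tasks to a fixed pool of poolSize threads and
    // returns the elapsed wall-clock time in milliseconds.
    // (Hypothetical helper; names and pool size are assumptions.)
    static long runTasks(int taskCount, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<?>> futures = new ArrayList<>(taskCount);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {}));
            }
            // Wait for every task, analogous to Thread.join() in the original code.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int poolSize = Runtime.getRuntime().availableProcessors();
        System.out.println("Duration: " + runTasks(100_000, poolSize));
    }
}
```

Here only `poolSize` operating-system threads are ever created; the 100,000 tasks are queued and executed by those few threads.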

ForJ9
  • You should probably use a profiler; it will better tell you where the time is being spent. – matt May 15 '21 at 19:52
  • You're not just measuring thread starting. You're also measuring scheduling and IPC, because you wait for all of them to finish and `join` to complete. I don't think this ruins your results, but it's important to notice the distinction. Also, the distinction between Linux and Windows is quite important here. Ideally you'd want to run these kinds of checks on as-equal-as-possible systems (say, from a Linux distro booted from a USB stick). – Joachim Sauer May 15 '21 at 19:53
  • @JoachimSauer I agree completely that the measurements aren't reliable enough for a comparison between Linux and Windows. However, I think these measurements are reliable enough to show a tendency that AMD systems seem to be slower. Do you have an idea how to benchmark only the starting of threads? – ForJ9 May 15 '21 at 19:59
  • I think this is a question better posted to the OpenJDK discussion list; they'd probably be much more likely to have real answers ready to hand. (They might read StackOverflow, but I bet they often don't.) – markspace May 15 '21 at 20:01
  • "Slower" for this specific case, maybe. "Slower" in the sense that for this specific thing that isn't really how you ought to do things on _any_ system, AMD degrades worse. – Louis Wasserman May 15 '21 at 20:06
  • If you want to ascribe the difference to processor type, then it would be well to keep all other variables unchanged -- same OS, same amount of memory, same disks, etc. – iggy May 16 '21 at 01:31
  • You should measure elapsed time using `System.nanoTime()` (see https://stackoverflow.com/a/180191); `currentTimeMillis()` is affected by time adjustments done by the OS, though that would probably not explain the big difference. – Marcono1234 May 16 '21 at 23:02
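
Two of the suggestions in the comments (use `System.nanoTime()`, and separate thread starting from waiting on `join`) could be combined in a sketch like the one below. The class name `ThreadStartTimer` and the method `measure` are hypothetical. The split is only approximate: threads already run and terminate while the start loop is still executing, so the first number is not a pure "creation cost".

```java
import java.util.ArrayList;
import java.util.List;

class ThreadStartTimer {
    // Returns {startNanos, joinNanos}: time spent in the start() loop
    // and time spent in the join() loop, both measured with nanoTime().
    // (Hypothetical helper, not part of the original benchmark.)
    static long[] measure(int threadCount) throws InterruptedException {
        List<Thread> threads = new ArrayList<>(threadCount);

        long t0 = System.nanoTime();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {});
            t.start();
            threads.add(t);
        }
        long t1 = System.nanoTime();

        for (Thread t : threads) {
            t.join();
        }
        long t2 = System.nanoTime();

        return new long[] {t1 - t0, t2 - t1};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = measure(100_000);
        System.out.printf("start loop: %d ms, join loop: %d ms%n",
                r[0] / 1_000_000, r[1] / 1_000_000);
    }
}
```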

1 Answer


I have a Ryzen 3700 system running Windows 10 and I got the following results:

Duration: 5.813002900 seconds
100000 tasks completed.

The program I ran, written in Ada, is:

with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Main is
   protected counter is
      procedure add;
      function report return Natural;
   private
      count : Natural := 0;
   end counter;

   protected body counter is
      procedure add is
      begin
         count := count + 1;
      end add;
      function report return Natural is
      begin
         return count;
      end report;
   end counter;

   task type worker;

   task body worker is
   begin
      counter.add;
   end worker;

   type worker_access is access worker;

   type list is array (Positive range 1 .. 100000) of worker_access;

   start_time : Time;
   end_time   : Time;
begin
   start_time := Clock;
   declare
      The_List : list;
   begin
      for I in The_List'Range loop
         The_List (I) := new worker;
      end loop;
   end;
   end_time := Clock;
   Put_Line
     ("Duration:" & Duration'Image (end_time - start_time) & " seconds");
   Put_Line (Natural'Image (counter.report) & " tasks completed.");
end Main;

This program creates a protected object used to count the number of tasks (similar to Java threads) executed. The protected procedure named add only allows one task at a time to increment the count within the protected object.

The inner block within the main procedure achieves the effect of a Java join. Note that a timing of 5.813 seconds is the same as 5813 milliseconds.
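
For readers more familiar with Java, a rough analogue of the Ada program might look like the sketch below. It is an approximation under assumptions, not a translation: the names `ProtectedCounterDemo`, `Counter`, and `runWorkers` are hypothetical, and `synchronized` methods only imitate the mutual exclusion of the protected procedure (an Ada protected function permits concurrent readers, which `synchronized` does not).

```java
import java.util.ArrayList;
import java.util.List;

class ProtectedCounterDemo {
    // Stand-in for the Ada protected object: synchronized methods ensure
    // that only one thread at a time can touch count.
    static final class Counter {
        private int count = 0;

        synchronized void add() {
            count++;
        }

        synchronized int report() {
            return count;
        }
    }

    // Starts workerCount threads that each increment the counter once,
    // waits for all of them, and returns the final count.
    static int runWorkers(int workerCount) throws InterruptedException {
        Counter counter = new Counter();
        List<Thread> workers = new ArrayList<>(workerCount);
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(counter::add);
            t.start();
            workers.add(t);
        }
        for (Thread t : workers) {
            t.join();
        }
        return counter.report();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        int completed = runWorkers(100_000);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("Duration: " + seconds + " seconds");
        System.out.println(completed + " tasks completed.");
    }
}
```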

Jim Rogers