
Original problem (with full code)

(I kept the original full code because the answers below refer to it.)

I am currently working on my master's thesis project about energy-efficient software engineering, where I want to (coarsely) monitor the energy-consumption behavior of different Java methods.

Short description of the desired behavior:

I have a set of JMH benchmarks that I want to profile with the Intel Power Gadget API. The desired behavior is as follows:

  • Before each method benchmark, a new thread that performs the profiling should be created and started.
  • While the "main" thread runs the benchmark, the profiler thread should repeatedly read samples until the JMH teardown method is executed.

My problem:

The code executes sequentially instead of concurrently. The "profiling phase" should begin in the JMH @Setup method and end in the JMH @TearDown method.

My code:

The code is still experimental and messy, sorry for that. I kept all of the debug print statements for logging purposes.

BenchmarkProgram.java:

package Profiling;

public class BenchmarkProgram {

    public static void main(String[] args) throws Exception{
        org.openjdk.jmh.Main.main(args);
    }
}

PowerGadgetProfiler.java:

package Profiling;


import BenchmarkContainers.BenchmarkContainer;

import java.lang.reflect.Method;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;

public class PowerGadgetProfiler {

    static{
        System.loadLibrary("PowerGadgetProfiler");
        System.out.println("THREAD " + Thread.currentThread().getId() +  " LIBRARY LOADED SUCCESSFULLY");
    }

    public static int profileId = 0;
    static int methodCounter = 0;

    private static final String folderName = buildFolderName();

    public static final Object profilerLock = new Object();

    public static volatile boolean running = false;

    private static final PowerGadgetProfiler instance = new PowerGadgetProfiler();

    private static String[] getBenchmarkMethodsInOrder(){
        ArrayList<String> methodNames = new ArrayList<>();
        Method[] methods = BenchmarkContainer.class.getMethods();
        for (Method method : methods) {
            String name = method.toString();
            if(name.contains("BenchmarkContainer")
            && !name.contains("initializeProfilerThread")
            && !name.contains("stopProfiling")){
                methodNames.add(method.getName());
            }
        }
        String[] methodArray = methodNames.toArray(new String[0]);
        Arrays.sort(methodArray);
        System.out.println("THREAD " + Thread.currentThread().getId() + " METHODS: " + Arrays.toString(methodArray));
        return methodArray;
    }

    private static String buildFolderName(){
        LocalDateTime timeStamp = LocalDateTime.now();
        String result = "/Users/timerdar/BM-";          // change when on other machine
        result += Integer.toString(timeStamp.getYear());
        result += Integer.toString(timeStamp.getMonthValue());
        result += Integer.toString(timeStamp.getDayOfMonth());
        result += Integer.toString(timeStamp.getHour());
        result += Integer.toString(timeStamp.getMinute());
        return result;
    }

    private static String buildFileName(){
        return folderName + "/" + benchmarkMethods[methodCounter] + profileId + ".csv";
    }

    private static final String[] benchmarkMethods = getBenchmarkMethodsInOrder();

    private native void startProfiling(String fileName);

    private native void readSample();

    private native void stopProfiling();

    public static void startProfiling(){
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED START PROFILING");
        synchronized(profilerLock) {
            System.out.println("THREAD " + Thread.currentThread().getId() + " RELEASES PROFILER LOCK");
            profilerLock.notify();
        }
        final String fileName = buildFileName();
        instance.startProfiling(fileName);
        System.out.println("THREAD " + Thread.currentThread().getId() + " CALLED PROFILE EXECUTION");
        profileId++;
        while(running){
            getSample();
        }
        endProfiling();
    }

    private static void getSample(){
        instance.readSample();
    }

    private static void endProfiling(){
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED endProfiling()");
        instance.stopProfiling();
        System.out.println("JAVA THREAD " + Thread.currentThread().getId() + " CALLED NATIVE stopProfiling()");
        methodCounter++;
    }
}

BenchmarkContainer.java:

This is just a really small subset of all benchmarks I have. I reduced the number of benchmarks to reduce the overhead while creating a proof-of-concept.

package BenchmarkContainers;

import Profiling.PowerGadgetProfiler;
import Profiling.ProfilerThread;
import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
public class BenchmarkContainer {

    //@Param({"5", "30",  "210", "1470"})
    @Param({"1", "2"})
    public int upperBound;

    @Setup(Level.Trial)
    public void initializeProfilerThread() {
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED initializeProfilerThread()");
        ProfilerThread profilerThread = new ProfilerThread();
        profilerThread.start();
        synchronized (PowerGadgetProfiler.profilerLock){
            try {
                System.out.println("THREAD " + Thread.currentThread().getId() + " WAITS FOR PROFILER LOCK");
                PowerGadgetProfiler.profilerLock.wait();
                PowerGadgetProfiler.running = true;
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    @TearDown(Level.Trial)
    public void stopProfiling(){
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED TEARDOWN PROCEDURE");
        synchronized (PowerGadgetProfiler.profilerLock){
            PowerGadgetProfiler.running = false;
        }
    }

    @Benchmark
    @Fork(value = 1, warmups = 1)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public void init(){
        {

        }
    }

    @Benchmark
    @Fork(value = 1, warmups = 1)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int geoSum(){
        int sum = 0;
        for(int i = 0; i <= upperBound; i++){
            sum += i;
        }
        return sum;
    }
}

ProfilerThread.java:

package Profiling;

public class ProfilerThread extends Thread{

    @Override
    public void run() {
        System.out.println("STARTING PROFILING IN PROFILER THREAD");
        PowerGadgetProfiler.startProfiling();
    }
}

Profiling_PowerGadgetProfiler.c:

#include <jni.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <IntelPowerGadget/EnergyLib.h>
#include <pthread.h>

#include "Profiling_PowerGadgetProfiler.h"

JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_startProfiling
  (JNIEnv *env, jobject thisObject, jstring fileName)
  {
      uint64_t tid;
      pthread_threadid_np(NULL, &tid);
      printf("THREAD %llu CALLED NATIVE startProfiling()\n", tid);


      IntelEnergyLibInitialize();

      printf("NATIVE PROFILER INITIALIZED\n");
      char fileNameAsString[255];

      const char *fileNameAsConstString = (*env)->GetStringUTFChars(env, fileName, 0);

      strcpy(fileNameAsString, fileNameAsConstString);

      printf("RESULTING FILE NAME: %s\n", fileNameAsString);
      printf("STARTING PROFILING IN NATIVE CODE\n");

      StartLog(fileNameAsString);

  }


JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_readSample
  (JNIEnv *env , jobject thisObject)
{
    ReadSample();
}

JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_stopProfiling
 (JNIEnv* env, jobject thisObject)
{
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);
    printf("THREAD %llu CALLED NATIVE stopProfiling()\n", tid);
    printf("ENDING PROFILING IN NATIVE CODE AND WRITE TO FILE\n");
    StopLog();
}

Console output:

Please don't worry about the mix of JMH output and my own output; it does not matter for my question.

# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=62464:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 1)

# Run progress: 0,00% complete, ETA 00:13:20
# Warmup Fork: 1 of 1
# Warmup Iteration   1: THREAD 16 ENTERED initializeProfilerThread()
STARTING PROFILING IN PROFILER THREAD
THREAD 17 LIBRARY LOADED SUCCESSFULLY
THREAD 17 METHODS: [geoSum, init]
THREAD 16 WAITS FOR PROFILER LOCK
THREAD 17 ENTERED START PROFILING
THREAD 17 RELEASES PROFILER LOCK
THREAD 17 STARTING PROFILING IN PGP WITH PARAM VALUE: 1
THREAD 17 CALLED PROFILE EXECUTION
6,152 ns/op
# Warmup Iteration   2: 6,139 ns/op
# Warmup Iteration   3: 5,688 ns/op
# Warmup Iteration   4: 6,172 ns/op
# Warmup Iteration   5: 6,525 ns/op
Iteration   1: 6,858 ns/op
Iteration   2: 7,265 ns/op
Iteration   3: 7,523 ns/op
Iteration   4: 7,494 ns/op
Iteration   5: THREAD 16 ENTERED TEARDOWN PROCEDURE
7,776 ns/op
THREAD 17 ENTERED endProfiling()
THREAD 204730 CALLED NATIVE startProfiling()
NATIVE PROFILER INITIALIZED
RESULTING FILE NAME: /Users/timerdar/BM-202111301910/geoSum0.csv
STARTING PROFILING IN NATIVE CODE
THREAD 204730 CALLED NATIVE stopProfiling()
ENDING PROFILING IN NATIVE CODE AND WRITE TO FILE

What I've tried so far:

  • I moved the profiling control flow from the native code into the Java code. Previously, a static variable in the native code indicated the running state of the benchmark, and ReadSample() was called until this variable was flipped by another function in the C file.

  • I removed the synchronized keyword from the methods startProfiling, getSample and endProfiling in PowerGadgetProfiler.

Thank you for your patience and I hope you can help me.

Minimum necessary code example to reproduce the problem:

I have created a minimum working example to reduce the problem to its main components.

The code:

BenchmarkProgram.java

package Profiling;

public class BenchmarkProgram {

    public static void main(String[] args) throws Exception{
        org.openjdk.jmh.Main.main(args);
    }
}

BenchmarkContainer.java

package BenchmarkContainers;

import Profiling.PowerGadgetProfiler;
import Profiling.ProfilerThread;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class BenchmarkContainer {

    @Param({"1", "2"})
    public int upperBound;

    Thread profilerThread;

    @Setup(Level.Trial)
    public void initializeProfilerThread() {
        profilerThread = new ProfilerThread();
        profilerThread.start();
        PowerGadgetProfiler.running = true;
    }

    @TearDown(Level.Trial)
    public void stopProfiling() {
        PowerGadgetProfiler.running = false;
        try {
            profilerThread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

    }

    @Benchmark
    @Fork(0)
    public void init() {
        {

        }
    }

    @Benchmark
    @Fork(0)
    public int geoSum() {
        int sum = 0;
        for (int i = 0; i <= upperBound; i++) {
            sum += i;
        }
        return sum;
    }
}

PowerGadgetProfiler.java

package Profiling;

public class PowerGadgetProfiler {

    static {
        System.loadLibrary("PowerGadgetProfiler");
    }

    public static volatile boolean running = false;

    private static native void peep();

    public static void startProfiling() {
        peep();
        System.out.println("MOCK PEEP");
        while (running) {
            getSample();
        }
        endProfiling();
    }

    private static void getSample() {
        // mock sampling
    }

    private static void endProfiling() {
        peep();
        System.out.println("MOCK PEEP");
    }
}

ProfilerThread.java

package Profiling;

public class ProfilerThread extends Thread{

    @Override
    public void run() {
        System.out.println("STARTING PROFILING IN PROFILER THREAD");
        PowerGadgetProfiler.startProfiling();
    }
}

Profiling_PowerGadgetProfiler.c

#include <jni.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <IntelPowerGadget/EnergyLib.h>
#include <pthread.h>

#include "Profiling_PowerGadgetProfiler.h"

JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_peep
  (JNIEnv *env, jclass thisClass)
  {
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);
    printf("THREAD %llu PEEP\n", tid);
  }

The output:

# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 1)

# Run progress: 0,00% complete, ETA 00:06:40
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration   1: STARTING PROFILING IN PROFILER THREAD
THREAD 14 LIBRARY LOADED SUCCESSFULLY
MOCK PEEP
351509573,446 ops/s
# Warmup Iteration   2: 368618416,430 ops/s
# Warmup Iteration   3: 318780225,551 ops/s
# Warmup Iteration   4: 316404930,396 ops/s
# Warmup Iteration   5: 308587428,671 ops/s
Iteration   1: 307622321,475 ops/s
Iteration   2: 295849314,405 ops/s
Iteration   3: 310904513,944 ops/s
Iteration   4: 298409293,008 ops/s
Iteration   5: MOCK PEEP
293730263,155 ops/s

Result "BenchmarkContainers.BenchmarkContainer.geoSum":
  301303141,197 ±(99.9%) 29045283,861 ops/s [Average]
  (min, avg, max) = (293730263,155, 301303141,197, 310904513,944), stdev = 7542967,981
  CI (99.9%): [272257857,337, 330348425,058] (assumes normal distribution)


# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 2)

# Run progress: 25,00% complete, ETA 00:05:01
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration   1: STARTING PROFILING IN PROFILER THREAD
MOCK PEEP
250420430,065 ops/s
# Warmup Iteration   2: 240512982,879 ops/s
# Warmup Iteration   3: 234267496,579 ops/s
# Warmup Iteration   4: 232677741,065 ops/s
# Warmup Iteration   5: 225373803,499 ops/s
Iteration   1: 225697770,075 ops/s
Iteration   2: 233106892,737 ops/s
Iteration   3: 230332046,340 ops/s
Iteration   4: 220403129,274 ops/s
Iteration   5: MOCK PEEP
222970983,188 ops/s

Result "BenchmarkContainers.BenchmarkContainer.geoSum":
  226502164,323 ±(99.9%) 20064492,546 ops/s [Average]
  (min, avg, max) = (220403129,274, 226502164,323, 233106892,737), stdev = 5210684,997
  CI (99.9%): [206437671,777, 246566656,869] (assumes normal distribution)


# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.init
# Parameters: (upperBound = 1)

# Run progress: 50,00% complete, ETA 00:03:20
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration   1: STARTING PROFILING IN PROFILER THREAD
MOCK PEEP
1285734648,964 ops/s
# Warmup Iteration   2: 1281687199,566 ops/s
# Warmup Iteration   3: 1248636633,953 ops/s
# Warmup Iteration   4: 1286229544,274 ops/s
# Warmup Iteration   5: 1242773248,106 ops/s
Iteration   1: 1249716889,546 ops/s
Iteration   2: 1163957364,456 ops/s
Iteration   3: 1165338083,544 ops/s
Iteration   4: 1175806686,372 ops/s
Iteration   5: MOCK PEEP
1206242463,819 ops/s

Result "BenchmarkContainers.BenchmarkContainer.init":
  1192212297,547 ±(99.9%) 140077359,758 ops/s [Average]
  (min, avg, max) = (1163957364,456, 1192212297,547, 1249716889,546), stdev = 36377645,494
  CI (99.9%): [1052134937,790, 1332289657,305] (assumes normal distribution)


# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.init
# Parameters: (upperBound = 2)

# Run progress: 75,00% complete, ETA 00:01:40
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration   1: STARTING PROFILING IN PROFILER THREAD
MOCK PEEP
1167710836,896 ops/s
# Warmup Iteration   2: 1206785568,628 ops/s
# Warmup Iteration   3: 1160517470,177 ops/s
# Warmup Iteration   4: 1192993000,343 ops/s
# Warmup Iteration   5: 1198394027,257 ops/s
Iteration   1: 1158210173,102 ops/s
Iteration   2: 1169432730,215 ops/s
Iteration   3: 1070907177,608 ops/s
Iteration   4: 336332917,114 ops/s
Iteration   5: MOCK PEEP
433452372,375 ops/s

Result "BenchmarkContainers.BenchmarkContainer.init":
  833667074,083 ±(99.9%) 1589834232,333 ops/s [Average]
  (min, avg, max) = (336332917,114, 833667074,083, 1169432730,215), stdev = 412874901,402
  CI (99.9%): [≈ 0, 2423501306,415] (assumes normal distribution)


# Run complete. Total time: 00:06:40

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                      (upperBound)   Mode  Cnt           Score            Error  Units
BenchmarkContainers.BenchmarkContainer.geoSum             1  thrpt    5   301303141,197 ±   29045283,861  ops/s
BenchmarkContainers.BenchmarkContainer.geoSum             2  thrpt    5   226502164,323 ±   20064492,546  ops/s
BenchmarkContainers.BenchmarkContainer.init               1  thrpt    5  1192212297,547 ±  140077359,758  ops/s
BenchmarkContainers.BenchmarkContainer.init               2  thrpt    5   833667074,083 ± 1589834232,333  ops/s
THREAD 62778 PEEP
THREAD 62778 PEEP
THREAD 63451 PEEP
THREAD 63451 PEEP
THREAD 63828 PEEP
THREAD 63828 PEEP
THREAD 64242 PEEP
THREAD 64242 PEEP

Process finished with exit code 0

Expected behavior: I would expect the native code to execute before the Java code continues running. However, the program seems to queue everything up and then execute it all at once when the benchmarks are finished.

As this Stack Overflow thread says, the program should normally work as expected, but I am currently not able to identify the problem in my code.

The behavior I would expect from the program is described in the following diagram:

Expected program flow

I hope you are able to help me again.

Tim Erdar
  • Are you able to reduce and simplify your question and code to the minimum necessary to reproduce it? – drum Dec 01 '21 at 01:12
  • I don't know which parts of the code are safely removable to get a minimum working example because I don't know if the problem is in JMH, JNI or anywhere else. – Tim Erdar Dec 01 '21 at 13:20
  • I've provided a minimum necessary example below. – Tim Erdar Dec 06 '21 at 12:57

3 Answers


I ran the example in my IDE (with the native stuff removed), and it seems to work correctly.

First the setup is called, then the benchmark, and then the teardown. This is the expected sequential flow you normally want to have, so I don't see the problem.

I do see some concurrency problems, though.

        synchronized (PowerGadgetProfiler.profilerLock){
            try {
                System.out.println("THREAD " + Thread.currentThread().getId() + " WAITS FOR PROFILER LOCK");
                PowerGadgetProfiler.profilerLock.wait();
                PowerGadgetProfiler.running = true;
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

The wait on profilerLock can wake up spuriously, so it normally needs to be called in a loop. Also, you normally wait for a condition (such as the profiler running), but in this case you are setting the condition yourself after the wait.

So I would expect something like this:

synchronized (PowerGadgetProfiler.profilerLock) {
    while (!PowerGadgetProfiler.running) {
        PowerGadgetProfiler.profilerLock.wait();
    }
}

And I would update PowerGadgetProfiler.running like this:

public static void startProfiling() {
    synchronized (profilerLock) {
        PowerGadgetProfiler.running = true;
        profilerLock.notifyAll();
    }
    // ... start the native profiler and sample as before
}

Similar fixes should be applied when you wait for the termination of the profiler.

pveentjer
  • Thank you for your concurrency suggestions. Well, I'm not the most experienced developer in terms of concurrency/parallelism as you have seen. It didn't solve the main problem. So now I will try something like a "peep" method in the native C code which should be called at several points in the program flow to determine if the native code is in general able to be run during the benchmarks. If that doesn't work, then there is a problem with my understanding of the JNI program flow. – Tim Erdar Dec 01 '21 at 13:26
  • Remove the JNI calls and just use some stubs to make the problem as small as possible. If that works, it is likely a problem in the JNI code. – pveentjer Dec 01 '21 at 13:40

You cannot block in @Setup, as it is called by the benchmark thread, which would then never get to the @Benchmark body:

    @Setup(Level.Trial)
    public void initializeProfilerThread() {
        ProfilerThread profilerThread = new ProfilerThread();
        profilerThread.start();
        synchronized (PowerGadgetProfiler.profilerLock){
            try {
                PowerGadgetProfiler.profilerLock.wait(); // <---- HERE
                PowerGadgetProfiler.running = true;
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

The only way this might work is by accident: in a multi-threaded benchmark, some thread would eventually get to notify this thread.

But all this looks unnecessary. JMH guarantees that only one thread calls @Setup for @State(Scope.Benchmark), which means all this can be simplified to:

@State(Scope.Benchmark)
public class BenchmarkContainer {

    ProfilerThread profilerThread; // assumes ProfilerThread exposes a volatile boolean "running"

    @Setup(Level.Trial)
    public void startProfiling() {
        profilerThread = new ProfilerThread();
        profilerThread.start();
        profilerThread.running = true;
    }

    @TearDown(Level.Trial)
    public void stopProfiling() throws InterruptedException {
        profilerThread.running = false;
        profilerThread.join();
    }
}
If you want more answers, provide the MCVE.

Aleksey Shipilev

I have finally uncovered the mystery behind the "strange" behavior of my program. The truth is that the program works perfectly fine, but the printf output from the native C code is buffered and only appears all at once at the end of all benchmarks.

What did I do to uncover that?

I added something like a back-coupling (feedback) mechanism to my native "peep" method. Instead of just printing something via printf, the method calls a function named backCoupledPeep in the Java code, which does exactly the same as the "old" peep, but via System.out.println.

Important code fragments:

PowerGadgetProfiler.java

package Profiling;

...

public class PowerGadgetProfiler {

...

  public static void backCoupledPeep(){
    System.out.println("BACKCOUPLED PEEP TRIGGERED FROM NATIVE CODE");
  }

}

Profiling_PowerGadgetProfiler.c

...

JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_peep
  (JNIEnv *env, jclass thisClass)
  {
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);

    jmethodID mid = (*env)->GetStaticMethodID(env, thisClass, "backCoupledPeep", "()V");

    if(mid != NULL)
    {
        (*env)->CallStaticVoidMethod(env, thisClass, mid);
    }

    printf("THREAD %llu NATIVE PEEP\n", tid);
  }

Console output:

# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=61242:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 1)

# Run progress: 0,00% complete, ETA 00:06:40
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration   1: STARTING PROFILING IN PROFILER THREAD
THREAD 14 LIBRARY LOADED SUCCESSFULLY
BACKCOUPLED PEEP TRIGGERED FROM NATIVE CODE <- correct/desired occurrence
MOCK PEEP
297967165,801 ops/s
# Warmup Iteration   2: 335669073,439 ops/s
# Warmup Iteration   3: 293700200,462 ops/s
# Warmup Iteration   4: 294345901,793 ops/s
# Warmup Iteration   5: 288535310,404 ops/s
Iteration   1: 291056885,946 ops/s
Iteration   2: 289043398,600 ops/s
Iteration   3: 290960426,836 ops/s
Iteration   4: 295573154,239 ops/s
Iteration   5: BACKCOUPLED PEEP TRIGGERED FROM NATIVE CODE <- correct/desired occurrence
MOCK PEEP
295093250,001 ops/s

Result "BenchmarkContainers.BenchmarkContainer.geoSum":
  292345423,124 ±(99.9%) 10967697,488 ops/s [Average]
  (min, avg, max) = (289043398,600, 292345423,124, 295573154,239), stdev = 2848276,208
  CI (99.9%): [281377725,636, 303313120,612] (assumes normal distribution)

...
///
/// all other benchmarks
///
...

/// "queued up" printf's
THREAD 277532 NATIVE PEEP
THREAD 277532 NATIVE PEEP
THREAD 278012 NATIVE PEEP
THREAD 278012 NATIVE PEEP
THREAD 279212 NATIVE PEEP
THREAD 279212 NATIVE PEEP
THREAD 279758 NATIVE PEEP
THREAD 279758 NATIVE PEEP

Process finished with exit code 0

Tim Erdar