Original problem (with full code)
(I kept the full original code here because the answers below refer to this full version.)
I am currently working on my master thesis project about energy-efficient software engineering where I want to (coarsely) monitor the energy consumption behavior of different Java methods.
Short description of the desired behavior:
I have a set of JMH benchmarks that I want to profile with the Intel Power Gadget API. The desired behavior looks like this:
- Before each method benchmark a new Thread, which will do the profiling part, should be created and started.
- While the "main" thread runs the benchmark, the profiler thread should repeatedly read samples until the JMH TearDown method is executed (a simplified sketch of this handshake follows below).
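To make the intended handshake concrete, here is a simplified, stand-alone sketch with placeholder names (HandshakeSketch, startProfilerThread, stopProfilerThread). It is not my actual code, which follows further below and uses a volatile flag plus wait/notify instead:

import java.util.concurrent.atomic.AtomicBoolean;

class HandshakeSketch {
    // Placeholder for the "keep sampling" flag shared between the two threads.
    static final AtomicBoolean running = new AtomicBoolean(false);

    // Intended to be called from the JMH @Setup method.
    static Thread startProfilerThread() {
        running.set(true);
        Thread profiler = new Thread(() -> {
            // start the Power Gadget log here
            while (running.get()) {
                // read one sample here
            }
            // stop the log and write the CSV file here
        });
        profiler.start();
        return profiler;
    }

    // Intended to be called from the JMH @TearDown method.
    static void stopProfilerThread(Thread profiler) throws InterruptedException {
        running.set(false);
        profiler.join(); // wait until the profiler thread has finished writing
    }
}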
My problem:
The code executes sequentially instead of concurrently. The "profiling phase" should begin in the JMH Setup method and end in the JMH TearDown method.
My code:
The code is still experimental and messy. Sorry for that. I still kept all of the debug print statements for logging purposes.
BenchmarkProgram.java:
package Profiling;
public class BenchmarkProgram {
    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
PowerGadgetProfiler.java:
package Profiling;
import BenchmarkContainers.BenchmarkContainer;
import java.lang.reflect.Method;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
public class PowerGadgetProfiler {
    static {
        System.loadLibrary("PowerGadgetProfiler");
        System.out.println("THREAD " + Thread.currentThread().getId() + " LIBRARY LOADED SUCCESSFULLY");
    }

    public static int profileId = 0;
    static int methodCounter = 0;
    private static final String folderName = buildFolderName();
    public static final Object profilerLock = new Object();
    public static volatile boolean running = false;
    private static final PowerGadgetProfiler instance = new PowerGadgetProfiler();

    private static String[] getBenchmarkMethodsInOrder() {
        ArrayList<String> methodNames = new ArrayList<>();
        Method[] methods = BenchmarkContainer.class.getMethods();
        for (Method method : methods) {
            String name = method.toString();
            if (name.contains("BenchmarkContainer")
                    && !name.contains("initializeProfilerThread")
                    && !name.contains("stopProfiling")) {
                methodNames.add(method.getName());
            }
        }
        String[] methodArray = methodNames.toArray(new String[0]);
        Arrays.sort(methodArray);
        System.out.println("THREAD " + Thread.currentThread().getId() + " METHODS: " + Arrays.toString(methodArray));
        return methodArray;
    }

    private static String buildFolderName() {
        LocalDateTime timeStamp = LocalDateTime.now();
        String result = "/Users/timerdar/BM-"; // change when on other machine
        result += Integer.toString(timeStamp.getYear());
        result += Integer.toString(timeStamp.getMonthValue());
        result += Integer.toString(timeStamp.getDayOfMonth());
        result += Integer.toString(timeStamp.getHour());
        result += Integer.toString(timeStamp.getMinute());
        return result;
    }

    private static String buildFileName() {
        return folderName + "/" + benchmarkMethods[methodCounter] + profileId + ".csv";
    }

    private static final String[] benchmarkMethods = getBenchmarkMethodsInOrder();

    private native void startProfiling(String fileName);
    private native void readSample();
    private native void stopProfiling();

    public static void startProfiling() {
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED START PROFILING");
        synchronized (profilerLock) {
            System.out.println("THREAD " + Thread.currentThread().getId() + " RELEASES PROFILER LOCK");
            profilerLock.notify();
        }
        final String fileName = buildFileName();
        instance.startProfiling(fileName);
        System.out.println("THREAD " + Thread.currentThread().getId() + " CALLED PROFILE EXECUTION");
        profileId++;
        while (running) {
            getSample();
        }
        endProfiling();
    }

    private static void getSample() {
        instance.readSample();
    }

    private static void endProfiling() {
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED endProfiling()");
        instance.stopProfiling();
        System.out.println("JAVA THREAD " + Thread.currentThread().getId() + " CALLED NATIVE stopProfiling()");
        methodCounter++;
    }
}
BenchmarkContainer.java:
This is just a small subset of all the benchmarks I have; I reduced the number of benchmarks to keep the proof of concept small.
package BenchmarkContainers;
import Profiling.PowerGadgetProfiler;
import Profiling.ProfilerThread;
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;
@State(Scope.Benchmark)
public class BenchmarkContainer {
    //@Param({"5", "30", "210", "1470"})
    @Param({"1", "2"})
    public int upperBound;

    @Setup(Level.Trial)
    public void initializeProfilerThread() {
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED initializeProfilerThread()");
        ProfilerThread profilerThread = new ProfilerThread();
        profilerThread.start();
        synchronized (PowerGadgetProfiler.profilerLock) {
            try {
                System.out.println("THREAD " + Thread.currentThread().getId() + " WAITS FOR PROFILER LOCK");
                PowerGadgetProfiler.profilerLock.wait();
                PowerGadgetProfiler.running = true;
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    @TearDown(Level.Trial)
    public void stopProfiling() {
        System.out.println("THREAD " + Thread.currentThread().getId() + " ENTERED TEARDOWN PROCEDURE");
        synchronized (PowerGadgetProfiler.profilerLock) {
            PowerGadgetProfiler.running = false;
        }
    }

    @Benchmark
    @Fork(value = 1, warmups = 1)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public void init() {
        {
        }
    }

    @Benchmark
    @Fork(value = 1, warmups = 1)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int geoSum() {
        int sum = 0;
        for (int i = 0; i <= upperBound; i++) {
            sum += i;
        }
        return sum;
    }
}
ProfilerThread.java:
package Profiling;
public class ProfilerThread extends Thread {
    @Override
    public void run() {
        System.out.println("STARTING PROFILING IN PROFILER THREAD");
        PowerGadgetProfiler.startProfiling();
    }
}
Profiling_PowerGadgetProfiler.c:
#include <jni.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <IntelPowerGadget/EnergyLib.h>
#include <pthread.h>
#include "Profiling_PowerGadgetProfiler.h"
JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_startProfiling
  (JNIEnv *env, jobject thisObject, jstring fileName)
{
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);
    printf("THREAD %llu CALLED NATIVE startProfiling()\n", tid);

    IntelEnergyLibInitialize();
    printf("NATIVE PROFILER INITIALIZED\n");

    char fileNameAsString[255];
    const char *fileNameAsConstString = (*env)->GetStringUTFChars(env, fileName, 0);
    strcpy(fileNameAsString, fileNameAsConstString);
    printf("RESULTING FILE NAME: %s\n", fileNameAsString);

    printf("STARTING PROFILING IN NATIVE CODE\n");
    StartLog(fileNameAsString);
}

JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_readSample
  (JNIEnv *env, jobject thisObject)
{
    ReadSample();
}

JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_stopProfiling
  (JNIEnv *env, jobject thisObject)
{
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);
    printf("THREAD %llu CALLED NATIVE stopProfiling()\n", tid);

    printf("ENDING PROFILING IN NATIVE CODE AND WRITE TO FILE\n");
    StopLog();
}
Console output:
Please don't worry about the interleaving of JMH output and my own print statements; that is not an issue for me.
# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=62464:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 1)
# Run progress: 0,00% complete, ETA 00:13:20
# Warmup Fork: 1 of 1
# Warmup Iteration 1: THREAD 16 ENTERED initializeProfilerThread()
STARTING PROFILING IN PROFILER THREAD
THREAD 17 LIBRARY LOADED SUCCESSFULLY
THREAD 17 METHODS: [geoSum, init]
THREAD 16 WAITS FOR PROFILER LOCK
THREAD 17 ENTERED START PROFILING
THREAD 17 RELEASES PROFILER LOCK
THREAD 17 STARTING PROFILING IN PGP WITH PARAM VALUE: 1
THREAD 17 CALLED PROFILE EXECUTION
6,152 ns/op
# Warmup Iteration 2: 6,139 ns/op
# Warmup Iteration 3: 5,688 ns/op
# Warmup Iteration 4: 6,172 ns/op
# Warmup Iteration 5: 6,525 ns/op
Iteration 1: 6,858 ns/op
Iteration 2: 7,265 ns/op
Iteration 3: 7,523 ns/op
Iteration 4: 7,494 ns/op
Iteration 5: THREAD 16 ENTERED TEARDOWN PROCEDURE
7,776 ns/op
THREAD 17 ENTERED endProfiling()
THREAD 204730 CALLED NATIVE startProfiling()
NATIVE PROFILER INITIALIZED
RESULTING FILE NAME: /Users/timerdar/BM-202111301910/geoSum0.csv
STARTING PROFILING IN NATIVE CODE
THREAD 204730 CALLED NATIVE stopProfiling()
ENDING PROFILING IN NATIVE CODE AND WRITE TO FILE
What I've tried so far:
- I moved the control flow of the profiling from the native code into the Java code. Previously, a static variable in my native code indicated the running state of the benchmark, and ReadSample() was called in a loop until that variable was flipped by another function in the C file.
- I removed the synchronized keyword from the methods startProfiling, getSample and endProfiling in PowerGadgetProfiler.
Thank you for your patience and I hope you can help me.
Minimum necessary code example to reproduce the problem:
I have created a minimum working example to reduce the problem to its main components.
The code:
BenchmarkProgram.java
package Profiling;
public class BenchmarkProgram {
    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
BenchmarkContainer.java
package BenchmarkContainers;
import Profiling.PowerGadgetProfiler;
import Profiling.ProfilerThread;
import org.openjdk.jmh.annotations.*;
@State(Scope.Benchmark)
public class BenchmarkContainer {
@Param({"1", "2"})
public int upperBound;
Thread profilerThread;
@Setup(Level.Trial)
public void initializeProfilerThread() {
profilerThread = new ProfilerThread();
profilerThread.start();
PowerGadgetProfiler.running = true;
}
@TearDown(Level.Trial)
public void stopProfiling() {
PowerGadgetProfiler.running = false;
try {
profilerThread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@Benchmark
@Fork(0)
public void init() {
{
}
}
@Benchmark
@Fork(0)
public int geoSum() {
int sum = 0;
for (int i = 0; i <= upperBound; i++) {
sum += i;
}
return sum;
}
}
PowerGadgetProfiler.java
package Profiling;
public class PowerGadgetProfiler {
    static {
        System.loadLibrary("PowerGadgetProfiler");
    }

    public static volatile boolean running = false;

    private static native void peep();

    public static void startProfiling() {
        peep();
        System.out.println("MOCK PEEP");
        while (running) {
            getSample();
        }
        endProfiling();
    }

    private static void getSample() {
        // mock sampling
    }

    private static void endProfiling() {
        peep();
        System.out.println("MOCK PEEP");
    }
}
ProfilerThread.java
package Profiling;
public class ProfilerThread extends Thread {
    @Override
    public void run() {
        System.out.println("STARTING PROFILING IN PROFILER THREAD");
        PowerGadgetProfiler.startProfiling();
    }
}
Profiling_PowerGadgetProfiler.c
#include <jni.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <IntelPowerGadget/EnergyLib.h>
#include <pthread.h>
#include "Profiling_PowerGadgetProfiler.h"
JNIEXPORT void JNICALL Java_Profiling_PowerGadgetProfiler_peep
  (JNIEnv *env, jclass thisClass)
{
    uint64_t tid;
    pthread_threadid_np(NULL, &tid);
    printf("THREAD %llu PEEP\n", tid);
}
The output:
# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 1)
# Run progress: 0,00% complete, ETA 00:06:40
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration 1: STARTING PROFILING IN PROFILER THREAD
THREAD 14 LIBRARY LOADED SUCCESSFULLY
MOCK PEEP
351509573,446 ops/s
# Warmup Iteration 2: 368618416,430 ops/s
# Warmup Iteration 3: 318780225,551 ops/s
# Warmup Iteration 4: 316404930,396 ops/s
# Warmup Iteration 5: 308587428,671 ops/s
Iteration 1: 307622321,475 ops/s
Iteration 2: 295849314,405 ops/s
Iteration 3: 310904513,944 ops/s
Iteration 4: 298409293,008 ops/s
Iteration 5: MOCK PEEP
293730263,155 ops/s
Result "BenchmarkContainers.BenchmarkContainer.geoSum":
301303141,197 ±(99.9%) 29045283,861 ops/s [Average]
(min, avg, max) = (293730263,155, 301303141,197, 310904513,944), stdev = 7542967,981
CI (99.9%): [272257857,337, 330348425,058] (assumes normal distribution)
# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.geoSum
# Parameters: (upperBound = 2)
# Run progress: 25,00% complete, ETA 00:05:01
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration 1: STARTING PROFILING IN PROFILER THREAD
MOCK PEEP
250420430,065 ops/s
# Warmup Iteration 2: 240512982,879 ops/s
# Warmup Iteration 3: 234267496,579 ops/s
# Warmup Iteration 4: 232677741,065 ops/s
# Warmup Iteration 5: 225373803,499 ops/s
Iteration 1: 225697770,075 ops/s
Iteration 2: 233106892,737 ops/s
Iteration 3: 230332046,340 ops/s
Iteration 4: 220403129,274 ops/s
Iteration 5: MOCK PEEP
222970983,188 ops/s
Result "BenchmarkContainers.BenchmarkContainer.geoSum":
226502164,323 ±(99.9%) 20064492,546 ops/s [Average]
(min, avg, max) = (220403129,274, 226502164,323, 233106892,737), stdev = 5210684,997
CI (99.9%): [206437671,777, 246566656,869] (assumes normal distribution)
# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.init
# Parameters: (upperBound = 1)
# Run progress: 50,00% complete, ETA 00:03:20
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration 1: STARTING PROFILING IN PROFILER THREAD
MOCK PEEP
1285734648,964 ops/s
# Warmup Iteration 2: 1281687199,566 ops/s
# Warmup Iteration 3: 1248636633,953 ops/s
# Warmup Iteration 4: 1286229544,274 ops/s
# Warmup Iteration 5: 1242773248,106 ops/s
Iteration 1: 1249716889,546 ops/s
Iteration 2: 1163957364,456 ops/s
Iteration 3: 1165338083,544 ops/s
Iteration 4: 1175806686,372 ops/s
Iteration 5: MOCK PEEP
1206242463,819 ops/s
Result "BenchmarkContainers.BenchmarkContainer.init":
1192212297,547 ±(99.9%) 140077359,758 ops/s [Average]
(min, avg, max) = (1163957364,456, 1192212297,547, 1249716889,546), stdev = 36377645,494
CI (99.9%): [1052134937,790, 1332289657,305] (assumes normal distribution)
# JMH version: 1.32
# VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
# VM invoker: /Users/timerdar/Library/Java/JavaVirtualMachines/openjdk-14.0.1/Contents/Home/bin/java
# VM options: -Djava.library.path=/Users/timerdar/IdeaProjects/microbenchmark_test/src/main/java/Profiling -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=50458:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: BenchmarkContainers.BenchmarkContainer.init
# Parameters: (upperBound = 2)
# Run progress: 75,00% complete, ETA 00:01:40
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
# Warmup Iteration 1: STARTING PROFILING IN PROFILER THREAD
MOCK PEEP
1167710836,896 ops/s
# Warmup Iteration 2: 1206785568,628 ops/s
# Warmup Iteration 3: 1160517470,177 ops/s
# Warmup Iteration 4: 1192993000,343 ops/s
# Warmup Iteration 5: 1198394027,257 ops/s
Iteration 1: 1158210173,102 ops/s
Iteration 2: 1169432730,215 ops/s
Iteration 3: 1070907177,608 ops/s
Iteration 4: 336332917,114 ops/s
Iteration 5: MOCK PEEP
433452372,375 ops/s
Result "BenchmarkContainers.BenchmarkContainer.init":
833667074,083 ±(99.9%) 1589834232,333 ops/s [Average]
(min, avg, max) = (336332917,114, 833667074,083, 1169432730,215), stdev = 412874901,402
CI (99.9%): [≈ 0, 2423501306,415] (assumes normal distribution)
# Run complete. Total time: 00:06:40
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark (upperBound) Mode Cnt Score Error Units
BenchmarkContainers.BenchmarkContainer.geoSum 1 thrpt 5 301303141,197 ± 29045283,861 ops/s
BenchmarkContainers.BenchmarkContainer.geoSum 2 thrpt 5 226502164,323 ± 20064492,546 ops/s
BenchmarkContainers.BenchmarkContainer.init 1 thrpt 5 1192212297,547 ± 140077359,758 ops/s
BenchmarkContainers.BenchmarkContainer.init 2 thrpt 5 833667074,083 ± 1589834232,333 ops/s
THREAD 62778 PEEP
THREAD 62778 PEEP
THREAD 63451 PEEP
THREAD 63451 PEEP
THREAD 63828 PEEP
THREAD 63828 PEEP
THREAD 64242 PEEP
THREAD 64242 PEEP
Process finished with exit code 0
Expected behavior: I would normally expect the native code to execute before the Java code continues running. However, the program seems to queue up all the profiling code and execute everything at once after the benchmark has finished.
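To make that expectation concrete, here is a tiny hypothetical sketch (placeholder class, method and library names, not part of my project) of the ordering I assume for a synchronous native call:

public class OrderingExpectation {
    static {
        System.loadLibrary("OrderingExpectation"); // hypothetical native library
    }

    // Hypothetical native method that prints a line via printf in C.
    private static native void nativePrint();

    public static void main(String[] args) {
        System.out.println("BEFORE NATIVE CALL");
        nativePrint();                            // I expect its printf output to appear here ...
        System.out.println("AFTER NATIVE CALL");  // ... before this line is printed
    }
}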
As this Stack Overflow thread says, the program should normally work as expected, but I am currently not able to identify the problem in my code.
The behavior I would expect from the program is described in the following diagram:
I hope you are able to help me again.