I have a program that would benefit greatly from using parallelStream() (a large data set with a somewhat involved mapping and filtering scheme, with no dependence on external variables or synchronization), but the result must be collected as a set.
I'm somewhat new to parallel streams (this is my first experience with them). I tried the code below, only to discover that it led to what looks like concurrent modification of a non-concurrent backing collection and a deadlock behind the scenes.
This mapping attempts to get the size of an unmounted disk using the native Linux command sudo blockdev --getsize64 unmounted_device_here
(I have no idea whether Java can get the full size of an unmounted disk on Linux, so I'm just using the native command, since this will only ever be run on Linux systems anyway.)
Mapping Method (which deadlocks):
var mountPath = Paths.get("/dev");
//Do NVMe drives first
var list = new ArrayList<Path>(10);
//Looks like nvme1n1
//The pattern is a glob, not a regex, which is why [0-9] works here and \\d does not
try (var directoryStream = Files.newDirectoryStream(mountPath, "nvme[0-9]n[0-9]")) {
    for (Path path : directoryStream) {
        list.add(path);
    }
}
//Map to DrivePacket (path, long); blockdev returns bytes, which are converted to GiB
var nvmePackets = list.parallelStream()
        .map((drive) -> new DrivePacket(drive,
                Long.parseLong(runCommand("sudo", "blockdev", "--getsize64", drive.toAbsolutePath().toString())) / (1024 * 1024 * 1024)))
        .collect(Collectors.toSet());
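On the [0-9] versus \\d comment in the snippet above: Files.newDirectoryStream(Path, String) takes a glob, using the syntax documented for FileSystem.getPathMatcher, rather than a regular expression. The following is a minimal sketch of the difference; the class name GlobVersusRegexSketch and its helper methods are made up for illustration, and it only matches bare file names instead of listing /dev.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Hypothetical helper class, not part of the program in the question.
final class GlobVersusRegexSketch {

    // Builds a matcher with glob syntax, the syntax newDirectoryStream(Path, String) expects.
    static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }

    // Builds a matcher with java.util.regex syntax, where \d is a digit class.
    static PathMatcher regex(String pattern) {
        return FileSystems.getDefault().getPathMatcher("regex:" + pattern);
    }

    public static void main(String[] args) {
        Path single = Path.of("nvme1n1");
        Path multi = Path.of("nvme10n1");

        // Glob bracket expressions match exactly one character each.
        System.out.println("glob, nvme1n1:   " + glob("nvme[0-9]n[0-9]").matches(single));
        System.out.println("glob, nvme10n1:  " + glob("nvme[0-9]n[0-9]").matches(multi));

        // The regex form understands \d and quantifiers such as +.
        System.out.println("regex, nvme10n1: " + regex("nvme\\d+n\\d+").matches(multi));
    }
}
```

If multi-digit controller or namespace numbers matter, one option under these assumptions is to list the directory without a pattern and filter the file names with a regex matcher like the one above.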
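IOUtils is from the Apache Commons utility classes (strictly speaking, org.apache.commons.io.IOUtils is provided by the Commons IO artifact, not by commons-lang3). The dependency I have declared is: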
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.12.0</version>
</dependency>
runCommand (executes the native calls):
public static String runCommand(String... command) {
    try {
        if (DEBUG_MODE) {
            systemMessage("Running Command: " + String.join(" ", command));
        }
        var builder = new ProcessBuilder(command);
        //Reads the command's stdout to the end and strips newlines
        //(stderr is not read, and the exit code is not checked)
        var result = IOUtils.toString(builder.start().getInputStream(), StandardCharsets.UTF_8).replaceAll("\n", "");
        if (DEBUG_MODE) {
            System.out.println("Result: " + result);
        }
        return result;
    } catch (IOException ex) {
        throw new IllegalStateException(ex);
    }
}
The DrivePacket Class:
/**
* A record of the relevant information for a drive
*
* Path is the fully qualified /dev/DRIVE path
*/
public record DrivePacket(Path drivePath, long driveSize) {}
Since the operation benefits from concurrency, is there any way to do this using parallelStream? Or do I have to use another technique?
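To make "another technique" concrete, here is a minimal sketch of the kind of alternative I have in mind: submit each mapping to a dedicated ExecutorService and gather the results into a plain HashSet on the calling thread. The class ExecutorSetSketch, the method mapToSet and the thread count are hypothetical names and values chosen for illustration; I have not verified that this avoids the hang in my program.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper class, not part of the program in the question.
final class ExecutorSetSketch {

    // Applies mapper to every input on a fixed-size pool and returns the results as a set.
    static <T, R> Set<R> mapToSet(List<T> inputs, Function<T, R> mapper, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (T input : inputs) {
                tasks.add(() -> mapper.apply(input));
            }
            // Only the calling thread touches the result set, so a plain HashSet is enough.
            Set<R> results = new HashSet<>();
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in mapping; in the real program this would build a DrivePacket.
        Set<Integer> result = mapToSet(List.of(1, 2, 3, 2), x -> x * 10, 4);
        System.out.println(result);
    }
}
```

This keeps the blocking work off the common fork-join pool that parallelStream uses by default, at the cost of managing the pool by hand.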
When I use the debugger, it always hangs at the step where this line of code is executed, waiting forever in ForkJoinTask.java at externalAwaitDone().
Unfortunately, I couldn't find a Set analog of toConcurrentMap().
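For comparison, here is a minimal sketch of collecting a parallel stream into a set without any process execution involved. As far as I understand the java.util.stream documentation, Collectors.toSet() is a non-concurrent collector, so each worker fills its own container and the partial results are merged afterwards; if a thread-safe result is wanted, Collectors.toCollection can be given ConcurrentHashMap.newKeySet() as the supplier. The class ParallelSetSketch and the method bytesToGib are hypothetical stand-ins for my real mapping.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the program in the question.
final class ParallelSetSketch {

    // Stand-in for the expensive per-element mapping (here: bytes to whole GiB).
    static long bytesToGib(long bytes) {
        return bytes / (1024L * 1024L * 1024L);
    }

    // Plain toSet(): per-worker containers are merged by the stream framework.
    static Set<Long> collectToSet(List<Long> sizesInBytes) {
        return sizesInBytes.parallelStream()
                .map(ParallelSetSketch::bytesToGib)
                .collect(Collectors.toSet());
    }

    // Variant whose resulting set is itself thread-safe.
    static Set<Long> collectToConcurrentSet(List<Long> sizesInBytes) {
        return sizesInBytes.parallelStream()
                .map(ParallelSetSketch::bytesToGib)
                .collect(Collectors.toCollection(() -> ConcurrentHashMap.<Long>newKeySet()));
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(1073741824L, 2147483648L, 1073741824L, 5368709120L);
        System.out.println(collectToSet(sizes));
        System.out.println(collectToConcurrentSet(sizes));
    }
}
```

If this sketch behaves as I expect, the collector itself would not be the source of the hang, which is part of why the deadlock confuses me.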
How can I avoid this deadlock while still getting both the parallelism granted for the computation and having the end result be a set?
System: JDK 16
EDIT 0: Updated the code for reproducibility.
Given that the mapping code calls subroutines that don't share data, I'm not sure why this would cause a deadlock condition.