I noticed that if I use StreamEx lib to parallel out my streams with a custom ForkJoinPool as below - the subsequent actions do run in parallel threads from that pool. However, if I add a map() operation and parallel the resulting stream - only one thread from the pool is used.
Below is the full code (without all imports) of a minimal working example that demonstrates this problem. The only difference between the executeAsParallelFromList() and the executeAsParallelAfterMap() methods is addition of .map(...) call before the .parallel() .
import one.util.streamex.StreamEx;
public class ParallelExample {
private static final Logger logger = LoggerFactory.getLogger(ParallelExample.class);
private static ForkJoinPool s3ThreadPool = new ForkJoinPool(3);
public static List<String> getTestList(){
int listSize = 10;
List<String> testList = new ArrayList<>();
for (int i=0; i<listSize; i++)
testList.add("item_" + i);
return testList;
}
public static void executeAsParallelFromList(){
logger.info("executeAsParallelFromList():");
List<String> testList = getTestList();
StreamEx<String> streamOfItems = StreamEx
.of(testList)
.parallel(s3ThreadPool);
logger.info("streamOfItems.isParallel(): {}", streamOfItems.isParallel());
streamOfItems.forEach(item -> handleItem(item));
}
public static void executeAsParallelAfterMap(){
logger.info("executeAsParallelAfterMap():");
List<String> testList = getTestList();
StreamEx<String> streamOfItems = StreamEx
.of(testList)
.map(item -> item+"_mapped")
.parallel(s3ThreadPool);
logger.info("streamOfItems.isParallel(): {}", streamOfItems.isParallel());
streamOfItems.forEach(item -> handleItem(item));
}
private static void handleItem(String item){
// do something with the item - just print for now
logger.info("I'm handling item: {}", item);
}
}
Unit test to execute both methods:
public class ParallelExampleTest {
@Test
public void testExecuteAsParallelFromList() {
ParallelExample.executeAsParallelFromList();
}
@Test
public void testExecuteAsParallelFromStreamEx() {
ParallelExample.executeAsParallelAfterMap();
}
}
Results of the execution:
08:49:12.992 [main] INFO marina.streams.ParallelExample - executeAsParallelFromList():
08:49:13.002 [main] INFO marina.streams.ParallelExample - streamOfItems.isParallel(): true
08:49:13.040 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_6
08:49:13.040 [ForkJoinPool-1-worker-2] INFO marina.streams.ParallelExample - I'm handling item: item_2
08:49:13.040 [ForkJoinPool-1-worker-3] INFO marina.streams.ParallelExample - I'm handling item: item_1
08:49:13.041 [ForkJoinPool-1-worker-2] INFO marina.streams.ParallelExample - I'm handling item: item_4
08:49:13.041 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_8
08:49:13.041 [ForkJoinPool-1-worker-3] INFO marina.streams.ParallelExample - I'm handling item: item_0
08:49:13.041 [ForkJoinPool-1-worker-2] INFO marina.streams.ParallelExample - I'm handling item: item_3
08:49:13.041 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_9
08:49:13.041 [ForkJoinPool-1-worker-3] INFO marina.streams.ParallelExample - I'm handling item: item_5
08:49:13.041 [ForkJoinPool-1-worker-2] INFO marina.streams.ParallelExample - I'm handling item: item_7
08:49:13.043 [main] INFO marina.streams.ParallelExample - executeAsParallelAfterMap():
08:49:13.043 [main] INFO marina.streams.ParallelExample - streamOfItems.isParallel(): true
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_0_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_1_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_2_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_3_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_4_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_5_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_6_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_7_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_8_mapped
08:49:13.044 [ForkJoinPool-1-worker-1] INFO marina.streams.ParallelExample - I'm handling item: item_9_mapped
As you can see, all three threads are being used when executing the executeAsParallelFromList(), but only one thread is used when executing the executeAsParallelAfterMap().
Why?
thanks!
Marina
NOTE: the example is deliberately simplistic - I tried to make it as minimal as possible to demo the issue. Obviously in real life there is much more going on in the map(), handleItem() , etc., and the input data is much more interesting (I'm trying to process AWS S3 buckets/prefixes in parallel).