I'm processing a batch of 100 objects by splitting them into partitions of 10, and each partition is sent to a separate thread to be processed in parallel. This is the current code:
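```java
import com.google.common.collect.Lists;
import java.util.concurrent.ConcurrentLinkedQueue;

var itemsToSave = new ConcurrentLinkedQueue<ItemToSave>();                  // line 1

Lists.partition(originalList, 10)                                           // line 3
        .parallelStream()                                                   // line 4
        .forEach(partitionedList -> process(partitionedList, itemsToSave)); // line 5
```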
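For anyone who wants to run the snippet in isolation, here is a minimal self-contained sketch of the same pipeline. It assumes a hypothetical `ItemToSave` class and a hypothetical `process()` method (neither is shown in the question), and uses a small hypothetical `partition` helper in place of Guava's `Lists.partition` so that it needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionedProcessing {

    // Hypothetical item type; the real ItemToSave is not shown in the question.
    static final class ItemToSave {
        final int id;
        ItemToSave(int id) { this.id = id; }
    }

    // Hypothetical stand-in for Guava's Lists.partition(list, size):
    // consecutive sublists of `size` elements, the last one possibly smaller.
    static <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            parts.add(list.subList(from, Math.min(from + size, list.size())));
        }
        return parts;
    }

    // Hypothetical process(): converts each input element into an ItemToSave
    // and adds it to the shared, thread-safe queue.
    static void process(List<Integer> partition, ConcurrentLinkedQueue<ItemToSave> itemsToSave) {
        for (Integer id : partition) {
            itemsToSave.add(new ItemToSave(id));
        }
    }

    // Same shape as the snippet in the question.
    static ConcurrentLinkedQueue<ItemToSave> run(List<Integer> originalList) {
        var itemsToSave = new ConcurrentLinkedQueue<ItemToSave>();
        partition(originalList, 10)
                .parallelStream()
                .forEach(partitionedList -> process(partitionedList, itemsToSave));
        return itemsToSave; // forEach returns only after every partition is processed
    }

    public static void main(String[] args) {
        List<Integer> originalList = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        System.out.println(run(originalList).size()); // 100
    }
}
```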
My understanding is that:

- line 1 creates a thread-safe collection to which the individual items are added once they are processed;
- line 3 returns a number of lists, each with 10 entries from `originalList`;
- line 4 spawns a new thread for each of the lists created on line 3;
- line 5 starts, for each list in its own thread, the processing that is supposed to run in parallel.
Please correct me if my understanding is wrong. Either way, the code works as expected: processing took 75 seconds before parallelization and 20 seconds after. Looks good.
But after a while I noticed that the process now takes literally ~1 ms, or even zero according to Kibana. That is because the new code processes the items so fast that the database doesn't have time to accumulate 100 items to be processed, so the batch that should contain 100 items now contains fewer than 10.
In that case, line 3 of the code above returns one single list: partitioning a list of fewer than 10 items into chunks of 10 yields just one chunk.
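The partition sizes for a full batch and for a short one can be checked with a small sketch. It uses a hypothetical `partition` helper that mimics the behaviour of Guava's `Lists.partition` (consecutive sublists, last one possibly smaller) so it runs on the plain JDK; `sizesOf` is likewise a hypothetical helper for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSizes {

    // Hypothetical stand-in for Guava's Lists.partition(list, size).
    static <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            parts.add(list.subList(from, Math.min(from + size, list.size())));
        }
        return parts;
    }

    // Sizes of the partitions produced for a batch of `totalItems` items.
    static List<Integer> sizesOf(int totalItems, int partitionSize) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < totalItems; i++) {
            items.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        for (List<Integer> part : partition(items, partitionSize)) {
            sizes.add(part.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sizesOf(100, 10)); // [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
        System.out.println(sizesOf(7, 10));   // [7]
    }
}
```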
Then this single list is sent to `parallelStream()`. Here is my question: does `parallelStream()` still spawn a new thread to process one single list, or does it only spawn threads when given more than one list? To me it doesn't make much sense to open a new thread to process one single batch of items; that could happen sequentially and avoid the overhead of spawning a thread, etc.
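One way to observe what actually happens on a particular JVM is to record which thread handles each partition and compare it with the calling thread. Below is a minimal diagnostic sketch; `partition` is a hypothetical stand-in for Guava's `Lists.partition` and `threadsUsed` is a hypothetical helper. The printed names and counts depend on the JDK, the number of CPUs and timing, so no particular output is implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WhichThreads {

    // Hypothetical stand-in for Guava's Lists.partition(list, size).
    static <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            parts.add(list.subList(from, Math.min(from + size, list.size())));
        }
        return parts;
    }

    // Runs the same partition -> parallelStream() -> forEach shape as the
    // question and records the name of each thread that handled a partition.
    static Set<String> threadsUsed(int totalItems, int partitionSize) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < totalItems; i++) {
            items.add(i);
        }
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        partition(items, partitionSize)
                .parallelStream()
                .forEach(part -> threadNames.add(Thread.currentThread().getName()));
        return threadNames;
    }

    public static void main(String[] args) {
        // Output varies by machine and JDK; compare the sets with the caller's name.
        System.out.println("calling thread:   " + Thread.currentThread().getName());
        System.out.println("100 items in 10s: " + threadsUsed(100, 10));
        System.out.println("7 items in 10s:   " + threadsUsed(7, 10));
    }
}
```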
So: how many threads does `parallelStream()` create when given only one list as input?
Sorry for the long question, but I felt like I had to explain my thoughts.