
I am attempting to process ten files; some are small, while others contain up to three million records. I have grouped the files into five groups based on logical dependencies, with a variable number of files in each group: group 1 has one file, group 2 has two files, group 3 has four files, group 4 has two files, and group 5 has one file. The files within each group can be processed independently of each other.

I am using JDK 11 and tried using parallelStream() for each group, but the files do not appear to be processed in parallel; instead, they are processed one after the other.

// files added to respective file group
FileGroup group1 = new FileGroup();
group1.add(file1);
group1.add(file2);
group1.add(file3);

// ... and so on for each group

// groups added to the following 'groups' collection
List<FileGroup> groups = new ArrayList<>();
groups.add(group1);
// ...
groups.add(group5);

// use a parallel stream over the groups of files
groups.parallelStream().forEach(group -> { ... });

My expectation is that if a group has four files, four parallel threads will be spawned, with each thread processing one file that may contain up to 3 million records.
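
To make the intent concrete, here is a minimal, self-contained sketch of the pattern I have in mind. The FileGroup class, its getFiles() accessor, and processFile() here are placeholders of my own rather than the real implementation; the thread name is printed only so I can see which worker handles each file.

import java.util.ArrayList;
import java.util.List;

public class ParallelFileProcessing {

    // Placeholder group type; assume the real FileGroup exposes its files somehow.
    static class FileGroup {
        private final List<String> files = new ArrayList<>();
        void add(String file) { files.add(file); }
        List<String> getFiles() { return files; }
    }

    // Placeholder for the real per-file work (each file may hold up to 3 million records).
    static void processFile(String file) {
        System.out.println(Thread.currentThread().getName() + " -> " + file);
    }

    public static void main(String[] args) {
        FileGroup group1 = new FileGroup();
        group1.add("file1");

        FileGroup group2 = new FileGroup();
        group2.add("file2");
        group2.add("file3");

        List<FileGroup> groups = List.of(group1, group2);

        // Outer parallel stream over the groups, inner parallel stream over the
        // files of each group, so a group with four files should be handled by
        // up to four workers (bounded by the size of the common ForkJoinPool).
        groups.parallelStream()
              .forEach(group -> group.getFiles()
                                     .parallelStream()
                                     .forEach(ParallelFileProcessing::processFile));
    }
}

This is only how I picture the parallelism; I am not sure nesting parallel streams on the default ForkJoinPool like this is the right way to structure it.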

What is the best way to implement this processing model?
