2

I have a list of millions of distinct sets (no two sets have the same values), where I have to find all partitions for every single set.

For instance the set {A, B, C} can be partitioned to the following subsets:

{ {A}, {B}, {C} },
{ {A, B}, {C} },
{ {A, C}, {B} },
{ {A}, {B, C} },
{ {A, B, C} }

The total number of possible partitions is known as the Bell number.

To find all partitions of a set we can use the following Java code for instance.

However, since I have millions of sets, let's say 7 million set of size 6, where the total number of partitions for a set of size 6 is known to be equal to 203. Hence, the huge number of partitions to be generated (1421000000 for all 7 million sets).

I have a multiple cores on my PC, so I tried to generate the partitions of sets in parallel using the fork/join framework in Java. Not the actual partitioning is done in parallel but multiple set are partitioned in the same time on multiple cores.

However, this actually did not help to generate all partitions in a pre-defined time limit. For instance for 7,159,265 sets with 30 minutes time limit:

  • 3,599,921 were partitioned in a sequential stream.
  • 3,511,319 were partitioned using the below fork/join pool using 12 processors in parallel.

This is the actual code used,

First I inialize the fork/join framework.

ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());

I used the RecursiveAction class to partition each set on a core. I inialize the ParitionsParallel class that extends RecursiveAction.

ParitionsParallel tasks = new ParitionsParallel();
pool.invoke(tasks); // invoke that goes into the compute function in class PartitionsParallel

I put every set in an independant list. If the list size is greater than 1, I create the lists. In createsubtasks() I put every set in an independent list.

protected void compute() {
            if(list.size() <= 1) {
                //do partition on the set
                try {                   
                    DoParition();
                    
                } catch (Exception e) {
                    e.printStackTrace();
                } 
            }else {
                List<ParitionsParallel> subtasks = createsubtasks();    
                invokeAll(subtasks);
            }   
    }

I create ParitionsParallel instances where every instance contains one set. The allSets list is already saved in memory and can be called inside the class.

private List<ParitionsParallel> createsubtasks(){
        
        List<ParitionsParallel> subtasks = new ArrayList<>();
        
        for(Set s: allSets) {//allSets is saved in memory already
            ParitionsParallel pp = new ParitionsParallel();
            List<Set> list = new ArrayList<>(1); //size one = 1 set
            list.add(c);
            pp.setSet(list); //set every list for every ParitionsParallel instance 
            subtasks.add(pp);       
        }
        return subtasks;
    }

My questions are: how can I improve this, or is there any way to generate the actual partitionning in parallel on multiple cores.

Thank you.

CHE
  • 41
  • 3
  • Might be a silly question. I am only beginning to learn ForkJoinPools. But does each task in your pool only partition one set. If so, how large is that set on average? You say there are millions of sets. Since there are many sets that need to be partitioned, the the speedier way to accomplish this might be divide the sets into availableProcessors number of groups, then have each thread sequentially go through group and partition the sets. Partitioning a small set might be so fast that it doesn't lend itself to be faster w recursive threading (bc of the time to create/destroy threads) – Rachel Joyce Feb 27 '23 at 04:58

0 Answers0