
I am iterating through a HashMap with roughly 20 million entries, and in each iteration I iterate through another HashMap with roughly 20 million entries.

    HashMap<String, BitSet> data_1 = new HashMap<String, BitSet>();
    HashMap<String, BitSet> data_2 = new HashMap<String, BitSet>();

I am dividing data_1 into chunks based on the number of threads (threads = cores; I have a four-core processor).

My code takes more than 20 hours to execute, and that is without even storing the results to a file.

1) If I want to store the results of each thread into a file without overlapping, how can I do that?

2) How can I make the following much faster?

3) How can I create the chunks dynamically, based on the number of cores?

    int cores = Runtime.getRuntime().availableProcessors();
    int threads = cores;

    // Number of entries per chunk (one chunk per thread)
    int chunkSize = data_1.size() / threads;

    // I don't trust the chunks created by the line below, which is why I
    // created chunk1, chunk2, chunk3 and chunk4 separately and validated them.
    Map<String, BitSet>[] chunks = (Map<String, BitSet>[]) new HashMap[threads];

4) How can I create threads using for loops? Is what I am doing correct?

    ClassName thread1 = new ClassName(data_2, chunk1);
    ClassName thread2 = new ClassName(data_2, chunk2);
    ClassName thread3 = new ClassName(data_2, chunk3);
    ClassName thread4 = new ClassName(data_2, chunk4);

    thread1.start();
    thread2.start();
    thread3.start();
    thread4.start();

    thread1.join();
    thread2.join();
    thread3.join();
    thread4.join();

A representation of my code:

public class ClassName extends Thread {
    Integer nSimilarEntities = 30;
    // data_1, data_2, pct and nNumSimilaritiesToCompute are fields
    // initialized elsewhere (e.g. in the constructor)

    public void run() {

        for (String kNonRepeater : data_1.keySet()) {

            // Extract the feature vector
            BitSet vFeaturesNonRepeater = data_1.get(kNonRepeater);

            // Calculate the sum of 1s (the L2 norm is the sqrt of this)
            double nNormNonRepeater = Math.sqrt(vFeaturesNonRepeater.cardinality());

            // Loop through the repeater set
            double nMinSimilarity = 100;
            int nMinSimIndex = 0;

            // Maintain the list of top similar repeaters and the similarity values
            long dpind = 0;
            ArrayList<String> vSimilarKeys = new ArrayList<String>();
            ArrayList<Double> vSimilarValues = new ArrayList<Double>();

            for (String kRepeater : data_2.keySet()) {
                // Status output at regular intervals
                dpind++;
                if (Math.floorMod(dpind, pct) == 0) {
                    System.out.println(dpind + " dot products (" + Math.round(dpind / pct) + "%) out of "
                            + nNumSimilaritiesToCompute + " completed!");
                }

                // Calculate the norm of the repeater, and the dot product
                BitSet vFeaturesRepeater = data_2.get(kRepeater);
                double nNormRepeater = Math.sqrt(vFeaturesRepeater.cardinality());
                BitSet vTemp = (BitSet) vFeaturesNonRepeater.clone();
                vTemp.and(vFeaturesRepeater);
                double nCosineDistance = vTemp.cardinality() / (nNormNonRepeater * nNormRepeater);

                // queue.add(new MyClass(kRepeater, kNonRepeater, nCosineDistance));
                // if (queue.size() > YOUR_LIMIT)
                //     queue.remove();

                // Don't bother if the similarity is 0, obviously
                if ((vSimilarKeys.size() < nSimilarEntities) && (nCosineDistance > 0)) {
                    vSimilarKeys.add(kRepeater);
                    vSimilarValues.add(nCosineDistance);

                    nMinSimilarity = vSimilarValues.get(0);
                    nMinSimIndex = 0;
                    for (int j = 0; j < vSimilarValues.size(); j++) {
                        if (vSimilarValues.get(j) < nMinSimilarity) {
                            nMinSimilarity = vSimilarValues.get(j);
                            nMinSimIndex = j;
                        }
                    }
                } else { // If there are more, keep only the best
                    // If this is better than the smallest distance, then remove the smallest
                    if (nCosineDistance > nMinSimilarity) {
                        // Remove the lowest similarity value
                        vSimilarKeys.remove(nMinSimIndex);
                        vSimilarValues.remove(nMinSimIndex);
                        // Add this one
                        vSimilarKeys.add(kRepeater);
                        vSimilarValues.add(nCosineDistance);
                        // Refresh the index of the lowest similarity value
                        nMinSimilarity = vSimilarValues.get(0);
                        nMinSimIndex = 0;
                        for (int j = 0; j < vSimilarValues.size(); j++) {
                            if (vSimilarValues.get(j) < nMinSimilarity) {
                                nMinSimilarity = vSimilarValues.get(j);
                                nMinSimIndex = j;
                            }
                        }
                    }
                } // End loop for maintaining list of similar entries

            } // End iteration through repeaters

            for (int i = 0; i < vSimilarValues.size(); i++) {
                System.out.println(Thread.currentThread().getName() + kNonRepeater + "|" + vSimilarKeys.get(i) + "|" + vSimilarValues.get(i));
            }
        }
    }
}

Finally, if not multithreading, are there any other approaches in Java to reduce the time complexity?

2 Answers


The computer works similarly to the way you would do arithmetic by hand (it processes more digits/bits at a time, but the problem is the same).

If you do addition, the time is proportional to the size of the number.

If you do multiplication or division, it is proportional to the square of the size of the number.

For the computer, the size is based on multiples of 32 or 64 significant bits, depending on the implementation.
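A rough way to see this effect yourself (a micro-benchmark sketch, not a rigorous measurement; the class name is illustrative, and note that modern BigInteger implementations switch to sub-quadratic multiplication algorithms for very large values):

    import java.math.BigInteger;
    import java.util.Random;

    // Illustrative sketch: compare the cost of add vs. multiply as numbers grow
    public class ArithmeticCost {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            int guard = 0; // keeps the JIT from discarding the results
            for (int bits : new int[] { 16384, 32768, 65536 }) {
                BigInteger a = new BigInteger(bits, rnd);
                BigInteger b = new BigInteger(bits, rnd);

                long t0 = System.nanoTime();
                for (int i = 0; i < 200; i++) guard += a.add(b).bitLength();
                long tAdd = System.nanoTime() - t0;

                t0 = System.nanoTime();
                for (int i = 0; i < 200; i++) guard += a.multiply(b).bitLength();
                long tMul = System.nanoTime() - t0;

                System.out.println(bits + " bits: add " + tAdd / 200 + " ns/op, multiply "
                        + tMul / 200 + " ns/op (guard=" + guard + ")");
            }
        }
    }

Doubling the size roughly doubles the cost of add, while the cost of multiply grows much faster.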

Peter Lawrey

I'd say this task is suitable for parallel streams. Don't hesitate to take a look at this concept if you have time. Parallel streams seamlessly use multithreading at full speed.

The top-level processing will look like this:

data_1.entrySet()
      .parallelStream()
      .flatMap(nonRepeaterEntry -> processOne(nonRepeaterEntry.getKey(), nonRepeaterEntry.getValue(), data2))
      .forEach(System.out::println);

You should provide a processOne function with a prototype like this:

    Stream<String> processOne(String nonRepeaterKey, BitSet nonRepeaterBitSet, Map<String, BitSet> data2);

It will return the prepared strings - what you currently print - as a stream, ready to be written to a file.

To build the stream inside, you can fill a List first and then turn it into a stream in the return statement:

return list.stream();

Even though the inner loop could also be processed with streams, parallel streaming inside is discouraged - you already have enough parallelism at the top level.
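A minimal sketch of how this could fit together, assuming data_1 and data_2 are the maps from the question (the class name and the processOne body are illustrative; the top-30 selection from the question is left out for brevity):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Stream;

    // Illustrative sketch of the parallel-stream pipeline
    public class SimilarityStreams {

        // One non-repeater against every repeater; returns printable lines
        static Stream<String> processOne(String nonRepeaterKey,
                                         BitSet nonRepeaterBits,
                                         Map<String, BitSet> data2) {
            if (nonRepeaterBits.isEmpty()) {
                return Stream.empty(); // an empty vector can never match anything
            }
            double normNonRepeater = Math.sqrt(nonRepeaterBits.cardinality());
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, BitSet> e : data2.entrySet()) {
                BitSet temp = (BitSet) nonRepeaterBits.clone();
                temp.and(e.getValue());
                double cosine = temp.cardinality()
                        / (normNonRepeater * Math.sqrt(e.getValue().cardinality()));
                if (cosine > 0) {
                    lines.add(nonRepeaterKey + "|" + e.getKey() + "|" + cosine);
                }
            }
            return lines.stream();
        }

        static void run(Map<String, BitSet> data1, Map<String, BitSet> data2) {
            data1.entrySet()
                 .parallelStream()
                 .flatMap(e -> processOne(e.getKey(), e.getValue(), data2))
                 .forEach(System.out::println);
        }
    }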

For your questions:

1) If I want to store the results of each thread into a file without overlapping, how can I do that?

Any logging framework (logback, log4j) can deal with it, and parallel streams can deal with it too. You could also store the prepared lines in some queue/array and print them from a separate thread. That takes a bit of care, though; ready-made solutions are easier and effectively do the same thing.
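For illustration, a sketch of the collect-then-write variant (reusing the processOne sketch above; class and method names are illustrative):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.BitSet;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Illustrative sketch: collect in parallel, then write from a single thread
    // so lines from different workers can never interleave in the file.
    public class ResultWriter {

        static void writeResults(Map<String, BitSet> data1,
                                 Map<String, BitSet> data2,
                                 Path out) throws IOException {
            List<String> lines = data1.entrySet()
                    .parallelStream()
                    .flatMap(e -> SimilarityStreams.processOne(e.getKey(), e.getValue(), data2))
                    .collect(Collectors.toList());
            Files.write(out, lines);
        }
    }

The trade-off: all result lines are held in memory before writing. If that is too much, a BlockingQueue drained by one dedicated writer thread gives the same no-overlap guarantee with bounded memory.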

2) How can I make the following much faster?

Optimize and parallelize. In a normal situation you get processing that is number_of_threads/1.5 to number_of_threads times faster, assuming hyper-threading is in play, but it depends on the parts of your work that are not-so-parallel and on the underlying implementations.

3) How can I create the chunks dynamically, based on the number of cores?

You don't have to. Make a list of tasks (one task per data_1 entry) and feed an executor service with them - that is already a big enough task size. You can use a FixedThreadPool with the number of threads as a parameter, and it will distribute the tasks evenly.

Note: you should create a task class, get a Future for each task upon threadpool.submit, and at the end run a loop calling .get() on each Future. It will throttle the main thread down to the executor's processing speed, implicitly giving fork-join-like behaviour.
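A minimal sketch of the executor approach, assuming the same maps as in the question (computeSimilarities is a hypothetical placeholder for the inner loop of your run() method):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Illustrative sketch of the FixedThreadPool + Future pattern
    public class ExecutorExample {

        static void run(Map<String, BitSet> data1, Map<String, BitSet> data2)
                throws InterruptedException, ExecutionException {
            int threads = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(threads);

            // One small task per data_1 entry; the pool spreads them over the cores
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Map.Entry<String, BitSet> e : data1.entrySet()) {
                futures.add(pool.submit(
                        () -> computeSimilarities(e.getKey(), e.getValue(), data2)));
            }

            // get() blocks until each task finishes, so output stays in order
            for (Future<List<String>> f : futures) {
                for (String line : f.get()) {
                    System.out.println(line);
                }
            }
            pool.shutdown();
        }

        // Hypothetical placeholder for the per-entry work from the question
        static List<String> computeSimilarities(String key, BitSet bits,
                                                Map<String, BitSet> data2) {
            return Collections.emptyList();
        }
    }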

4) Direct thread creation is an outdated technique. It's recommended to use an executor service of some sort, parallel streams, etc. For loop-based processing you would need to create a list of chunks, then in a loop create a thread per chunk and add it to a list of threads, and in another loop join each thread in the list.


Ad hoc optimizations:

1) Make a Repeater class that stores the key, the bit set and its cardinality. Preprocess your hash maps, turning them into Repeater instances and calculating each cardinality once (i.e. not on every inner-loop run). That will save you 20mil*(20mil-1) calls of .cardinality(). You still need to call it for the intersection.
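A sketch of what such a class could look like (names are illustrative):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;
    import java.util.Map;

    // Illustrative sketch: cache each repeater's norm once instead of
    // recomputing cardinality() on every inner-loop pass
    class Repeater {
        final String key;
        final BitSet bits;
        final double norm; // sqrt(cardinality), computed exactly once

        Repeater(String key, BitSet bits) {
            this.key = key;
            this.bits = bits;
            this.norm = Math.sqrt(bits.cardinality());
        }

        // Turn data_2 into a list of Repeater instances up front
        static List<Repeater> preprocess(Map<String, BitSet> data2) {
            List<Repeater> repeaters = new ArrayList<>(data2.size());
            for (Map.Entry<String, BitSet> e : data2.entrySet()) {
                repeaters.add(new Repeater(e.getKey(), e.getValue()));
            }
            return repeaters;
        }
    }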

2) Replace similarKeys and similarValues with a limited-size PriorityQueue of combined entries. It works faster for 30 elements.

Take a look at this question for info about PriorityQueue: Java PriorityQueue with fixed size
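A hedged sketch of the bounded-queue idea (TopSimilar and Entry are illustrative names): the queue orders entries by ascending similarity, so its head is always the weakest kept candidate and capping it stays cheap.

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Illustrative sketch: keep only the 30 most similar repeaters
    class TopSimilar {
        static final int LIMIT = 30;

        static class Entry {
            final String key;
            final double similarity;

            Entry(String key, double similarity) {
                this.key = key;
                this.similarity = similarity;
            }
        }

        final PriorityQueue<Entry> queue =
                new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.similarity));

        void offer(String key, double similarity) {
            if (queue.size() < LIMIT) {
                queue.add(new Entry(key, similarity));
            } else if (similarity > queue.peek().similarity) {
                queue.poll(); // drop the current minimum
                queue.add(new Entry(key, similarity));
            }
        }
    }

This replaces the two linear scans over vSimilarValues with O(log 30) queue operations.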

3) You can skip processing a nonRepeater if its cardinality is already 0 - a bitwise AND will never increase the resulting cardinality, and you filter out all 0-distance values anyway.

4) You can skip (remove from the temporary list you create in the p.1 optimization) every Repeater with zero cardinality. As in p.3, it will never produce anything fruitful.
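A small sketch combining p.3 and p.4 with the Repeater preprocessing from p.1 (again illustrative, reusing the Repeater class sketched above):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;
    import java.util.Map;

    // Illustrative sketch: empty bit sets can never produce a non-zero
    // intersection, so drop them before the 20M x 20M loop even starts
    class Filtering {
        static List<Repeater> nonEmptyRepeaters(Map<String, BitSet> data2) {
            List<Repeater> repeaters = new ArrayList<>();
            for (Map.Entry<String, BitSet> e : data2.entrySet()) {
                if (!e.getValue().isEmpty()) { // i.e. cardinality() > 0
                    repeaters.add(new Repeater(e.getKey(), e.getValue()));
                }
            }
            return repeaters;
        }
    }

The same isEmpty() check at the top of the outer loop skips zero-cardinality non-repeaters.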

Alexander Anikin