
I have two double arrays a and b and want to calculate the cosine similarity between them. My code looks like this:

double [][] target = new double [1][65000];
double [][] compare = new double [1][65000];

double dotProduct = dot(target[0], compare[0]);
double euclideanDist = norm2(target) * norm2(compare);
double output = dotProduct / euclideanDist;

private double norm2(double[][] a){
    double sum = 0;
    for (int i = 0; i < a[0].length; i++){
        sum = sum + a[0][i] * a[0][i];
    }
    return Math.sqrt(sum);
}

private double dot(double[] a, double [] b){
    double sum = 0;
    for(int i = 0; i < a.length; i ++){
        sum += a[i] * b[i];
    }
    return sum;
}

Is there any way to speed up computation time?

Hossein Golshani
Noltibus
    Just wondering: what is the point of declaring a [][], when the first dimension has size 1? Why not go with a simple double[] then? (not that this should make a big difference performancewise) – GhostCat Oct 08 '18 at 12:06
  • Yeah, that's just a bit legacy-dependent and rather stupid – Noltibus Oct 08 '18 at 12:26
    Maybe punt this calculation to C via Java Native Interface. I wouldn't be surprised to find there is already a JNI wrapper for BLAS or something like that. – Robert Dodier Oct 08 '18 at 23:25

4 Answers

6

I presume your worry is for when you have large arrays and you want to avoid looping through them twice. As pointed out elsewhere, the first dimension seems to be redundant in your functions, so in the answer below I avoided it.

What you could do is try to combine both loops together in one function.

Something like:

double computeSimilarity(double[] a, double[] b) {
  //todo: you might want to check they are the same size before proceeding

  double dotProduct = 0;
  double normASum = 0; 
  double normBSum = 0;

  for(int i = 0; i < a.length; i ++) {
      dotProduct += a[i] * b[i];
      normASum += a[i] * a[i];
      normBSum += b[i] * b[i];
  }

  double euclideanDist = Math.sqrt(normASum) * Math.sqrt(normBSum);
  return dotProduct / euclideanDist;
}

If you really need 2 dimensions, call the function above on each dimension. So in your example you would call it like computeSimilarity(target[0], compare[0]);

jbx
    Valid point, I kinda overlooked this simple aspect in my answer ;-) – GhostCat Oct 08 '18 at 12:16
  • You forgot the `Math.sqrt` - or did you drop it intentionally? – Hulk Oct 08 '18 at 12:19
  • Yep copy and paste bugs, fixed. – jbx Oct 08 '18 at 12:22
  • with Java 9+ you can use the [fma](https://docs.oracle.com/javase/10/docs/api/java/lang/Math.html#fma(double,double,double)) to have fewer rounding errors – bobah Oct 08 '18 at 12:22
  • @bobah Interesting. Can you post an answer how it would look? Would be good to have it as an alternative. – jbx Oct 08 '18 at 12:24
    Should your sqrt be on the distance not the dot? Though the point of your post sounds good. – matt Oct 08 '18 at 12:26
  • It's really not worth an answer, just `dotProduct += a[i] * b[i]` would be `dotProduct = fma(a[i], b[i], dotProduct)` and if the CPU supports the fma it will be more precise and not slow. – bobah Oct 08 '18 at 12:26
  • This answer reduces my computation time by half, great answer! – Noltibus Oct 08 '18 at 12:29
  • @Noltibus Great. Actually I think it reduces by more than half, since before you had the loop 3 times, once for dot, and once for each of the 2 norm calculations. – jbx Oct 08 '18 at 12:31
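The `fma` idea from the comments, applied to the single-pass loop above, could look like the following sketch (requires Java 9+; the class name is mine). `Math.fma(a, b, c)` computes `a * b + c` with a single rounding step, so the running sums accumulate slightly less floating-point error, and on CPUs with an FMA instruction it is not slower than a separate multiply and add.

```java
// Single-pass cosine similarity using Math.fma (Java 9+), as suggested
// in the comments. Each accumulation is one fused multiply-add.
class CosineFma {
    static double computeSimilarity(double[] a, double[] b) {
        double dotProduct = 0;
        double normASum = 0;
        double normBSum = 0;
        for (int i = 0; i < a.length; i++) {
            dotProduct = Math.fma(a[i], b[i], dotProduct);
            normASum = Math.fma(a[i], a[i], normASum);
            normBSum = Math.fma(b[i], b[i], normBSum);
        }
        return dotProduct / (Math.sqrt(normASum) * Math.sqrt(normBSum));
    }
}
```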
4

All the code here is pretty straightforward, and the methods are also quite short. (And yes, the other answer is correct: the first thing to do is to reduce the total number of passes over your arrays.)

From there, you can look into two things:

  • making sure that the JIT kicks in early and fully inlines your methods, and turns them into machine code (one can configure for example how many loop iterations are required to trigger inlining, and how long methods can be to still get inlined)
  • your loop iterations are all independent. So instead of computing all iterations in sequence, you could fire up multiple threads and have each thread work on part of that range. Depending on the underlying hardware, that puts a higher load on your system, but also gets you results sooner.

Both approaches require some "digging" into the corresponding topics, but doing so could yield worthwhile gains. Which one gives better results really depends on context, so it is worth following up on both strategies.

So basically you have to make sure you can properly measure execution time, and then run experiments to understand which change buys you the most given your setup.
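The multi-threading idea described above could be sketched like this (a minimal illustration, not from the original answer: the thread count and chunking strategy are arbitrary choices). Each worker accumulates partial sums over its own slice of the arrays, and the partials are combined at the end:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Cosine similarity with the work split across a fixed thread pool.
// Each task returns {dot, normA^2, normB^2} for its contiguous chunk.
class ParallelCosine {
    static double computeSimilarity(double[] a, double[] b, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (a.length + nThreads - 1) / nThreads;
            List<Future<double[]>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int from = t * chunk;
                final int to = Math.min(a.length, from + chunk);
                parts.add(pool.submit(() -> {
                    double dot = 0, na = 0, nb = 0;
                    for (int i = from; i < to; i++) {
                        dot += a[i] * b[i];
                        na += a[i] * a[i];
                        nb += b[i] * b[i];
                    }
                    return new double[] { dot, na, nb };
                }));
            }
            // Combine the per-thread partial sums.
            double dot = 0, na = 0, nb = 0;
            for (Future<double[]> f : parts) {
                double[] p = f.get();
                dot += p[0];
                na += p[1];
                nb += p[2];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For arrays of only 65000 elements the thread-management overhead may well eat the gains, so measure before committing to this.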

GhostCat
2

For completeness, here is the Stream version, which is more expressive and easy to parallelize:

double computeSimilarity(final double[] a, final double[] b) {
    double normA = Math.sqrt(DoubleStream.of(a).parallel().map(x -> x * x).sum());
    double normB = Math.sqrt(DoubleStream.of(b).parallel().map(x -> x * x).sum());
    double dotProduct = IntStream.range(0, a.length).parallel()
            .mapToDouble(i -> a[i] * b[i]).sum();

    double euclideanDist = normA * normB;
    return dotProduct / euclideanDist;
}
Joop Eggen
    Might be interesting to benchmark this against a single pass loop. Another interesting alternative to explore would be to zip the two streams into a stream of pairs from the values of the two arrays and process that in one pass (which can be parallelized too). – jbx Oct 08 '18 at 12:35
  • @jbx I got one down-vote; maybe from someone who tried and did not get a speed improvement. But I would like to see some actual benchmark figures too. For small arrays this will be slower, but huge arrays? Maybe the down-vote was for uninformative text, but an answer was already accepted. – Joop Eggen Oct 08 '18 at 15:01
  • No idea, looks like a valid solution to me (assuming the math is correct). Now you have +1 from my side anyway. I think with large arrays it would be worth investigating parallel processing, although I would try to keep it a one pass process, which should be possible with a bit more work. – jbx Oct 08 '18 at 15:20
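The one-pass parallel variant discussed in the comments could be sketched with a mutable reduction: each worker accumulates the dot product and both squared norms in its own `double[3]`, and the partial accumulators are merged afterwards, so the arrays are traversed only once (the class name and accumulator layout are illustrative choices):

```java
import java.util.stream.IntStream;

// One-pass, parallelizable cosine similarity via IntStream.collect.
// acc[0] = dot product, acc[1] = sum of a[i]^2, acc[2] = sum of b[i]^2.
class OnePassStreamCosine {
    static double computeSimilarity(double[] a, double[] b) {
        double[] s = IntStream.range(0, a.length).parallel().collect(
                () -> new double[3],            // per-worker accumulator
                (acc, i) -> {                   // accumulate one index
                    acc[0] += a[i] * b[i];
                    acc[1] += a[i] * a[i];
                    acc[2] += b[i] * b[i];
                },
                (acc, t) -> {                   // merge two accumulators
                    acc[0] += t[0];
                    acc[1] += t[1];
                    acc[2] += t[2];
                });
        return s[0] / (Math.sqrt(s[1]) * Math.sqrt(s[2]));
    }
}
```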
0

A classical micro-optimization is loop unrolling: duplicate the loop body so the exit test runs once per several elements instead of once per element.

double computeSimilarity(double[] a, double[] b) {

  double dotProduct = 0;
  double normASum = 0;
  double normBSum = 0;

  int i = 0; // declared outside the loop so the cleanup loop can continue from it
  for(; i + 3 < a.length; i++) {
      dotProduct += a[i] * b[i];
      normASum += a[i] * a[i];
      normBSum += b[i] * b[i];
      i++;
      dotProduct += a[i] * b[i];
      normASum += a[i] * a[i];
      normBSum += b[i] * b[i];
      i++;
      dotProduct += a[i] * b[i];
      normASum += a[i] * a[i];
      normBSum += b[i] * b[i];
      i++;
      dotProduct += a[i] * b[i];
      normASum += a[i] * a[i];
      normBSum += b[i] * b[i];
  }
  for( ; i < a.length; i++) {
      dotProduct += a[i] * b[i];
      normASum += a[i] * a[i];
      normBSum += b[i] * b[i];
  }

  double euclideanDist = Math.sqrt(normASum) * Math.sqrt(normBSum);
  return dotProduct / euclideanDist;
}

Storing a[i] and b[i] in temporary variables may also have some small effect.