
I am trying to build an OCR system by calculating the correlation coefficient between characters extracted from an image and every character I have pre-stored in a database. My implementation is in Java, and the pre-stored characters are loaded into an ArrayList when the application starts, i.e.

ArrayList<byte []> storedCharacters, extractedCharacters;
storedCharacters = load_all_characters_from_database();
extractedCharacters = extract_characters_from_image();

// Calculate the coefficient between every extracted character
// and every character in database.
double maxCorr = -1;
for(byte [] extractedCharacter : extractedCharacters)
  for(byte [] storedCharacter : storedCharacters)
  {
     double corr = findCorrelation(extractedCharacter, storedCharacter);
     if (corr > maxCorr)
       maxCorr = corr;
  }
...
...
public double findCorrelation(byte [] extractedCharacter, byte [] storedCharacter)
{
  double mag1 = 0, mag2 = 0, corr = 0;
  for(int i=0; i < extractedCharacter.length; i++)
  {
     mag1 += extractedCharacter[i] * extractedCharacter[i];
     mag2 += storedCharacter[i] * storedCharacter[i];
     corr += extractedCharacter[i] * storedCharacter[i];
  } // for
  corr /= Math.sqrt(mag1*mag2);
  return corr;
}

The number of extractedCharacters is around 100-150 per image, but the database has 15,600 stored binary characters. Calculating the correlation coefficient between every extracted character and every stored character hurts performance: it takes around 15-20 seconds per image on an Intel i5 CPU. Is there a way to improve the speed of this program, or can you suggest another approach that yields similar results? (The results produced by comparing every character against such a large dataset are quite good.)

Thank you in advance

UPDATE 1

public static void run() {
    ArrayList<byte []> storedCharacters, extractedCharacters;
    storedCharacters = load_all_characters_from_database();
    extractedCharacters = extract_characters_from_image();
    
    // Calculate the coefficient between every extracted character
    // and every character in database.
    computeNorms(storedCharacters, extractedCharacters);
    double maxCorr = -1;
    for(int extCharNo = 0; extCharNo < extractedCharacters.size(); extCharNo++)
      for(int strCharIndex = 0; strCharIndex < storedCharacters.size(); strCharIndex++)
      {
         double corr = findCorrelation(extractedCharacters.get(extCharNo),
                                       storedCharacters.get(strCharIndex),
                                       strCharIndex, extCharNo);
         if (corr > maxCorr)
           maxCorr = corr;
      }
}
private static double[] storedNorms;
private static double[] extractedNorms;
       
// Correlation between two binary images
public static double findCorrelation(byte[] arr1, byte[] arr2, int strCharIndex, int extCharNo){
         final int dotProduct = dotProduct(arr1, arr2);
         final double corr = dotProduct * storedNorms[strCharIndex] * extractedNorms[extCharNo];
         return corr;
}
    
public static void computeNorms(ArrayList<byte[]> storedCharacters, ArrayList<byte[]> extractedCharacters) {
          storedNorms = computeInvNorms(storedCharacters);
          extractedNorms = computeInvNorms(extractedCharacters);
}
    
private static double[] computeInvNorms(List<byte []> a) {
         final double[] result = new double[a.size()];
         
         for (int i=0; i < result.length; ++i) 
            result[i] = 1 / Math.sqrt(dotProduct(a.get(i), a.get(i)));
         return result;
}
      
private static int dotProduct(byte[] arr1, byte[] arr2) {
         int dotProduct = 0; 
         for(int i = 0; i< arr1.length; i++)
            dotProduct += arr1[i] * arr2[i];
          
         return dotProduct;
}
  • Your question is quite vague. Anyway, a way to explore would be not trying to match against every letter, but first against "prototype" letters, and then only comparing with letters related to that prototype. For example, if a letter is very close to `X` and very different from `a`, then you only compare it with `x`, `X`, `Y` and not with `s`. Of course, finding the sets of letters and their best "prototype" is not a trivial task. – SJuan76 May 14 '14 at 07:07
  • How big are your arrays of bytes? You could prepare a "low resolution" model (a smaller array) for a preliminary test, as SJuan76 suggests – Maxx May 14 '14 at 07:11
  • And you could use more than one thread (I think i5 allows 4 threads). – assylias May 14 '14 at 07:52
  • If I understood correctly, the bottleneck is the loops with the `findCorrelation` call. One could give more focussed hints and suggestions if you provided the code of this method. Otherwise, one can only give general statements, like "high-level" optimizations (multithreading, as it was mentioned), or hints that refer to the approach itself (i.e. whether it can be optimized by some sort of hashing or heuristics) - but the latter is difficult as long as you don't describe the approach in more detail. – Marco13 May 14 '14 at 08:22
    The snippet you're showing is the part not to optimize. **Improvement may be possible in what you're hiding** (i.e., `findCorrelation`). At this level, you could only parallelize (which may be worth it, but the speed-up is limited by the number of cores). – maaartinus May 14 '14 at 08:44
  • Well, the findCorrelation() method is just a loop that goes through the pixels of extractedCharacter and compares them with the pixels of the characters in the database. Each character has a size of 12x16, so yes, I agree that creating smaller letters would be beneficial, but it could have an impact on the results. Using multithreading is also not the answer I am seeking, because I would like the application to have very similar timings no matter the CPU architecture. Maybe the use of neural networks, or extracting characteristics of a character before starting the correlation procedure, would help performance – javasuns May 14 '14 at 12:18
  • Also, the database contains character sets for every font, and more fonts may be added in the future. This will impact the performance again – javasuns May 14 '14 at 12:20
  • There's a single loop (`i`) for a 2-D array and `j` isn't defined. So I'm unsure what's going on. Anyway, it can't compile. In case you're really processing a 2-D array, convert it to 1-D and enjoy the speed-up. – maaartinus May 14 '14 at 20:13
  • @maaartinus, Well I am not sure if there is much difference between 2D arrays and 1D arrays with Java HotSpot 1.7. – javasuns May 15 '14 at 06:13
  • I'd bet there's a difference as your data doesn't fit in L2 and the indirection means a cache miss (and prefetching improbably helps). Moreover, the array bounds check is mandatory (it gets hoisted out of the loop, but with 10 or 15 iterations only it's still measurable). – maaartinus May 15 '14 at 08:04
  • I wonder what's different as even your original algorithm takes only 1.3 second on my i5-2400 CPU @ 3.10GHz. – maaartinus May 15 '14 at 13:38
  • @maaartinus There are two differences in the original algorithm than the one I provided here. a) The characters extracted from image and the ones stored are in a 2D array. I am using a 2D array because it is easier to manipulate with x,y. b) For every extracted character I am also running a normalization to bring the character to the same size as the ones loaded from the database, which I made a mistake by saying their size is 12x16. They are 16x25 – javasuns May 16 '14 at 19:05
  • OK. So your chars are twice as big, yet your version on my computer runs 10 times faster. A factor of 5 remains... and I'd attribute a big share of it to your 2D arrays! I'd say 2D arrays are easier to manipulate only if you skip writing a maybe-15-line helper class. – maaartinus May 16 '14 at 19:11
  • @maaartinus Well, for example, an image with all characters has a 640x480 size. Putting it in a 2D array, I can get a pixel directly with x,y, whilst having it in a 1D array I would need the height and some extra calculations to get the pixel. 2D array => image[x][y], 1D array => image[x * height + y]. Do you think that "x * height + y" will have better performance than [x][y]? – javasuns May 17 '14 at 07:17
  • **1.** What's the most time-consuming part of your application? I guess it's the one you're asking about in this question. **2.** Does it need `x` and `y`? No, it's just runs through all the pixels **3.** So for speed a 1D array is better. **4.** Even computations which need the 2D access can profit from the 1D layout. Multiplication is 3 cycles only, an L1 cache-miss is maybe 10. In a loop the multiplication gets optimized away. – maaartinus May 17 '14 at 08:01
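The 1D layout discussed in these last comments can be sketched with a small helper class; the class and method names here are mine, not from the question:

```java
// Hypothetical helper class: a width x height image stored in one flat
// byte array, with pixel (x, y) at index x * height + y, matching the
// indexing formula from the comments above.
class FlatImage {
    final int height;
    final byte[] pixels;

    FlatImage(int width, int height) {
        this.height = height;
        this.pixels = new byte[width * height];
    }

    byte get(int x, int y)         { return pixels[x * height + y]; }
    void set(int x, int y, byte v) { pixels[x * height + y] = v; }
}
```

Sequential scans over `pixels` then run without the per-row indirection (and its cache misses) of a `byte[][]`.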

1 Answer


Nowadays, it's hard to find a CPU with a single core (even in mobiles). As the tasks are nicely separated, you can parallelize them with only a few lines. So I'd go for it, though the gain is limited.
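A minimal sketch of that parallel split, reusing the `findCorrelation` logic from the question (the class name is mine): each extracted character is matched against the whole stored set on its own worker thread.

```java
import java.util.List;
import java.util.stream.IntStream;

class ParallelMatcher {
    // Same correlation as in the question, inlined here for completeness.
    static double findCorrelation(byte[] a, byte[] b) {
        double mag1 = 0, mag2 = 0, corr = 0;
        for (int i = 0; i < a.length; i++) {
            mag1 += a[i] * a[i];
            mag2 += b[i] * b[i];
            corr += a[i] * b[i];
        }
        return corr / Math.sqrt(mag1 * mag2);
    }

    // Each extracted character scans the stored set independently,
    // so the outer loop parallelizes with no shared mutable state.
    static double bestCorrelation(List<byte[]> extracted, List<byte[]> stored) {
        return IntStream.range(0, extracted.size()).parallel()
            .mapToDouble(i -> stored.stream()
                .mapToDouble(s -> findCorrelation(extracted.get(i), s))
                .max().orElse(-1))
            .max().orElse(-1);
    }
}
```

The speedup is bounded by the core count, as noted in the comments.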

In case you really mean cross-correlation, then a transform like DFT or DCT could help. They surely do for big images, but with yours at 12x16, I'm not sure.

Maybe you mean just a dot product? And maybe you should tell us?

Note that you actually don't need to compute the correlation; most of the time you only need to find out whether it's bigger than a threshold:

corr = findCorrelation(extractedCharacter, storedCharacter)
..... more code to check if this is the best match ......

This may lead to some optimizations or not, depending on what the images look like.

Note also that a simple low level optimization can give you nearly a factor of 4 as in this question of mine. Maybe you really should tell us what you're doing?

UPDATE 1

I guess that due to the computation of three products in the loop, there's enough instruction level parallelism, so a manual loop unrolling like in my above question is not necessary.

However, I see that those three products get computed some 100 * 15600 times, while only one of them depends on both extractedCharacter and storedCharacter. So you can compute

100 + 15600 + 100 * 15600

dot products instead of

 3 * 100 * 15600

This way you may get a factor of three pretty easily.
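A sketch of that restructuring (the class name is mine; it mirrors what UPDATE 1 in the question arrived at): precompute 1/sqrt(v·v) once per character, so the inner loop costs one dot product per pair instead of three.

```java
import java.util.List;

class NormPrecompute {
    static int dot(byte[] a, byte[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += a[i] * b[i];
        return d;
    }

    // One inverse norm per character, computed once up front.
    static double[] invNorms(List<byte[]> chars) {
        double[] r = new double[chars.size()];
        for (int i = 0; i < r.length; i++)
            r[i] = 1 / Math.sqrt(dot(chars.get(i), chars.get(i)));
        return r;
    }

    static double bestMatch(List<byte[]> extracted, List<byte[]> stored) {
        double[] extInv = invNorms(extracted); // ~100 dot products
        double[] stoInv = invNorms(stored);    // ~15600 dot products
        double best = -1;
        for (int i = 0; i < extracted.size(); i++)
            for (int j = 0; j < stored.size(); j++) {
                // One dot product per pair; the two norms are table lookups.
                double corr = dot(extracted.get(i), stored.get(j))
                              * extInv[i] * stoInv[j];
                if (corr > best) best = corr;
            }
        return best;
    }
}
```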

Or not. After this step there's a single sum computed in the relevant step, and the problem linked above applies. And so does its solution (manual unrolling).

Factor 5.2

results

While byte[] is nicely compact, the computation involves extending the bytes to ints, which costs some time, as my benchmark shows. Converting the byte[]s to int[]s before computing all the correlations saves time. Even better, for storedCharacters this conversion can be done beforehand.
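A sketch of that widening step (names are mine): convert each byte[] to int[] once, so the hot dot-product loop works on ints directly; for storedCharacters this can happen at load time.

```java
class WidenedChars {
    // One-time widening: each byte is sign-extended to an int here,
    // instead of on every multiply in the hot loop.
    static int[] toInts(byte[] a) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i];
        return r;
    }

    // Dot product over the pre-widened arrays.
    static int dot(int[] a, int[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += a[i] * b[i];
        return d;
    }
}
```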

Manual loop unrolling twice helps but unrolling more doesn't.
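A sketch of the twice-unrolled dot product (my own illustration of the technique): two independent accumulators break the addition dependency chain, which is where the speedup comes from.

```java
class UnrolledDot {
    // Dot product unrolled twice; d0 and d1 accumulate independently,
    // so the CPU can overlap the two multiply-add chains.
    static int dot2(int[] a, int[] b) {
        int d0 = 0, d1 = 0;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            d0 += a[i] * b[i];
            d1 += a[i + 1] * b[i + 1];
        }
        if (i < a.length) d0 += a[i] * b[i]; // odd-length tail
        return d0 + d1;
    }
}
```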

  • I have included in the original post the code for findCorrelation(). The procedure returns the correlation coefficient between two images A and B, where A and B are vectors of the same size holding 0 and 1 values. The corr returned is a value between 0.0 and 1.0 showing how close the two images are, with 1 meaning they are identical – javasuns May 14 '14 at 19:30
  • @javasuns: I see. That's what I called `dot product` above (I'm not saying "correlation" is wrong, it's just I understood it differently). Forget the transforms mentioned above. I'm gonna edit my answer soon. – maaartinus May 14 '14 at 19:59
  • Well I can see what you are saying, and this can probably improve the performance. I'll post my updated results – javasuns May 15 '14 at 06:19
  • Well, I have tried to change byte to int but I have not seen any real difference in performance. What version of Java do you use? – javasuns May 16 '14 at 19:02
  • Changing `byte`s to `int` alone probably doesn't help much; I guess you need the other (and simpler) optimization first. `java -version` `openjdk version "1.9.0-internal"`, but this doesn't matter (I've been running a lot of benchmarks and they hardly changed when I temporarily switched from 1.7 to this experimental setup). – maaartinus May 16 '14 at 19:16
  • What's the other and simpler optimization?:) – javasuns May 17 '14 at 07:18
  • @javasuns See `timeMine1` in my linked benchmark. Btw., with switching to 16x25 the times simply [doubled as expected](https://microbenchmarks.appspot.com/runs/352e85cd-78e5-42ca-b831-ee1b16410625) and the ratio stayed the same. Switching back to java version "1.7.0_55" did nothing. – maaartinus May 17 '14 at 07:56
  • Thank you for all your help. Changing the code based on your guidance, processing characters in 2D arrays completes in 2200ms, which is 1/3 faster than before. I have not changed them to arrays of integers; I just left them as arrays of bytes. Are those the results you get? Also, I have noticed that if I replace the line where the dot product is calculated between the stored character and the extracted character (in the nested loops) with "final int dotProduct = (int) Math.random() * 1000;", the procedure finishes in less than 200ms. Thus, that line is the one that takes all the processing time. – javasuns May 17 '14 at 13:49
  • What do you mean by "1/3 faster than before"? Something like 66%? Too lame. My first step lead to a speedup factor of nearly 3 (cf. `timeMine1`). Note that there's just a trivial change there (saving two out of three dot product computations). Look at the code or get [caliper](http://code.google.com/p/caliper) and run it. The change to `int[]` comes later for some more speedup, so it's OK to ignore it till you succeed with the first part. And sure, it's the dot product which eats all the time. – maaartinus May 17 '14 at 14:33
  • Do you think there is any other way to boost performance, by avoiding all those calculations? I need to bring the timing down even more, because adding more characters will surely impact the performance later on. Again, thank you for all your help – javasuns May 18 '14 at 07:30
  • Converting the 2D array images to 1D arrays before the procedure lowered the timing to 1100ms, a 50% boost. However, changing the arrays from byte to int does not give any performance increase in my situation; it almost always gives a worse timing, being ~100ms slower – javasuns May 18 '14 at 08:39
  • @javasuns I can't see any way how to speed up something like `timeMine2` in Java (in assembly you could use SIMD instructions). There may be some high level optimization, I don't know. You should post some code like I did, 'cause now nobody knows how it looks like. – maaartinus May 18 '14 at 16:33