Karatsuba multiplication optimization on byte array

Question

I have implemented the multiplication for two byte array and it works fine. More precisely, I need to multiply a 64 bytes operand with a 32 bytes one.

I implemented it the easiest way : I make a double loop and I compute the product for each byte in each array. So for the specific values, it takes 64 * 32 = 2048 steps.

I tried to optimize it with the Karatsuba method. So I proceed the following way :

a has a length of 64 bytes and b has a length of 32 bytes.

We split a in : a = p * 16^(32) + q (so p and q have both a length of 32 bytes) and compute : a * b = p * b * 16^(32) + q * b (products are compute with my function described before).

I get the right result but it takes the same time to compute : 2 multiplications of two 32 bytes array : 32 * 32 * 2 = 64 * 32 = 2048.

My question is the following : to optimize my multiplication using Karatsuba, should I program it totally recursively ? In other way it will never be faster ?

Thank you in advance :)

Why don't you use `BigInteger` for representing large numbers? — isnot2bad, May 04 '15 at 07:56
Well, I programm it in Java to make tests on my computer but the aim is to switch it in Java Card — Raoul722, May 04 '15 at 07:57
You should normally program it recursively until you get sufficiently small subproblems that solving them iteratively would be faster than solving them recursively (i.e. for Karatsuba the cost of the extra additions would be more than the benefit from one fewer multiplication). For Karatsuba multiplication the cutoff point should probably be three or four digits but it may vary depending on your implementation (For an example of this applied to a different problem look at Timsort). — Hadi Khan, May 04 '15 at 08:03
Very often, other people can't tell you how the optimal solution for your problem is. Instead, you should make yourself familiar with the profiling tools which exist for Java ... and then carefully measure what is going on. Of course, a solution that requires recursion might be more costly than one that avoids additional method calls. And just-in-time compiler can play a role here, too. Long story short: measure, don't assume. And: unless when dealing with "realtime" performance requirements ... **readability** of code is often more important than having a 100% performance max solution. — GhostCat, May 04 '15 at 08:04
look here http://stackoverflow.com/q/18465326/2521214 for some ideas — Spektre, May 04 '15 at 08:47

score 3 · Answer 1 · answered May 04 '15 at 09:28

Wow! One of my first ever jobs as a programmer was optimizing a multiplication algorithm for a COBOL runtime system - that was 31 years ago.

The technique used, which I think you will find effective, is to combine the bytes into larger units. At the time, only 32 bit was available, so two bytes were merged into a short and the shorts multiplied to give a 32 bit int. But in Java you have 64 bits available, so you can multiply two ints to get a long.

So, you should make a an array a1 of 16 ints and b an array b1 8 ints by adding the bytes. E.g. sometime like this:

a1[0] = (a[0] << 24) + (a[1] << 16) + (a[2] << 8) + a[3]

Or you can write a loop to do this more concisely.

Then multiply a1 and b1, which should take 128 operations.

I would be concerned about overflow and signed vs. unsigned values. The digits after the highest digit should be unsigned, but Java does not have an unsigned modifier. However, in Java 8 there is some support for unsigned operations: see Primitive Data Types.

If you can't get the ints/longs to work unsigned, you can always combine groups of 2 or 3 bytes instead into ints and waste some of the top bits to give you space for the sign bit.

I am afraid that the problem concerns a JavaCard implementations, specifically using the Karatsuba algorithm on bytes. — Joop Eggen, May 04 '15 at 09:34
OK - thanks. I will leave my answer in case it is of interest. It may be of use to someone wanting to optimize byte array arithmetic. — rghome, May 04 '15 at 09:39
Well, in fact I want to adapt it to a JavaCard implementation so it doesn't match for it. Anyway, this is a nice way to proceed, thank for the tip ! :) — Raoul722, May 04 '15 at 09:56

score 2 · Accepted Answer · answered May 04 '15 at 10:38

Yes, the Karatsuba algorithm is only effective if you're doing it recursively. But remember: Karatsuba is not always faster than the simple algorithm, that takes O(n^2) (usually we assume that both numbers has the same length, if we're going to multiply large numbers). For small inputs (it can be 1, it can be 15, depending on your CPU) the simple algorithm can be faster, so the optimal use of Karatsuba is:

if size > MIN_SIZE_FOR_KARATSUBA (you have to find it experimentally), then do the split and recursively call the Karatsuba.
If size <= MIN_SIZE_FOR_KARATSUBA, then just multiply them with simple algorithm.

And also, don't split your array into bytes in multiplication, store them in integers in base 1000 or something like that, because you can overflow your type easily.

The best implementation of Karatsuba's algorithm is described in this link. Usually Karatsuba takes O(n log n) memory, but here with some tricks it takes only O(n) memory.

If you don't want to use function calls many times (because function calls is the slowest operation in programming), then you can use loops and implement a stack by yourself, as in my implementation.

Karatsuba multiplication optimization on byte array

2 Answers2