I'm developing an encryption algorithm on the GPU. This algorithm requires the addition and multiplication of very very large integers . These numbers have a bit length of an estimated 150,000 bit or more.These numbers have different bit length. What algorithms can be used to perform addition and multiplication of these numbers? Please give me your information. Thank you.
-
1Check out https://gmplib.org/ . But perhaps more importantly, try to find an implementation of your desired encryption algorithm that's already well tested. Implementing your own will likely result in security problems. – John Zwinck Oct 21 '14 at 07:01
-
I've used the instructions of MPIR lib on CPU and there was no problem but GPU does not allow to use these instructions. I use VS2010 and CUDA 5.5. Is there any difference between these two library's instruction at runtime on GPU? – maral Oct 21 '14 at 07:56
-
Maybe you want this, then: http://www.hpcs.cs.tsukuba.ac.jp/~nakayama/cump/ – John Zwinck Oct 21 '14 at 08:35
-
CUMP operates on floating point numbers but I want a library that operates on integers. thank you very much for you help. – maral Oct 21 '14 at 09:57
-
For additions, the post [large integer addition with CUDA](http://stackoverflow.com/questions/12957116/large-integer-addition-with-cuda) may be of help. – Vitality Oct 21 '14 at 10:23
1 Answers
Large integer addition is relatively simple: JackOLantern already provided the link to the post. Basically it's just doing carry propagation via parallel prefix sum.
For large-integer multiplication on CUDA, there are two ways come to my mind:
convert the integers to RNS (Residue Number System): then multiplication and addition can be done in parallel (as long as RNS base is large enough). Whenever you need to compare the numbers you can convert them to mixed radix system (see, e.g., How to Convert from a Residual Number System to a Mixed Radix System?). Finally, you can use CRT (Chinese Remaindering) to convert the numbers back to positional number system
implement large-integer multiplication directly using FFT since multiplication can be viewed as acyclic convolution of sequences (150Kbits length is not that much for FFT but can already give you some speedup). Still GNU MP switches to FFT multiplication routines starting from 1Mbit or even more. Again for multiplication via FFT there are two options:
use floating-point double-precision FFT and encode large-integer bits into mantissa (easier to implement)
use the so-called Number-Theoretic transform (FFT over finite field)
Anyway, there is a bunch of theory behind these things. You can also check my paper on FFT mul in CUDA. But there are also many research papers on this subject especially in cryptography field.

- 1
- 1