I am currently performing Fourier transforms for a physics problem, and a major bottleneck in my algorithm is the evaluation of scalar products modulo 2.
For a given integer N, I have to represent all the integers from 0 to 2^N-1 in binary.
Each of these numbers is written as a binary vector of length N, least significant bit first (e.g. 15 = 2^3 + 2^2 + 2^1 + 2^0 = (1,1,1,1,0,...,0)), and I have to evaluate its scalar product, modulo 2, with every number from 0 to 2^N-1 written in the same form.
(For example, the scalar product 1.15 = (1,0,0,...,0).(1,1,1,1,0,...,0) = 1*1 + 0*1 + ... = 1 mod 2.)
Note that the components remain binary digits when reducing modulo 2:
(1,1).(1,1) = 1*1 + 1*1, and not 1*1 + 2*2.
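To make the definition concrete, here is a minimal Python sketch of what I am computing. The function names `dot_mod2_naive` and `dot_mod2_bitwise` are just illustrative choices of mine. The first follows the definition component by component; the second relies on the fact that the product of two bits is their AND, so the sum is the number of set bits in `a & b`. As far as I can tell both give the same result, but this only speeds up a single product and does not reduce the number of products.

```python
def dot_mod2_naive(a: int, b: int, n: int) -> int:
    """Scalar product mod 2 of the n-bit binary vectors of a and b."""
    total = 0
    for k in range(n):
        a_k = (a >> k) & 1  # k-th binary digit of a (least significant first)
        b_k = (b >> k) & 1  # k-th binary digit of b
        total += a_k * b_k
    return total % 2


def dot_mod2_bitwise(a: int, b: int) -> int:
    """Same quantity, written as the parity of the set bits of a AND b."""
    return bin(a & b).count("1") % 2


if __name__ == "__main__":
    print(dot_mod2_naive(1, 15, 4))   # the 1.15 example from above
    print(dot_mod2_bitwise(3, 3))     # the (1,1).(1,1) example from above
```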
In total, this amounts to 2^(2N) scalar products that I have to compute and reduce modulo 2.
I am having difficulty getting beyond N = 18.
I was wondering whether some clever mathematical trick can be used to greatly reduce the time spent doing them.
I was thinking of some kind of recursion (i.e. saving the results for N in a file and deducing the results for N+1 from them), but I am not sure this would help. With this recursion, knowing the results for N, I could split each vector for N+1 into its N-digit part plus one additional digit. But then, instead of evaluating each scalar product directly, I would have to tell my computer to go and read a big file (because I probably wouldn't be able to keep it all in memory). That is probably time-consuming, perhaps more so than the ~20 multiplications I have to perform for each product.
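To illustrate the recursion I have in mind, here is a small in-memory sketch in Python (the name `parity_table` is hypothetical, and I keep everything in nested lists rather than a file). It assumes that the extra digit contributes its own product, so that a.b for N+1 digits is the N-digit result plus the product of the two new digits, modulo 2. The table for N+1 is then four copies of the table for N, with the copy where both new digits are 1 flipped.

```python
def parity_table(n: int) -> list[list[int]]:
    """Return table[a][b] = (a . b) mod 2 for all 0 <= a, b < 2**n."""
    table = [[0]]  # n = 0: a single empty vector, scalar product 0
    for _ in range(n):
        # New digit of a is 0: the result does not depend on the new digit of b.
        top = [row + row for row in table]
        # New digit of a is 1: the result flips when the new digit of b is 1.
        bottom = [row + [1 - x for x in row] for row in table]
        table = top + bottom
    return table


if __name__ == "__main__":
    for row in parity_table(2):
        print(row)
```

The table still has 2^(2N) entries, so this sketch runs into exactly the memory problem described above and is only practical for small N.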
Is there any known optimized number-theoretic algorithm that allows such a scalar product modulo 2 to be evaluated very quickly? Are there any rules or ideas I am not aware of that I could exploit?
Sorry for the terrible formatting, I just can't get LaTeX to work in here.