If speed is critical, loop unrolling may help, according to this answer about branch prediction and this one. Unrolling reduces the number of times the for loop's condition is tested, which means fewer branches for the CPU to predict.
The gain varies with the architecture and the compiler, and there may be none at all, since some compilers already perform this optimization for you.
On my machine, changing the loop while preserving the number of operations from
for(int i = 0; i < 500000000; i++){
    residues[i % 100] = largeNumber % modules[i % 100];
}
to
for(int i = 0; i < 500000000; i+=5){
    residues[(i+0) % 100] = largeNumber % modules[(i+0) % 100];
    residues[(i+1) % 100] = largeNumber % modules[(i+1) % 100];
    residues[(i+2) % 100] = largeNumber % modules[(i+2) % 100];
    residues[(i+3) % 100] = largeNumber % modules[(i+3) % 100];
    residues[(i+4) % 100] = largeNumber % modules[(i+4) % 100];
}
and compiling with gcc -O2, the gain is ~15%. (I used 500000000 iterations instead of 100 so that the time difference is large enough to observe.)
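Before timing anything, it is worth checking that the unrolled loop still computes the same thing. Below is a minimal sketch of the two loops wrapped in functions so they can be compared. The function names (fill_rolled, fill_unrolled5, tables_match), the TABLE_SIZE constant, and the unsigned long long element type are my own assumptions for illustration; adapt them to your actual types.

```c
#include <assert.h>

#define TABLE_SIZE 100  /* assumed: the 100-entry tables used in the loops above */

/* Baseline: the loop condition is tested once per assignment. */
void fill_rolled(unsigned long long *residues, const unsigned long long *modules,
                 unsigned long long largeNumber, long n)
{
    for (long i = 0; i < n; i++) {
        residues[i % TABLE_SIZE] = largeNumber % modules[i % TABLE_SIZE];
    }
}

/* Unrolled by 5: the loop condition is tested once per five assignments.
   This sketch has no remainder loop, so n must be a multiple of 5. */
void fill_unrolled5(unsigned long long *residues, const unsigned long long *modules,
                    unsigned long long largeNumber, long n)
{
    assert(n % 5 == 0);
    for (long i = 0; i < n; i += 5) {
        residues[(i + 0) % TABLE_SIZE] = largeNumber % modules[(i + 0) % TABLE_SIZE];
        residues[(i + 1) % TABLE_SIZE] = largeNumber % modules[(i + 1) % TABLE_SIZE];
        residues[(i + 2) % TABLE_SIZE] = largeNumber % modules[(i + 2) % TABLE_SIZE];
        residues[(i + 3) % TABLE_SIZE] = largeNumber % modules[(i + 3) % TABLE_SIZE];
        residues[(i + 4) % TABLE_SIZE] = largeNumber % modules[(i + 4) % TABLE_SIZE];
    }
}

/* Returns 1 when both tables hold identical values, 0 otherwise. */
int tables_match(const unsigned long long *a, const unsigned long long *b)
{
    for (int k = 0; k < TABLE_SIZE; k++) {
        if (a[k] != b[k]) {
            return 0;
        }
    }
    return 1;
}
```

To use it, fill modules with non-zero values, call both functions with the same n, and check tables_match on the two result tables before comparing run times. If n is not a multiple of the unroll factor, you would need an extra loop to handle the leftover iterations.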