Your if (N%2) can be easily extended to any unroll factor:
for (; i < N-B+1; i += B) {
    x1; x2; ... xB;
}
/* At most B-1 iterations remain after the loop. */
if (i < N) {
    x1;
    if (i < N-1) {
        x2;
        ...
        if (i < N-B+2) {
            x(B-1);
        }}}
For small unroll factors this may be more efficient than a second loop or Duff's device.
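As a concrete illustration, here is a minimal sketch of this pattern for B = 4. The function name sum_unroll4 and the use of an array sum as the loop body are assumptions made for the example. The loop bound is written as i + 4 <= n instead of i < n-4+1 so that it cannot wrap around when n is unsigned and smaller than 4.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints with the loop unrolled by B = 4. */
long sum_unroll4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }

    /* Remainder: at most B-1 = 3 elements are left. */
    if (i < n) {
        s += a[i];
        if (i + 1 < n) {
            s += a[i + 1];
            if (i + 2 < n) {
                s += a[i + 2];
            }
        }
    }
    return s;
}
```

Calling sum_unroll4 on an array of 7 elements runs the main loop once and then takes all three remainder branches.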
The following version reads better, and gcc 4.6 compiles it to almost the same code:
if (i++ < N) {
    x1;
    if (i++ < N) {
        x2;
        ...
        if (i++ < N) {
            x(B-1);
        }}}
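A small sketch of the i++ form for B = 3 may make the indexing clearer. The function name double_unroll3 and the loop body (doubling each element in place) are assumptions for illustration only. Note that each test advances i, so inside a branch the element being processed is a[i - 1].

```c
#include <stddef.h>

/* Hypothetical example: double n ints in place, unrolled by B = 3. */
void double_unroll3(int *a, size_t n)
{
    size_t i = 0;

    /* Main loop: three elements per iteration. */
    for (; i + 3 <= n; i += 3) {
        a[i] *= 2;
        a[i + 1] *= 2;
        a[i + 2] *= 2;
    }

    /* Remainder: at most B-1 = 2 elements are left. Every test
       increments i, so the element just tested for is a[i - 1]. */
    if (i++ < n) {
        a[i - 1] *= 2;
        if (i++ < n) {
            a[i - 1] *= 2;
        }
    }
}
```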
The following version may be more efficient if B is a power of two; at least gcc compiles better code for it. It is also definitely the best choice if N is a constant. But if N is not a constant and B is not a power of two, the advantage of this method is less obvious, because computing the remainder is less efficient (it usually takes several instructions, including a multiplication):
if (N%B > 0) {
    x1;
    if (N%B > 1) {
        x2;
        ...
        if (N%B > B-2) {
            x(B-1);
        }}}
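For the power-of-two case, a minimal sketch with B = 4 could look like the following. The function name sum_unroll4_rem and the summing loop body are assumptions for illustration. The tests are written in ascending order so that a remainder of r runs exactly the first r statements, and because n is unsigned a compiler can typically reduce n % 4 to a single bitwise AND.

```c
#include <stddef.h>

/* Hypothetical example: remainder-based tail handling for B = 4. */
long sum_unroll4_rem(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t r = n % 4;      /* leftover elements, 0..3 */
    size_t full = n - r;   /* number of elements covered by the main loop */

    /* Main loop: four elements per iteration. */
    for (; i < full; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }

    /* Remainder: the tests depend only on r, not on i. */
    if (r > 0) {
        s += a[i];
        if (r > 1) {
            s += a[i + 1];
            if (r > 2) {
                s += a[i + 2];
            }
        }
    }
    return s;
}
```

If n is a compile-time constant, r is constant too, so the compiler can resolve every remainder test statically and keep only the statements that are needed.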