I have a 64-bit Ubuntu 13.04 system. I was curious to see how 32-bit applications perform against 64-bit applications on a 64-bit system, so I compiled the following C program as 32-bit and 64-bit executables and recorded the times they took to execute. I used gcc flags to compile for 3 different architectures:

-m32  : Intel 80386 architecture (int, long, and pointer all 32 bits (ILP32))
-m64  : AMD's x86-64 architecture (int 32 bits; long and pointer 64 bits (LP64))
-mx32 : AMD's x86-64 architecture (int, long, and pointer all 32 bits (ILP32), but with the CPU in long mode, sixteen 64-bit registers, and a register-based calling convention)
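
A quick way to see the data-model difference is to compile a tiny probe like the one below with each flag; the ILP32 builds (-m32 and -mx32) should report 4/4/4 and the LP64 build (-m64) 4/8/8:

#include <stdio.h>

int main(void) {
    /* int, long and pointer widths are what distinguish ILP32 from LP64 */
    printf("int: %zu, long: %zu, pointer: %zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
    return 0;
}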
// This program solves Project Euler problem 16.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
#include <sys/time.h>

int sumdigit(int a, int b);

int main(void) {
    int a = 2;
    int b = 10000;
    struct timeval start, finish;
    unsigned int i;

    gettimeofday(&start, NULL);
    for(i = 0; i < 1000; i++)
        (void)sumdigit(a, b);
    gettimeofday(&finish, NULL);

    printf("Did %u calls in %.4g seconds\n",
           i,
           finish.tv_sec - start.tv_sec + 1E-6 * (finish.tv_usec - start.tv_usec));
    return 0;
}

// Computes the sum of the decimal digits of a^b using a digit array.
int sumdigit(int a, int b) {
    // numlen = number of digits in a^b
    // pcount = power of 'a' after the ith iteration
    // dcount = number of digits in a^pcount
    int numlen = (int) (b * log10(a)) + 1;
    char *arr = calloc(numlen, sizeof *arr);
    int pcount = 0;
    int dcount = 1;
    arr[numlen - 1] = 1;
    int i, sum, carry;

    while(pcount < b) {
        pcount += 1;
        sum = 0;
        carry = 0;
        // multiply the current dcount digits by a, least significant digit first
        for(i = numlen - 1; i >= numlen - dcount; --i) {
            sum = arr[i] * a + carry;
            carry = sum / 10;
            arr[i] = sum % 10;
        }
        // propagate the remaining carry into new leading digits
        while(carry > 0) {
            dcount += 1;
            sum = arr[numlen - dcount] + carry;
            carry = sum / 10;
            arr[numlen - dcount] = sum % 10;
        }
    }

    // add up the digits of a^b
    int result = 0;
    for(i = numlen - dcount; i < numlen; ++i)
        result += arr[i];
    free(arr);
    return result;
}
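
As a quick correctness check before timing (the actual problem 16 input is a = 2, b = 1000, whose digits sum to 1366), an assertion like this can be dropped into main() before the loop:

/* sanity check: the digits of 2^1000 sum to 1366 (Project Euler 16) */
assert(sumdigit(2, 1000) == 1366);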
The commands I used to build the different executables:
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_x32 -lm -mx32
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_32 -lm -m32
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_64 -lm
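
To double-check which ABI each binary actually targets, I believe GCC's predefined macros can be used: __i386__ is defined for -m32, __x86_64__ for both -m64 and -mx32, and __ILP32__ is additionally defined for -mx32. Compiling this probe with the same three flag sets should label each build:

#include <stdio.h>

int main(void) {
    /* __x86_64__ is defined for -m64 and -mx32; __ILP32__ marks the x32 ABI;
       __i386__ is defined for -m32 */
#if defined(__x86_64__) && defined(__ILP32__)
    puts("x32: x86-64 long mode, ILP32 data model");
#elif defined(__x86_64__)
    puts("x86-64, LP64 data model");
#elif defined(__i386__)
    puts("i386, ILP32 data model");
#else
    puts("some other target");
#endif
    return 0;
}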
Here are the results I got:
ajay@ajay:c$ ./pe16_x32
Did 1000 calls in 89.19 seconds
ajay@ajay:c$ ./pe16_32
Did 1000 calls in 88.82 seconds
ajay@ajay:c$ ./pe16_64
Did 1000 calls in 92.05 seconds
Why does the 64-bit version run slower than the 32-bit one? I read that the 64-bit architecture has an improved instruction set and twice as many general-purpose registers as the 32-bit architecture, which allows for more optimizations. When can I expect better performance on a 64-bit system?
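
For comparison, here is a small sketch (not my original benchmark, just an illustration) of the kind of workload where I would expect the 64-bit builds to pull ahead: arithmetic on 64-bit integers, which -m32 has to synthesize from several 32-bit instructions (and, for division, libgcc helper calls), while -m64 and -mx32 get single native instructions:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t x = 0x123456789ABCDEFULL;
    uint64_t acc = 1;
    uint32_t i;
    /* every iteration does a full 64-bit multiply, add and shift */
    for(i = 1; i < 100000000u; ++i) {
        acc = acc * x + i;
        acc ^= acc >> 29;
    }
    printf("%llu\n", (unsigned long long)acc);
    return 0;
}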
Edit
I turned on optimization with the -O3 flag and now the results are:
ajay@ajay:c$ ./pe16_x32
Did 1000 calls in 38.07 seconds
ajay@ajay:c$ ./pe16_32
Did 1000 calls in 38.32 seconds
ajay@ajay:c$ ./pe16_64
Did 1000 calls in 38.27 seconds