My environment:
- Xilinx Zynq (based on the ARM Cortex-A9)
- PetaLinux V2014.2
I am developing a Linux application on the Zynq using PetaLinux.
My current question concerns the processing time of the four basic arithmetic operations (+, -, *, /).
I measured the processing time with clock_gettime(), using the following code.
For addition (+):
static void funcToBeTimed_floatAdd(void)
{
    int idx;
    float fval = 0.0;

    for (idx = 0; idx < 100; idx++) {
        fval = fval + 3.14;
    }
}
For division (/):
static void funcToBeTimed_floatDiv(void)
{
    int idx;
    float fval = 314159000.00;

    for (idx = 0; idx < 100; idx++) {
        fval = fval / 1.001;
    }
}
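For multiplication (*), funcToBeTimed_floatMulti() follows the same pattern (only a sketch here; the exact constant and initial value are not important):
static void funcToBeTimed_floatMulti(void)
{
    int idx;
    float fval = 1.0;

    /* same 100-iteration loop as the add and div cases */
    for (idx = 0; idx < 100; idx++) {
        fval = fval * 1.001;
    }
}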
For the time measurement, the following code is used; procNo is set in main(int argc, char *argv[]) from a command-line argument.
#include <stdio.h>
#include <time.h>

static void disp_elapsed(int procNo)
{
    struct timespec tp1, tp2;
    long dsec, dnsec;

    switch (procNo) {
    case 0:
        printf("add\n");
        clock_gettime(CLOCK_REALTIME, &tp1);
        funcToBeTimed_floatAdd();
        clock_gettime(CLOCK_REALTIME, &tp2);
        break;
    case 1:
        printf("multi\n");
        clock_gettime(CLOCK_REALTIME, &tp1);
        funcToBeTimed_floatMulti();
        clock_gettime(CLOCK_REALTIME, &tp2);
        break;
    default:
        printf("div\n");
        clock_gettime(CLOCK_REALTIME, &tp1);
        funcToBeTimed_floatDiv();
        clock_gettime(CLOCK_REALTIME, &tp2);
        break;
    }
    dsec = tp2.tv_sec - tp1.tv_sec;
    dnsec = tp2.tv_nsec - tp1.tv_nsec;
    if (dnsec < 0) {
        dsec--;
        dnsec += 1000000000L;
    }

    /* include whole seconds so measurements over 1 s are not truncated */
    printf("Elapsed (nsec) = %lld\n", (long long)dsec * 1000000000LL + dnsec);
}
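main() itself just parses the argument and calls disp_elapsed(); roughly (a sketch, with argument checking simplified):
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* 0 = add, 1 = multi, anything else = div */
    int procNo = (argc > 1) ? atoi(argv[1]) : 0;

    disp_elapsed(procNo);
    return 0;
}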
As a result, the processing times for addition (+) and for division (/) were both around 2500 ns.
Generally I would expect division to be more costly than addition, but here there is not much difference.
I would like to know:
- what kind of optimization is applied for ARM,
- keywords for searching further information on this kind of optimization, and
- (if any) mistakes in my timing code, e.g. how to keep the compiler from optimizing the loop away (see the sketch below).
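On the last point, this is the kind of guard I have in mind (a sketch, assuming gcc; the _guarded name is just for illustration): since fval is otherwise never used, marking the accumulator volatile should stop the loop from being removed as dead code.
static void funcToBeTimed_floatAdd_guarded(void)
{
    int idx;
    /* volatile forces a load and store on every iteration,
       so the loop survives even at -O2 */
    volatile float fval = 0.0f;

    for (idx = 0; idx < 100; idx++) {
        fval = fval + 3.14f;
    }
}
Printing or returning fval after the loop would serve the same purpose.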