1

I am currently using a GCC 3.3.3-based cross compiler to build for an XScale PXA270 development board, and I am wondering whether there are other XScale compilers that run on Linux (or Windows, for that matter). The code my current setup produces performs horrendously on the target device: programs that do a decent amount of math run 10 to 20 times slower on the XScale processor than on a similarly clocked Pentium 2. Are there other compilers I could try, or specific flags I should be passing to my GCC-based compiler, that might help with performance?

Thanks, Ben

user21293
  • GCC 3.3!? You realize that's more than 5 years old? Be a 'real programmer' and compile your own GCC 4.4.2 tool chain! – LiraNuna Jan 21 '10 at 04:51
  • I have compiled a 4.1 GCC tool chain for it, but it seems awfully hit or miss, so I've gone back to the one provided me by the vendor. – user21293 Jan 21 '10 at 13:49
  • Dunno about the "real programmer" stuff, but if you measure recent GCCs' operation you'll find that, for ARM at least, 4.2 was a local minimum in time taken to compile, memory used to compile, size of resulting object code and time the object code takes to run. From 4.3 onwards some kind of exponential growth in all 4 sets in. – martinwguy Jan 22 '10 at 11:27

3 Answers

5

Unlike the Pentium 2, the XScale architecture doesn't have native floating point instructions. That means floating point math has to be emulated using integer instructions - a 10 to 20 times slowdown sounds about right.

To improve performance, you can try a few things:

  • Where possible, minimise the use of floating point - in some places, you may be able to substitute plain integer or fixed point calculations;
  • Trade memory for speed by precalculating tables of values where possible;
  • Use floats instead of doubles in calculations where you do not need the precision of the latter (including using the C99 float versions of math.h functions);
  • Minimise conversions between integers and floating point types.
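As a rough illustration of the fixed point suggestion, here is a minimal sketch in C. The Q16.16 format (16 integer bits, 16 fractional bits) and all of the `fix16` names are assumptions made for this example, not something from the question; pick a format that suits the range and precision your data actually needs.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits and 16 fractional bits.
   The format and the fix16_* names are assumptions for this sketch. */
typedef int32_t fix16;

#define FIX16_ONE ((fix16)1 << 16)   /* the value 1.0 in Q16.16 */

/* Convert a small integer to Q16.16 (n must fit in 16 bits). */
static fix16 fix16_from_int(int32_t n)
{
    return (fix16)(n * FIX16_ONE);
}

/* Drop the fractional bits. Right-shifting a negative value is
   implementation-defined in C; GCC uses an arithmetic shift. */
static int32_t fix16_to_int(fix16 a)
{
    return a >> 16;
}

/* Multiply via a 64-bit intermediate so the product cannot overflow,
   then shift back down to Q16.16. */
static fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * (int64_t)b) >> 16);
}

/* Divide by pre-scaling the numerator; the caller must ensure b != 0. */
static fix16 fix16_div(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * FIX16_ONE) / b);
}
```

Every operation here compiles to plain integer instructions, so nothing has to be emulated. The cost is limited range and precision, which you have to check against your own data.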
caf
4

Yes, you don't have an FPU so floating point needs to be done in integer math. However, there are two mechanisms for doing this, and one is 11 times faster than the other.

The GCC target arm-linux-gnu normally emits real floating point instructions for ARM's first FPU, the "FPA", hardware that is now so rare it is effectively nonexistent. On a chip without an FPA, these cause illegal instruction traps which are then caught and emulated in the kernel. This is extremely slow because of the context switch.

-msoft-float instead inserts calls to library functions (in libgcc.a). This avoids the switch into kernel space and is 11 times faster than the emulated FPA instructions.
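To make the difference concrete, here is a tiny C function you could use as a test case. The function name `scale_and_offset` is made up for this sketch. The libgcc routine names in the comment (`__mulsf3`, `__addsf3`) are the generic soft-float names; what your toolchain actually emits depends on its ABI, so treat them as what to look for rather than a guarantee.

```c
/* Hypothetical test function: one single-precision multiply and one add. */
float scale_and_offset(float x, float gain, float offset)
{
    /* Built with -msoft-float for a target without an FPU, GCC should turn
       these two operations into calls to libgcc helpers (typically
       __mulsf3 and __addsf3) instead of emitting FPA instructions that
       trap into the kernel. */
    return x * gain + offset;
}
```

If you compile this with your cross compiler using -S, once with and once without -msoft-float, and compare the two assembly listings, you should be able to see which floating point model the toolchain is really using.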

You don't say what floating point model you are using - it may be that you are already building the whole userland with -msoft-float - but it might be worth checking that your object files contain no FPA instructions. You can check with:

objdump -d file | grep '<space><tab>f' | less
where file is any object file, executable or library that your compiler outputs. All FPA instructions start with f, while no other ARM instructions do. Those are actual space and tab characters there, and you might need to say <control-V><tab> to get the tab character past your shell.

If it is using FPA instructions, you need to compile your entire userland using -msoft-float.

The most comprehensive further reading on these issues is http://wiki.debian.org/ArmEabiPort which is primarily concerned with a third alternative: using an arm-linux-gnueabi compiler, a newer alternative ABI that is available from gcc-4.1.1 onwards and which has different characteristics. See the document for further details.

martinwguy
2

"Other xscale compilers"

Open source: LLVM and pcc. Of the two, LLVM is the more Linux-friendly and functional, and it also has a GCC front end; pcc, a descendant of the venerable Portable C Compiler, seems more BSD-oriented.

Commercial: the Keil compiler (owned by ARM Ltd) seems to produce faster code than GCC, but it will not do much to make up for your lack of an FPU.

martinwguy