3

I have a C project that was previously built with CodeSourcery's GNU toolchain. It was recently converted to use RealView's armcc compiler, but the performance we get with the RealView tools is very poor compared to the GNU build. Shouldn't it be the opposite, i.e. shouldn't the code perform better when compiled with RealView's tools? What am I missing here? How can I improve the performance with RealView's tools?

I have also noticed that if I run the binary produced by the RealView tools with Lauterbach it crashes, but if I run it using RealView ICE it runs fine.

UPDATE 1

Realview Command line:

armcc -c --diag_style=ide --depend_format=unix_escaped --no_depend_system_headers --no_unaligned_access --c99 --arm_only --debug --gnu --cpu=ARM1136J-S --fpu=SoftVFP --apcs=/nointerwork -O3 -Otime

GNU GCC command line:

arm-none-eabi-gcc -mcpu=arm1136jf-s -mlittle-endian -msoft-float -O3 -Wall

I am using RealView tools version 4.1 and GCC version 4.4.1.

UPDATE 2

The Lauterbach issue has been solved. It was caused by semihosting: the semihosting SWI was not being handled in the Lauterbach environment. Retargeting the C library to avoid semihosting did the trick, and my program now runs successfully with Lauterbach as well as RealView ICE. The performance issue remains unchanged, though.

binW
  • 1
    What's the target? What are the versions of RealView and CodeSourcery involved? Do you have any profiling information about a specific area that seems to have a large performance difference? – Michael Burr Apr 22 '11 at 16:23
  • @Michael Burr: The target is an i.MX31, i.e. an ARM1136JF-S processor. RealView tools version 4.1 and GCC version 4.4.1. No profiling information about a specific area is available. – binW Apr 25 '11 at 07:11
  • Are you using floating point? If so, have you correctly configured the compiler options to use the floating point hardware? Also, if you are using sqrt(), it is possible that the library is using a software convergence rather than the hardware SQRT instruction. You need to configure the compiler/linker options correctly to avoid this. For example see this: http://www.keil.com/support/docs/3293.htm. You need to at least post the options you are using for each compiler and, if possible, some example code that demonstrates the different performance. – Clifford Apr 25 '11 at 07:31
  • @Clifford: I am not using the FPU hardware, so all floating point operations are software based in both builds, i.e. GNU and RealView. – binW Apr 25 '11 at 10:24
  • @binW: Interesting choice for a target that has an FPU! The VFP can give a 5x acceleration to FP operations without vectorisation, and 10x if your compiler performs vectorisation or you use a vectorisation-optimised library. – Clifford Apr 25 '11 at 10:33
  • Are you certain that the compiler has in fact used s/w FP in both cases? i.e. have you checked the generated assembler code? If VFP emulation rather than FP library calls is selected, then on a part with a real VFP the hardware will be used rather than software, because the invalid op-code exception that triggers the emulation will not occur. As I said, we do *need* to know the compiler/linker options you are applying in each case. Add that information to your question, and you may start getting answers rather than comments. – Clifford Apr 25 '11 at 10:33
  • @Clifford: I have added command lines for both compilers. – binW Apr 25 '11 at 11:09
  • @Michael Burr asked for compiler versions in order to assist you, not out of idle curiosity. You could help yourself by not ignoring such requests. – Clifford Apr 25 '11 at 17:48
  • @Clifford: I mentioned the compiler versions in my comment following Michael Burr's comment. – binW Apr 25 '11 at 18:56
  • @binW: You have, my error. But such important diagnostic information deserves to be in the question rather than a comment. – Clifford Apr 25 '11 at 20:04

5 Answers

4

Since you have optimisations on and the program crashes in some environments, your code may rely on undefined behaviour or contain some other latent error. Such behaviour can change with optimisation, or the code can break altogether.

I suggest that you try both toolchains without optimisation, make sure that the warning level is set high, and fix every warning. GCC is far better than armcc at error checking, so it serves as a reasonable static analysis check. If the code builds clean, it is more likely to work and may be easier for the optimiser to handle.
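As an illustration of the kind of latent error meant here (a sketch only, not taken from the asker's project; both function names are hypothetical), the first function below relies on signed overflow, which is undefined behaviour in C. An optimiser is allowed to assume overflow never happens and may fold the comparison to 0, so the result can differ between -O0 and -O3 or between compilers. The second function expresses the same check in a well-defined way.

```c
#include <limits.h>

/* Hypothetical example: relies on signed overflow, which is undefined
   behaviour. At higher optimisation levels a compiler may assume
   "x + 1 < x" can never be true and return 0 unconditionally. */
int increment_overflows_ub(int x)
{
    return x + 1 < x;
}

/* Well-defined replacement: compare against the limit instead of
   performing the overflowing addition. */
int increment_overflows(int x)
{
    return x == INT_MAX;
}
```

Building with GCC's -Wall and -Wextra in addition to testing at -O0 is one way to surface problems of this kind before comparing optimised builds.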

Clifford
3

Have you tried removing '--no_unaligned_access'? ARM11 cores can typically perform unaligned accesses (if enabled in the startup code), and forcing the compiler and library not to use them may be slowing down your code.
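To make the cost concrete, here is a minimal sketch (the helper names are hypothetical, not from the asker's code) of two ways to read a 32-bit value from a byte pointer that may not be word aligned. When unaligned word loads are forbidden, the compiler has to produce something close to the byte-by-byte version; when they are permitted, a memcpy-style read can often be reduced to a single load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a little-endian 32-bit value from four
   byte loads. Works at any alignment, at the cost of four loads plus
   shifts and ORs. */
uint32_t load_le32_bytes(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copy four bytes into a word in host byte order.
   A compiler that is allowed to use unaligned accesses may turn this
   into one word load; otherwise it falls back to byte accesses. */
uint32_t load_u32_memcpy(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether this matters for a given program depends on how much packed or byte-oriented data it handles, so it is worth measuring with and without the option.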

2

The current version of RVCT says of '--fpu=SoftVFP':

In previous releases of RVCT, if you specified --fpu=softvfp and a CPU with implicit VFP hardware, the linker chose a library that implemented the software floating-point calls using VFP instructions. This is no longer the case. If you require this legacy behavior, use --fpu=softvfp+vfp.

This suggests to me that, if you have an old version of RVCT, the behaviour may be to use software floating point regardless of the presence of hardware floating point, whereas in the GNU build -msoft-float will use hardware floating point instructions when an FPU is available.

So what version of RVCT are you using?

Either way, I suggest that you remove the --fpu option, since the compiler will make an appropriate implicit selection based on the --cpu option. You also need to correct the CPU selection: your RVCT option says --cpu=ARM1136J-S, not ARM1136JF-S as you told GCC. This will no doubt prevent the compiler from generating VFP instructions, since you have told it that the part has no VFP.
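One way to check which floating-point model each compiler actually chose is to compile a tiny function to assembly (both armcc and GCC accept -S for this) and read the output. The file below is only a probe under that assumption, and the function name is hypothetical. On an EABI toolchain a software floating-point build typically contains a call to the runtime helper __aeabi_fmul, while a hardware build contains a VFP multiply instruction such as FMULS.

```c
/* Hypothetical probe file: a single-precision multiply that is small
   enough to inspect by eye in the generated assembly. */
float scale(float x, float k)
{
    return x * k;
}
```

Comparing the two listings with the exact options from the question should show whether the builds really use the same floating-point implementation.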

Clifford
  • As I have mentioned above, I am using RVDS version 4.1 which is the latest I believe. – binW Apr 25 '11 at 18:58
  • 1
    Ok, then it is probably the error in the CPU spec that is causing the problem, since you have told the RV compiler that you have no floating point hardware but allowed GCC to use h/w FP. – Clifford Apr 25 '11 at 20:03
1

The same source code can produce dramatically different binaries due to factors such as: different compilers (llvm vs gcc, gcc 4 vs gcc 3, etc.); different versions of the same compiler; different compiler options with the same compiler; optimization (on either compiler); and whether the code is compiled for release or debug (or whatever terms you want to use, the binaries are quite different). When going embedded, you add the complication of a bootloader or ROM monitor (debugger) and things like that. Then add to that the host-side tools that talk to the ROM monitor or compiled-in debugger. Despite being far better compilers than gcc, the ARM compilers were infected with the assumption that the binaries would always be run on top of ARM's ROM monitor. As I remember it, by the time RVCT became their primary compiler that assumption was on its way out, but I have not really used their tools since then.

The bottom line is that there are a handful of major factors that can affect the differences between binaries and that can and will lead to a different experience. Assuming that you will get the same performance or results is a bad assumption; the expectation should be that the results will differ. Likewise, within the same environment, you should be able to create binaries that give dramatically different performance results, all from the same source code.

old_timer
  • I am not assuming that I will get the same performance. The assumption is that I should get better performance with the RealView compiler than with GNU, because this compiler is produced by ARM itself. – binW Apr 25 '11 at 07:03
  • I don't have much access to ARM tools anymore. When I did, the ARM tools produced much tighter and faster code than gcc, considerably better. gcc has gotten better since, but not that much better, and gcc 4 is not necessarily better than gcc 3 for performance. So, with caution, I would say yes, you should expect better code from the ARM tools. You need to make sure you experiment with the compiler options. If you are using a debugger with debuggable code, you are probably not taking full advantage of the optimizer. Using the debugger just to load and run is a different story. – old_timer Apr 25 '11 at 13:41
0

Do you have compiler optimizations turned on in your CodeSourcery build, but not in the Realview build?

Judge Maygarden