
I've been struggling with this problem for more than a week now and still can't find a solution.

I'm trying to cross-compile Qt 4.7 embedded open-source version for an ARM device. The build process itself completes without problems, but the generated binaries seem to contain instructions that the processor does not understand.

  • Build host is Debian 5 (Etch) on i386 (running on a virtual PC)
  • The device is a Trimble Nomad handheld with an ARM processor (see full cpuinfo and kernel configuration)
  • I use the original build toolchain that was made for the device and that has worked fine to date (I could even build Gnash successfully) - see compiler settings and version
  • I'm using a custom qmake.conf based on linux-arm-gnueabi-g++ and adapted to use the correct toolchain - see source code here
  • I had a partial improvement by adding -msoft-float -D__GCC_FLOAT_NOT_NEEDED to the compiler flags but I still get "Illegal instruction" errors in some situations (but at least this was a big improvement)
  • The binaries themselves basically work, but in certain situations the program crashes with the "Illegal instruction" error. I believe this happens during certain floating point operations while doing graphics stuff.
  • Adding -mcpu=xscale, -march=armv4, -O0, -mtune=arm920t (not all at the same time) did not help in any way.
  • Building Qt with the --debug flag appears to resolve all problems but adding the -O2 flag reintroduces them. Strangely the -O0 setting without --debug does not help.
  • The complete configure and make output can be seen here. There are lots of alignment warnings, but they are said to be false positives from the compiler.
  • There must have been some change in Qt 4.7.2, because earlier versions (4.7.1, 4.7.0) run fine.

configure settings:

  ./configure \
    -embedded arm \
    -xplatform qws/linux-arm-angstrom-gnueabi-g++ \
    -debug \
    -no-largefile \
    -no-multimedia \
    -no-audio-backend \
    -no-phonon \
    -no-phonon-backend \
    -webkit \
    -javascript-jit \
    -no-xshape \
    -no-xvideo \
    -no-xsync \
    -no-xinerama \
    -no-xcursor \
    -no-xfixes \
    -no-xrandr \
    -no-xrender \
    -no-xinput \
    -no-xkb \
    -no-opengl \
    -nomake docs \
    -nomake examples \
    -nomake tools \
    -nomake demos \
    -nomake translations \
    -opensource \
    -qt-mouse-tslib \
    -qt-libjpeg \
    -qt-gif 

strace before the crash:

$ LD_LIBRARY_PATH=. QT_QWS_FONTDIR=$PWD/fonts QT_PLUGIN_PATH=$PWD/plugins QWS_MOUSE_PROTO=tslib:/dev/input/touchscreen0 strace ./digitalclock  -qws test.htm
...
lseek(15, 0, SEEK_END)                  = 16998
write(15, "\t\n\f\0\367\t", 6)          = 6
write(15, "\0\0+\234\325\343\306{\3\0\0\0\0J\370\377\351\301\336\377"..., 120) = 120
lseek(15, 0, SEEK_END)                  = 17124
write(15, "\10\10\10\0\371\10", 6)      = 6
write(15, "\0\6j\251\260\201\27\0\2\276\377\351\334\377\346\32K\377"..., 64) = 64
lseek(15, 0, SEEK_END)                  = 17194
write(15, "\7\10\10\0\371\7", 6)        = 6
write(15, "\0\4c\245\263\224 \0\1\271\377\367\315\356P\0I\377\364"..., 64) = 64
lseek(15, 0, SEEK_END)                  = 17264
write(15, "\10\n\10\1\366\10", 6)       = 6
write(15, "\37 \3\0\0\0\0\0\374\377\34\0\0\0\0\0\374\377\34\0\0\0"..., 80) = 80
fcntl64(15, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
lseek(15, 0, SEEK_END)                  = 17350
mremap(0x415f5000, 16552, 17350, MREMAP_MAYMOVE) = 0x415f5000
--- SIGILL (Illegal instruction) @ 0 (0) ---
rt_sigaction(SIGILL, {SIG_DFL}, {0x401b7d34, [ILL], SA_RESTART|0x4000000}, 8) = 0
socket_subcall(0x1f8004, 0, 0x100, 0, 0, 0x18844, 0x18840, 0x12c) = 0
ioctl(12, KDSKBMODE, 0x2)               = 0
ioctl(12, SNDCTL_TMR_START or TCSETS, {B38400 -opost -isig -icanon -echo ...}) = 0
close(12)                               = 0
ioctl(10, KDSETMODE, 0x1)               = 0
write(10, "\33[9;15]\33[?33h\33[?25h\33[?0c\0", 25) = 25
close(10)                               = 0
statfs64(umovestr: Input/output error
0x6d4f, 27983, {???})          = 0
sigreturn()                             = ? (mask now [ILL ABRT BUS FPE USR1 SEGV USR2 PIPE STKFLT CHLD CONT STOP TTOU URG XCPU VTALRM PROF WINCH IO PWR RTMIN])
--- SIGILL (Illegal instruction) @ 0 (0) ---
+++ killed by SIGILL +++
Process 27983 detached

gdb backtrace of the crash (I'm missing debug symbols since compiling with debug information resolves the problem):

(gdb) run -qws
Starting program: /home/.qt-test2/digitalclock -qws

Program received signal SIGILL, Illegal instruction.
0x4130268c in __sigsetjmp () from /lib/libc.so.6
(gdb) bt
#0  0x4130268c in __sigsetjmp () from /lib/libc.so.6
#1  0x4046ee5c in ?? () from ./libQtGui.so.4
(gdb)

Note the device comes with Qtopia 4.3 preinstalled and the vendor can't explain the problem with my build either.


Update

With help from Igor Skochinsky I was able to find the exact assembler instruction that causes the SIGILL. For some reason the instruction works fine 47 times before causing the error. See the gdb output below (note that I'm not familiar with ARM assembler at all):

$ LD_LIBRARY_PATH=. QT_QWS_FONTDIR=$PWD/fonts QT_PLUGIN_PATH=$PWD/plugins QWS_MOUSE_PROTO=tslib:/dev/input/touchscreen0 gdb ./digitalclock
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "arm-angstrom-linux-gnueabi"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) start -qws
Breakpoint 1 at 0xaa58: file main.cpp, line 47.
Starting program: /home/.qt-test2/digitalclock -qws
[Thread debugging using libthread_db enabled]
[New Thread 1073870720 (LWP 2799)]
[Switching to Thread 1073870720 (LWP 2799)]
main (argc=2, argv=0xbea17d04) at main.cpp:47
47      main.cpp: No such file or directory.
        in main.cpp
(gdb) display/i $pc
1: x/i $pc  0xaa58 <main+24>:   sub     r3, r11, #28    ; 0x1c
(gdb) display/x $r2
2: /x $r2 = 0xbea17d10
(gdb) display/x $f2
3: /x $f2 = 0x0
(gdb) b *0x41302684
Breakpoint 2 at 0x41302684
(gdb) continue
Continuing.

---> no problem here:

Breakpoint 2, 0x41302684 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x41302684 <__sigsetjmp+52>:        beq     0x413026a0 <Lno_iwmmxt>
(gdb) si
0x41302688 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x41302688 <__sigsetjmp+56>:        stfp    f2, [r12], #8
(gdb) si
0x4130268c in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x4130268c <__sigsetjmp+60>:        stfp    f3, [r12], #8
(gdb) si
0x41302690 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x41302690 <__sigsetjmp+64>:        stfp    f4, [r12], #8
(gdb) continue
Continuing.

Breakpoint 2, 0x41302684 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x41302684 <__sigsetjmp+52>:        beq     0x413026a0 <Lno_iwmmxt>
(gdb) continue 46
Will ignore next 45 crossings of breakpoint 2.  Continuing.


---> __sigsetjmp still working fine, but then:


Breakpoint 2, 0x41302684 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x41302684 <__sigsetjmp+52>:        beq     0x413026a0 <Lno_iwmmxt>
(gdb) si
0x41302688 in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x41302688 <__sigsetjmp+56>:        stfp    f2, [r12], #8
(gdb) si

Program received signal SIGILL, Illegal instruction.
0x4130268c in __sigsetjmp () from /lib/libc.so.6
3: /x $f2 = 0x0
2: /x $r2 = 0x293
1: x/i $pc  0x4130268c <__sigsetjmp+60>:        stfp    f3, [r12], #8

Any suggestions what I could try next?

Udo G
  • Have you tried disassembling the suspect code from GDB? Perhaps you might be able to spot the instruction that causes the issue. Also have you tried `-O0 -g`? If you reproduce the problem with these options, GDB might produce more useful output. – thkala Apr 11 '11 at 12:56
  • how do I do that? I am not familiar with disassembling.. – Udo G Apr 11 '11 at 12:57
  • @Udo: GDB has a `disassemble` command – thkala Apr 11 '11 at 12:59
  • @thkala: See output [here](http://pastebin.com/g6L7VqUX). does this tell something to you? Also, I'm currently rebuilding with `-O0 -g` (and not `--debug`) as suggested. This will take about half an hour. – Udo G Apr 11 '11 at 13:10
  • nvm, see my answer below – Igor Skochinsky Apr 11 '11 at 13:11
  • @Igor: here it is: http://pastebin.com/Bd4LqqU9 – Udo G Apr 11 '11 at 13:19

2 Answers


The posted disassembly is quite interesting.

 0x41302678 <__sigsetjmp+40>:    fmrx    r2, fpscr
 0x4130267c <__sigsetjmp+44>:    str     r2, [r12], #4
 0x41302680 <__sigsetjmp+48>:    tst     r2, #512        ; 0x200
 0x41302684 <__sigsetjmp+52>:    beq     0x413026a0 <__sigsetjmp+80>
 0x41302688 <__sigsetjmp+56>:    stfp    f2, [r12], #8
*0x4130268c <__sigsetjmp+60>:    stfp    f3, [r12], #8*
 0x41302690 <__sigsetjmp+64>:    stfp    f4, [r12], #8
 0x41302694 <__sigsetjmp+68>:    stfp    f5, [r12], #8
 0x41302698 <__sigsetjmp+72>:    stfp    f6, [r12], #8
 0x4130269c <__sigsetjmp+76>:    stfp    f7, [r12], #8

The code checks for bit 9 in fpscr, and, if set, tries to save registers f2-f7. What are those? I've never seen them in recent processors, but I think those are FPA ("Floating Point Accelerator") registers, implemented in a few old cores, and used for soft FP before VFP appeared.

So, here's what I think happens:

  1. The libc on your device was compiled with FPA support, probably by mistake.
  2. In FPA processors, bit 9 meant "FPA enabled" or something similar.
  3. In the debug version of Qt, bit 9 of FPSCR (DZE = Division by Zero exception enable bit) is not set, so the code doesn't try to save the FPA registers. However, the bit gets set in the release version.

I see here two options:

  1. Rebuild libc without FPA support
  2. Find where DZE gets set in the release version (not sure how to do that)

Update: I was wrong. The gdb disassembly confused me. I found the source of setjmp.S, here's the relevant part:

        tst     a3, #HWCAP_ARM_VFP
        beq     Lno_vfp

        /* Store the VFP registers.  */
        /* Following instruction is fstmiax ip!, {d8-d15}.  */
        stc     p11, cr8, [r12], #68
        /* Store the floating-point status register.  */
        /* Following instruction is fmrx r2, fpscr.  */
        mrc     p10, 7, r2, cr1, cr0, 0
        str     r2, [ip], #4
Lno_vfp:

        tst     a3, #HWCAP_ARM_IWMMXT
        beq     Lno_iwmmxt

        /* Save the call-preserved iWMMXt registers.  */
        /* Following instructions are wstrd wr10, [ip], #8 (etc.)  */
        stcl    p1, cr10, [r12], #8
        stcl    p1, cr11, [r12], #8
        stcl    p1, cr12, [r12], #8
        stcl    p1, cr13, [r12], #8
        stcl    p1, cr14, [r12], #8
        stcl    p1, cr15, [r12], #8
Lno_iwmmxt:
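
A small sketch of why gdb printed these stores as `stfp f2` ... `stfp f7`: the same 32-bit words can be read either as generic coprocessor-1 stores or as FPA stores. The encoder below follows the standard ARM coprocessor load/store field layout. The FPA reading (3-bit register in bits 12-14, precision taken from bits 22 and 15) is my assumption about how an FPA-aware disassembler interprets the word, and the function names are made up for illustration.

```python
def encode_stc_post(cp_num, crd, rn, offset_bytes, long_form):
    """Encode `stc{l} p<cp_num>, cr<crd>, [r<rn>], #<offset_bytes>` (condition AL)."""
    word = 0xE << 28               # condition field: always
    word |= 0b110 << 25            # coprocessor load/store instruction class
    word |= 0 << 24                # P = 0: post-indexed addressing
    word |= 1 << 23                # U = 1: offset is added
    word |= int(long_form) << 22   # N bit, set by the "l" suffix
    word |= 1 << 21                # W = 1: base register is written back
    word |= 0 << 20                # L = 0: store
    word |= rn << 16               # base register
    word |= crd << 12              # coprocessor register
    word |= cp_num << 8            # coprocessor number
    word |= offset_bytes // 4      # immediate is counted in words
    return word


def fpa_view(word):
    """Read the same word as an FPA STF instruction (assumed field layout)."""
    fd = (word >> 12) & 0x7        # FPA has only f0-f7, so bit 15 is not part of the register
    precision = (((word >> 22) & 1) << 1) | ((word >> 15) & 1)
    return "stf%s f%d" % ("sdep"[precision], fd)


for crd in range(10, 16):
    word = encode_stc_post(cp_num=1, crd=crd, rn=12, offset_bytes=8, long_form=True)
    print("stcl p1, cr%d, [r12], #8 -> 0x%08x -> %s" % (crd, word, fpa_view(word)))
```

Under these assumptions, cr10 through cr15 map to f2 through f7 with the "p" (packed) precision suffix, which matches the six `stfp` lines in the posted disassembly.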

So, it's trying to store iWMMXt registers, not FPA. However, there is a bug here. It's using r2 to temporarily store fpscr, but that overwrites the previously loaded hwcap value in a3 (a3 is the APCS name for r2). Maybe the author meant to use a2, not r2, or maybe the two parts were done by different people. In either case, somehow the release version of Qt changes FPSCR (which is most likely emulated by the kernel) and the code storing the iWMMXt registers is triggered.
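
The register clobbering can be modelled in a few lines. This is only a simulation of the control flow in the setjmp.S excerpt, not the real code. The HWCAP constants are the Linux ARM values (the `tst r2, #512` in the disassembly agrees with the iWMMXt one), and `takes_iwmmxt_branch` is a name invented for this sketch.

```python
HWCAP_ARM_VFP = 1 << 6      # 64
HWCAP_ARM_IWMMXT = 1 << 9   # 512, the constant in `tst r2, #512`
FPSCR_DZE = 1 << 9          # division-by-zero enable bit, same bit position


def takes_iwmmxt_branch(hwcap, fpscr):
    """Return True if the iWMMXt stores would execute for these inputs."""
    r2 = hwcap                           # a3 (= r2) holds the hwcap word
    if r2 & HWCAP_ARM_VFP:               # tst a3, #HWCAP_ARM_VFP
        r2 = fpscr                       # fmrx r2, fpscr overwrites hwcap
    return bool(r2 & HWCAP_ARM_IWMMXT)   # tst a3, #HWCAP_ARM_IWMMXT


# With VFP reported, the decision depends on FPSCR, not on hwcap:
print(takes_iwmmxt_branch(HWCAP_ARM_VFP, 0))                     # False
print(takes_iwmmxt_branch(HWCAP_ARM_VFP, FPSCR_DZE))             # True
print(takes_iwmmxt_branch(HWCAP_ARM_VFP | HWCAP_ARM_IWMMXT, 0))  # False
```

The last case shows the other side of the bug: a CPU that reports both capabilities would skip saving its iWMMXt registers whenever DZE is clear.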

Still, that's not the whole story. The hwcaps you pasted claim that the CPU does support iWMMXt, so I'm not sure why those instructions would be giving trouble. Maybe the reported PC value is wrong somehow. I think you should try putting a breakpoint on __sigsetjmp and stepping through it instruction by instruction (stepi), to see exactly where it crashes.
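
For reference, a minimal way to check what the kernel advertises is to parse the `Features` line of /proc/cpuinfo. The sample text below is hypothetical and is not the Nomad's real cpuinfo; `cpu_features` is a helper name made up for this sketch.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags listed on the 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()


# Hypothetical sample, only to show the expected shape of the input.
SAMPLE_CPUINFO = """\
BogoMIPS\t: 100.00
Features\t: swp half thumb fastmult edsp iwmmxt
"""

features = cpu_features(SAMPLE_CPUINFO)
print("iwmmxt" in features)
print("vfp" in features)
```

If Python is not available on the device, the cpuinfo text can be copied to the host and passed to the same function there.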

Igor Skochinsky
  • Is there a way to *test* whether the libc is okay or not? [This test program](http://pastebin.com/0d5QLwbU) runs fine when compiled with the same compiler switches (including `-O2`). And, I assume for option #2 that there must be some assembler code in the Qt source. What expression can I grep to find potentially matching source code files containing asm code? – Udo G Apr 11 '11 at 13:35
  • Here is the gdb `stepi` output: http://pastebin.com/A53Jkh7S I hope this tells you something... Let me know what I can do next. – Udo G Apr 11 '11 at 14:52
  • @Udo: Looks like it passes fine the first time. Let's see if we can trigger it. Type this in gdb: `display/i $pc` `display/x $r2`. Then run the program again, step through __sigsetjmp (btw you can use 'si'), but continue once you reach 0x413026a0 (`b 0x413026b0`). – Igor Skochinsky Apr 11 '11 at 16:24
  • Here is the output from what you requested: http://pastebin.com/eZ5UXT7Z however I never reached the address you specified. Instead, I don't get the SIGILL while stepping. That is probably because I "accidentally" installed the glibc*-dbg* package (there is a *ipkg* repository for my device) and this may have changed things. However, I still get the "Illegal instruction" and here you can see the updated output from the previous tests: http://pastebin.com/HUWcM6Zn I hope this gives you some more hints anyway? – Udo G Apr 11 '11 at 16:52
  • Hrm, looks like with the new glibc the address is different. Can you try continuing when you reach this line: `0x40013400 : b 0x40012430 <__sigjmp_save>`? – Igor Skochinsky Apr 11 '11 at 17:21
  • Apparently it keeps jumping back to the __sigsetjmp() function: http://pastebin.com/Y5VMX1jr (I've also included a backtrace after three trials). – Udo G Apr 11 '11 at 17:31
  • Could it be I'm stopping at the wrong __sigsetjmp() function? gdb sets the breakpoint at 0x400133b0 but the backtrace (after SIGILL, without having any breakpoints set) shows address 0x4130268c: http://pastebin.com/WwHbR9hZ (EDIT: corrected paste) – Udo G Apr 11 '11 at 17:36
  • It just means the first three calls to __sigsetjmp went fine... Can you do a run with new glibc but disable the breakpoint? The previous paste with SIGILL still had old addresses... – Igor Skochinsky Apr 11 '11 at 17:36
  • I think I did that already. See my previous comment (http://pastebin.com/WwHbR9hZ). – Udo G Apr 11 '11 at 17:37
  • Indeed, I didn't notice that the library name is different! The first function comes from ld-linux.so.3 and the second from libc.so.6. Please try setting the breakpoint by address: `b *0x41302650` - that should stop only in libc (the one that exhibits the problem). – Igor Skochinsky Apr 11 '11 at 18:10
  • I hope this is what you need: [http://pastebin.com/F2HS4t0m](http://pastebin.com/F2HS4t0m) (note this time I [used the `start` command](http://stackoverflow.com/questions/912808/after-setting-a-breakpoint-in-qt-gdb-says-error-accessing-memory-address) to avoid errors). I couldn't see the SIGILL even after a few calls to that function, as you can see. – Udo G Apr 11 '11 at 18:35
  • Could it help switching back to the old glibc? – Udo G Apr 11 '11 at 19:06
  • I don't expect it will... the addresses look the same, so probably dbg version only added symbols. – Igor Skochinsky Apr 11 '11 at 19:50
  • I could isolate the SIGILL in the 48th call to `__sigsetjmp` and I think I got the `display`... data you need. Please have a look at this: http://pastebin.com/8bpw4Cvt Thanks – Udo G Apr 12 '11 at 09:19
  • Okay, so it seems it *does* crash on the wstrd instruction after all... but it worked fine the first few times?! I wonder if somehow the iWMMXt instructions get disabled... – Igor Skochinsky Apr 12 '11 at 09:42
  • After some digging through the kernel sources, it seems that iWMMXt access (cp0 and cp1) is enabled per task in the undefined opcode trap, and then re-enabled on each task switch. So it would seem that somehow the TIF_USING_IWMMXT flag is cleared from the task flags. I tried looking where it could happen but it seems it's only changed on the ELF load... – Igor Skochinsky Apr 12 '11 at 10:12
  • Okay, I have another idea. Maybe one of the .so that gets loaded into the process is not iWMMXt compatible (i.e. doesn't have EABI or SOFT_FLOAT flag set), and this makes the kernel disable TIF_USING_IWMMXT. Can you print the list of solibs loaded (`i shared`) after the crash and check them with readelf? I could not find a way to make gdb print the .so name when it gets loaded, that would make things simpler (you could just check the last loaded one). – Igor Skochinsky Apr 12 '11 at 10:26
  • See output here: http://pastebin.com/n5kAYft7 I have no `readelf` on the device, so I copied the .so files to the host and used its `readelf`. All have the same flags, but I can't find anything related to SOFT_FLOAT or FLOAT in general. In case it helps you, I have copied all binaries and the `readelf -a` output to http://indunet.it/temp/qt7-problem/ Please note also the ArmEabiPort wiki note I added at the end of the paste mentioned above. Let me know what else I can do. – Udo G Apr 12 '11 at 12:32
  • Well, it seems all the .so files are EABI after all so there goes my theory. I guess the only remaining way to get down to the heart of the problem would be to use kernel debugger to try and figure out who's changing the task flags (if that's indeed the issue) :/ – Igor Skochinsky Apr 12 '11 at 15:04
  • uhm, sadly, I have no clue how to do that. well, thanks *a lot* anyway for all your effort! I'll open a bounty to this one as I really need to fix this. Hopefully someone has a fresh idea.. – Udo G Apr 15 '11 at 10:00
  • FYI, I tried with Qt 4.7.0 (instead of 4.7.2) and that version works fine. But frankly I don't know where to start looking for differences... – Udo G Apr 18 '11 at 16:25
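
The `readelf` check discussed in the comments above can also be approximated by reading `e_flags` straight from the ELF header. This is a rough sketch for 32-bit ARM ELF files only. The constants are the usual ones from `elf.h`, and the function name and the synthetic header are made up for illustration.

```python
import struct

EM_ARM = 40
EF_ARM_EABIMASK = 0xFF000000
EF_ARM_SOFT_FLOAT = 0x00000200


def arm_elf_flags(header_bytes):
    """Decode the EABI version and soft-float bit from a 32-bit ARM ELF header."""
    if header_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header_bytes[4] != 1:                       # EI_CLASS: 1 means 32-bit
        raise ValueError("not a 32-bit ELF file")
    endian = "<" if header_bytes[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (e_machine,) = struct.unpack_from(endian + "H", header_bytes, 18)
    if e_machine != EM_ARM:
        raise ValueError("not an ARM ELF file")
    (e_flags,) = struct.unpack_from(endian + "I", header_bytes, 36)
    return {
        "raw": e_flags,
        "eabi_version": (e_flags & EF_ARM_EABIMASK) >> 24,
        "soft_float": bool(e_flags & EF_ARM_SOFT_FLOAT),
    }


# Synthetic little-endian header with e_flags = 0x04000200, for demonstration only.
fake_header = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
fake_header += struct.pack("<HHIIIII", 3, EM_ARM, 1, 0, 52, 0, 0x04000200)
print(arm_elf_flags(fake_header))
```

For a real library, the first 52 bytes of the file are enough, for example `arm_elf_flags(open(path, "rb").read(52))` with `path` pointing at one of the copied .so files.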
0

Hello, I had a similar problem a few days ago, although I run Qt Creator 5.7 on Slackware Linux in VMware Player (not on an ARM device).
After a successful installation I could not start Qt Creator. I tried to run it with the terminal command /opt/Qt5.7.0/Tools/QtCreator/bin/qtcreator and it failed with the error "Illegal instruction".
After a few hours spent with Google, I tried to run Qt Creator with the terminal command /opt/Qt5.7.0/Tools/QtCreator/bin/qtcreator -noload Welcome and this worked for me.

Hope this helps someone. Sorry for the late response.

CaptainBli
tomas4