While debugging a Linux device driver, I ran into a situation where the OS dies after printing some messages. Because it never returns to the shell prompt, I can't run dmesg and have to restart the machine (it's an arm64 virtual machine). I don't know why the machine sometimes crashes instead of just killing the program (most of the time it returns to the shell).
The program died with the message below (only part of it is shown for this question).

csr seen at app..
0 0 0 f 4225a8 4228a8 443040 0 
[  743.787778] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000442f00
[  743.790613] Mem abort info:
[  743.790955]   ESR = 0x9600000f
[  743.791572]   EC = 0x25: DABT (current EL), IL = 32 bits
[  743.792437]   SET = 0, FnV = 0
[  743.792980]   EA = 0, S1PTW = 0
[  743.793676] Data abort info:
[  743.794119]   ISV = 0, ISS = 0x0000000f
[  743.794625]   CM = 0, WnR = 0
[  743.795318] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000075b0f000
[  743.796458] [0000000000442f00] pgd=000000006fe06003, pud=000000006fe0c003, pmd=000000006fe10003, pte=00e800004f108f53
[  743.799097] Internal error: Oops: 9600000f [#1] SMP
[  743.800477] Modules linked in: axpu_ldd_kc(OE) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua qemu_fw_cfg sch_fq_codel ppdev lp parport drm ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_ce ghash_ce sm4_ce sm4_generic sm3_ce sm3_generic sha3_ce sha3_generic sha512_ce sha512_arm64 sha2_ce sha256_arm64 sha1_ce virtio_net net_failover failover virtio_blk aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[  743.808604] CPU: 2 PID: 1179 Comm: test_axpu_app Tainted: G           OE     5.4.0-77-generic #86-Ubuntu
[  743.809607] Hardware name: QEMU QEMU Ab21q Virtual Machine, BIOS 0.0.0 02/06/2015
[  743.811097] pstate: 20400005 (nzCv daif +PAN -UAO)
[  743.813870] pc : axpu_ioctl+0x3ec/0xb54 [axpu_ldd_kc]
[  743.814477] lr : axpu_ioctl+0x3cc/0xb54 [axpu_ldd_kc]
[  743.815031] sp : ffff800013733c90
[  743.815511] x29: ffff800013733c90 x28: ffff00002fb04b00 
[  743.816277] x27: 0000000000000000 x26: ffff800009243448 
[  743.816855] x25: 0000000000000000 x24: ffff800009243470 
[  743.817448] x23: ffff000027373000 x22: ffff8000092433d0 
[  743.818144] x21: ffff8000092433d8 x20: ffff000017ab0a40 
[  743.818782] x19: ffff8000092454c0 x18: 0000000000000001 
[  743.819493] x17: 0000000000000000 x16: 0000000000000000 
[  743.820156] x15: ffff00002fb05028 x14: ffffffffffffffff 
[  743.820789] x13: 0000000000000000 x12: ffff800011db3000 
[  743.821422] x11: ffff800011b9e000 x10: 0000000000000000 
[  743.822018] x9 : 0000000000000004 x8 : 0000000000000214 
[  743.822593] x7 : 0000000000000001 x6 : ffff800011db3000 
[  743.823179] x5 : ffff00003fdb5248 x4 : 0000000000000007 
[  743.823857] x3 : 0000000000000000 x2 : 0000000000000008 
[  743.824621] x1 : ffff000023002200 x0 : 0000000000442f00 
[  743.825939] Call trace:
[  743.826397]  axpu_ioctl+0x3ec/0xb54 [axpu_ldd_kc]
[  743.828010]  do_vfs_ioctl+0xc64/0xe60

At the bottom I see the call trace starting, and it says the crash happened near axpu_ioctl+0x3ec/0xb54. axpu_ioctl is the function name, but what are the two numbers? I guess one is the offset from the start of axpu_ioctl, but I don't understand why there are two, or which is which.
If I knew the exact code location, it would be easy to find the source line with the problem, but I can't make sense of the two numbers.

Marco Bonelli
Chan Kim
  • I think the one after the slash is notionally the function size in bytes. (Note that this size will include the bodies of functions that the compiler decided to call inline.) – Ian Abbott Aug 04 '21 at 09:40
  • @IanAbbott Hi, thanks again. I could check for only one function and it looks like it. But there is a slight difference (0x10) in the function size. – Chan Kim Aug 04 '21 at 09:50
  • The Oops message suggests problems with user memory accesses in the driver. As a first step I would install the *sparse* package and build the driver with `make C=1` (the `C=1` enables "sparse" checking). The various kernel source code pointer tags such as `__user` for user memory pointers and `__iomem` for memory-mapped I/O pointers will be checked for consistent usage. – Ian Abbott Aug 04 '21 at 11:36
  • Those two numbers are: _offset of the trapped instruction_ and _size_ of the function. – 0andriy Aug 05 '21 at 07:11
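
To make the reading from the comments concrete (first number is the offset of the trapped instruction from the start of the function, second is the function size), here is a minimal Python sketch that splits a frame such as `axpu_ioctl+0x3ec/0xb54 [axpu_ldd_kc]` into its parts. The names `FRAME_RE` and `parse_frame` are hypothetical helpers written for this illustration, not part of any kernel tooling.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" with an optional "[module]" suffix,
# as printed on the "pc :" / "lr :" lines and in the call trace of an Oops.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_.$][\w.$]*)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)"
    r"/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)

def parse_frame(text):
    """Return symbol, offset, size and module from one Oops line, or None."""
    m = FRAME_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes from function start
        "size": int(m.group("size"), 16),      # function size in bytes
        "module": m.group("module"),           # None for built-in symbols
    }

frame = parse_frame("[  743.813870] pc : axpu_ioctl+0x3ec/0xb54 [axpu_ldd_kc]")
print(frame)
```

For the `pc` line in the question this gives an offset of 0x3ec (1004 bytes) into a function of 0xb54 (2900 bytes). Assuming the module was built with debug information, the same `symbol+offset` expression can usually be mapped to a source line with gdb, for example `list *(axpu_ioctl+0x3ec)` after loading the module's `.ko` file, or with the kernel tree's `scripts/faddr2line` script.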
