
Following up on my previous question, Why does module failed to load? (/dev/scull0 : no such device or address): I managed to load the module via /sbin/insmod, but afterwards dmesg shows the following:

[ 2765.707018] scull: loading out-of-tree module taints kernel.
[ 2765.707106] scull: module verification failed: signature and/or required key missing - tainting kernel
[ 2765.707929] Passed scull_init_module at 41 (debug info - successful load of init module)
[ 6027.843914] acer_wmi: Unknown function number - 8 - 1
[ 7347.683312] stack segment: 0000 [#1] SMP PTI
[ 7347.683323] CPU: 3 PID: 15280 Comm: rmmod Tainted: G           OE     4.19.0-9-amd64 #1 Debian 4.19.118-2
[ 7347.683326] Hardware name: Acer Swift SF314-52/Suntory_KL, BIOS V1.08 11/28/2017
/* start of the problem: */
[ 7347.683335] RIP: 0010:scull_trim+0x3a/0xa0 [scull]
[ 7347.683339] Code: 44 8b 77 0c 48 8b 2f 45 8d 66 ff 49 c1 e4 03 48 85 ed 75 16 eb 4b 48 8b 5d 08 48 89 ef e8 7e 38 f1 e1 48 89 dd 48 85 db 74 37 <48> 8b 7d 00 48 85 ff 74 e3 45 85 f6 7e 1a 31 db eb 04 48 83 c3 08

/*... output of all registers ...*/

[ 7347.683372] Call Trace:
[ 7347.683382]  cleanup_module+0x44/0x80 [scull]
[ 7347.683391]  __x64_sys_delete_module+0x190/0x2e0
[ 7347.683399]  do_syscall_64+0x53/0x110
[ 7347.683405]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7347.683530] ---[ end trace c4b4a1cdb428d4b3 ]---
[ 7347.885914] RIP: 0010:scull_trim+0x3a/0xa0 [scull]
... /* again */ ...

Here I can observe that the crash happens in scull_trim (source below) and that the kernel prints a stack trace for it (or does the kernel print Call Trace: whenever something goes wrong inside the kernel?).
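As a rough illustration of how to read the RIP: and Call Trace: entries: each one has the shape `symbol+offset/size [module]`, so `scull_trim+0x3a/0xa0 [scull]` places the faulting instruction 0x3a bytes into a function that is 0xa0 bytes long, inside the scull module. The sketch below is plain userspace C that splits such a string into its parts. The names `parse_oops_location` and `struct oops_location` are hypothetical, made up for this example, and are not kernel or libc APIs.

```c
#include <stdio.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE [module]" entry,
 * as printed on the RIP: and Call Trace: lines of the dump. */
struct oops_location {
    char symbol[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total length of the function in bytes */
    char module[64];      /* empty for built-in kernel symbols */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_oops_location(const char *text, struct oops_location *out)
{
    int n;

    out->module[0] = '\0';
    /* %lx accepts the optional "0x" prefix; the module part is optional. */
    n = sscanf(text, "%63[^+]+%lx/%lx [%63[^]]]",
               out->symbol, &out->offset, &out->size, out->module);
    return n >= 3 ? 0 : -1;
}
```

Feeding it `"scull_trim+0x3a/0xa0 [scull]"` should yield the symbol `scull_trim`, offset 0x3a, size 0xa0 and module `scull`. The offset is what you would look up in a disassembly of the module to find the faulting instruction.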

scull_trim:

/* main structure */
struct scull_dev {
    struct scull_qset *data; /* pointer to the first quantum set */
    int quantum;             /* quantum size in bytes */
    int qset;                /* array size */
    unsigned long size;      /* total bytes in device */
    struct cdev cdev;        /* char device structure */
};

/* representation of a quantum set */
struct scull_qset {
    void **data;
    struct scull_qset *next;
};

/*-------------------------------------------------------------------------------------*/
int scull_trim(struct scull_dev *dev) {
    struct scull_qset *next, *dptr; /* next for loop, dptr = data pointer (index in loop) */
    int qset = dev->qset; /* get size of array */
    int i; /*index for second loop for quantum bytes */

    for(dptr = dev->data /*struct scull_qset*/; dptr ; dptr = next){
        if (dptr->data /*array of quantum*/) {
            for(i=0; i<qset; i++){
                kfree(dptr->data[i]); /*free each byte of array data[i]*/
            }
            kfree(dptr->data); /*free array pointer itself*/
            dptr->data = NULL; /*set array pointer to null pointer to avoid garbage*/
        }
        next = dptr->next;
        kfree(dptr); /* free pointer itself */
    }
    //setting new attributes for cleared dev
    dev->size = 0;
    dev->quantum = scull_quantum;
    dev->qset = scull_qset;
    dev->data = NULL;

    return 0;
}

The function scull_trim is basically taken from Linux Device Drivers, 3rd edition, and its purpose is to free all data held by the device before the open method is called. But why does it cause this error in dmesg, with the kernel printing a stack trace?

EDIT: Because the problem is nearly impossible to resolve from the excerpt alone, I am adding the source (as well as the dmesg dump) on GitHub: repo:scull device. Please have a look at it to help resolve the issue.

autistic456
  • What lines come before `[ 7347.683335] RIP: 0010:scull_trim+0x3a/0xa0 [scull]`? – Thomas Jager May 26 '20 at 12:50
  • @ThomasJager edited. But the information above (before the first mention of `scull_trim`) is not interesting, since it is output from other modules. – autistic456 May 26 '20 at 13:13
  • That stuff above still seems like part of a dump after an error. Just because it doesn't mention `scull_trim` doesn't mean that it's not related. Is there any more between those messages? – Thomas Jager May 26 '20 at 13:23
  • No, there is no other message in between. I suspect I have created some kind of loop of allocating memory for the devices, but I cannot resolve it from the code (I do not see where the problem comes from). But the dmesg is complete now (from the first mention of the module, `scull_init_module`, until its end via the forced call of `cleanup_module`). – autistic456 May 26 '20 at 13:30
  • "I suspect I have created some kind of loop of allocating memory for the devices, but I cannot resolve it from the code ..." - Have you tried to **debug** the problem? E.g. insert `printk` into the `for` loop and print some information at each iteration. The record `scull_trim+0x3a/0xa0` in the trace means that the faulting instruction is at offset `0x3a` from the start of the `scull_trim` function. Disassemble your module (`objdump -Sdlr .ko > .asm`) and find at least the **exact line** in your C code which causes the fault. – Tsyvarev May 26 '20 at 14:59
  • @Tsyvarev, you can do it from `Code:` line. Result for above: `2a:* 48 8b 7d 00 mov 0x0(%rbp),%rdi <-- trapping instruction`. – 0andriy May 26 '20 at 20:12
  • Do you initialize `dev->data = NULL;` and `dev->size = 0;`? I never saw that in your previous question. – Ian Abbott May 26 '20 at 20:12
  • `kfree(dptr->data[i]); /*free each byte of array data[i]*/` comment here is very suspicious. Are you sure you understand what you are doing? – 0andriy May 26 '20 at 20:19
  • @0andriy: The `Code` line contains only instruction, not a line in a C code. As for the whole code, it is taken from the LDD3 book, and this comment `free each byte of array data[i]` could be from the book too. – Tsyvarev May 26 '20 at 20:49
  • @Tsyvarev, `Code:` contains pretty much enough to assume the issue. – 0andriy May 26 '20 at 21:24
  • @0andriy: "`Code:` contains pretty much enough to assume the issue." - Yes, it contains a lot of info ... for someone who **understands asm** quite well. I suggested an approach which can be used for debugging even without deep knowledge of asm. BTW, the decoded instruction is a load (AT&T syntax) from, most *likely*, the first field of a structure. But the code has several lines which could perform such a load. Yes, you could find more "signs" of a possible correspondence to the C code. But again, it requires great experience in asm. – Tsyvarev May 26 '20 at 21:52
  • @Tsyvarev, I have added github repo with source and the dmesg error file. Please visit it, to find out: https://github.com/autistic456/scull-device – autistic456 May 27 '20 at 10:43
  • As @IanAbbott noted in the comment, `dev->data` has **never been initialized**. If you compare your code with the [original example of scull driver](https://github.com/martinezjavier/ldd3/blob/master/scull/main.c), you will find the line `memset(scull_devices, 0, scull_nr_devs * sizeof(struct scull_dev));` in the original code. Exactly this line is responsible for zero-initialization of some fields of the `scull_dev` structure. – Tsyvarev May 27 '20 at 12:02
  • @IanAbbott, why is it necessary to memset the memory from kmalloc? I know the memory is uninitialized and has an indeterminate value, but in the next step I assign values to each memory element (of type struct scull_dev) and thus initialize it. Why is the preliminary step of zeroing the garbage needed, when after that I am assigning (in the for loop `for(i=0;i I fix that -> look at repo: https://github.com/autistic456/scull-device – autistic456 May 31 '20 at 11:18
  • @autistic456 You are not assigning `scull_devices[i].data = NULL;` in that `for` loop. Its value will be indeterminate when `scull_trim()` or `scull_follow()` is called. – Ian Abbott May 31 '20 at 17:19
  • @autistic456 My previous comment refers to the code before you added the `memset()` call. The `memset` will set `scull_devices[i].data` to `NULL` so you don't need to assign it explicitly. – Ian Abbott May 31 '20 at 17:28
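
To make the point from the comments concrete, here is a minimal userspace sketch of why zero-initialization matters for `scull_trim()`. It is only a model under stated assumptions, not the driver itself: `calloc()`/`free()` stand in for the kernel's `kzalloc()`/`kfree()` (or `kmalloc()` followed by `memset()`), the `cdev` member and the module parameters `scull_quantum`/`scull_qset` are left out, and `alloc_devices` is a hypothetical helper name.

```c
#include <stdlib.h>

/* Userspace stand-ins for the driver's types (cdev member omitted). */
struct scull_qset {
    void **data;
    struct scull_qset *next;
};

struct scull_dev {
    struct scull_qset *data;
    int quantum;
    int qset;
    unsigned long size;
};

/* Same traversal as scull_trim() in the question, with free() for kfree().
 * The reset of quantum/qset to the module parameters is left out here. */
int scull_trim(struct scull_dev *dev)
{
    struct scull_qset *next, *dptr;
    int qset = dev->qset;
    int i;

    /* The loop is only safe if dev->data is NULL or a valid list head. */
    for (dptr = dev->data; dptr; dptr = next) {
        if (dptr->data) {
            for (i = 0; i < qset; i++)
                free(dptr->data[i]); /* each entry points to one quantum */
            free(dptr->data);
            dptr->data = NULL;
        }
        next = dptr->next;
        free(dptr);
    }
    dev->size = 0;
    dev->data = NULL;
    return 0;
}

/* Hypothetical helper: calloc() zeroes every field, so dev->data starts
 * out as NULL. With plain malloc() it would hold an indeterminate value
 * and the first scull_trim() call would dereference garbage. */
struct scull_dev *alloc_devices(size_t nr, int quantum, int qset)
{
    struct scull_dev *devs = calloc(nr, sizeof(*devs));
    size_t i;

    if (!devs)
        return NULL;
    for (i = 0; i < nr; i++) {
        devs[i].quantum = quantum;
        devs[i].qset = qset;
        /* data and size are already zero thanks to calloc() */
    }
    return devs;
}
```

Assigning only `quantum` and `qset` in the loop, as the comments point out, leaves `data` untouched. The zeroing step is what guarantees that trimming a freshly allocated device is a no-op instead of a walk over a garbage pointer.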
