Problem
I need a kernel thread that is able to work for prolonged periods of time without yielding, basically fully dedicating a CPU core to it on demand:
int my_kthread(void *arg)
{
while(!kthread_should_stop()) {
do_some_work();
if(sleeping_enabled) msleep(1000);
else {
// What to do here to avoid lockup warnings
// and ensure system stability?
}
}
return 0;
}
Background
The thread is created like this when the module that I am working on is loaded:
my_task = kthread_run(&my_kthread, (void *)some_data, "My KThread")
set_cpus_allowed(my_task, *cpumask_of(10)); // Pin thread to core #10
and stopped like this when the module is unloaded:
kthread_stop(my_task);
Everything works just fine when sleeping_enabled
is true
.
Otherwise, soon after the thread is started, the kernel complains of the apparent lockup. At first, I merely aimed to avoid the various warnings such as
BUG: soft lockup - CPU#10 stuck for 22s!
and
INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 15, t=30567 jiffies)
since they tend to flood my console with dumps for all >20 cores, and the "lockup" is desired behavior.
I tried poking the watchdog like this:
if(sleeping_enabled) msleep(1000);
else touch_softlockup_watchdog();
in combination with (echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress)
and pretty much got what I want (a never-sleeping thread that successfully does what I want and no spam in the console).
However, not only does this "solution" feel like cheating, it seems I am completely breaking something by hogging that one core: when unloading the module via rmmod
, the whole system freezes. The console starts periodically dumping soft lockups on all cores, with this call trace:
[<ffffffff810c96b0>] ? queue_stop_cpus_work+0xd0/0xd0
[<ffffffff810c9903>] cpu_stopper_thread+0xe3/0x1b0
[<ffffffff8108639a>] ? finish_task_switch+0x4a/0xf0
[<ffffffff8169e654>] ? __schedule+0x3c4/0x700
[<ffffffff81080e98>] ? __wake_up_common+0x58/0x90
[<ffffffff810c9820>] ? __stop_cpus+0x80/0x80
[<ffffffff81077e93>] kthread+0x93/0xa0
[<ffffffff816a9724>] kernel_thread_helper+0x4/0x10
[<ffffffff81077e00>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff816a9720>] ? gs_change+0x13/0x13
Meanwhile, my kernel thread continues running (as evidenced by some console messages that it prints out every now and then), so it never saw kthread_should_stop()
return true
.
Unloading did work correctly and stopped the thread before I switched to not sleeping at all. Now, I am unable to make iterative modifications without having to reboot.
Note that I have simplified the description here a lot. I am trying to add such a thread (to poll some hardware registers and log their changes) to a GPU driver, so there may be module-dependent reasons for the freeze on unload. However, this does not change my general question about how to best implement a thread that never sleeps.