
Problem

I need a kernel thread that is able to work for prolonged periods of time without yielding, basically fully dedicating a CPU core to it on demand:

int my_kthread(void *arg)
{
    while (!kthread_should_stop()) {
        do_some_work();
        if (sleeping_enabled) {
            msleep(1000);
        } else {
            // What to do here to avoid lockup warnings
            // and ensure system stability?
        }
    }
    return 0;
}

Background

The thread is created like this when the module that I am working on is loaded:

my_task = kthread_run(&my_kthread, (void *)some_data, "My KThread");
set_cpus_allowed(my_task, *cpumask_of(10)); // Pin thread to core #10

and stopped like this when the module is unloaded:

kthread_stop(my_task);

Everything works just fine when sleeping_enabled is true.

Otherwise, soon after the thread is started, the kernel complains of the apparent lockup. At first, I merely aimed to avoid the various warnings such as

BUG: soft lockup - CPU#10 stuck for 22s!

and

INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 15, t=30567 jiffies)

since they tend to flood my console with dumps for all >20 cores, and the "lockup" is desired behavior.

I tried poking the watchdog like this:

if(sleeping_enabled) msleep(1000);
else touch_softlockup_watchdog();

in combination with (echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress) and pretty much got what I want (a never-sleeping thread that successfully does what I want and no spam in the console).

However, not only does this "solution" feel like cheating, it seems I am completely breaking something by hogging that one core: when unloading the module via rmmod, the whole system freezes. The console starts periodically dumping soft lockups on all cores, with this call trace:

[<ffffffff810c96b0>] ? queue_stop_cpus_work+0xd0/0xd0
[<ffffffff810c9903>] cpu_stopper_thread+0xe3/0x1b0
[<ffffffff8108639a>] ? finish_task_switch+0x4a/0xf0
[<ffffffff8169e654>] ? __schedule+0x3c4/0x700
[<ffffffff81080e98>] ? __wake_up_common+0x58/0x90
[<ffffffff810c9820>] ? __stop_cpus+0x80/0x80
[<ffffffff81077e93>] kthread+0x93/0xa0
[<ffffffff816a9724>] kernel_thread_helper+0x4/0x10
[<ffffffff81077e00>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff816a9720>] ? gs_change+0x13/0x13

Meanwhile, my kernel thread continues running (as evidenced by some console messages that it prints out every now and then), so it never saw kthread_should_stop() return true.

Unloading did work correctly and stopped the thread before I switched to not sleeping at all. Now, I am unable to make iterative modifications without having to reboot.

Note that I have simplified the description here a lot. I am trying to add such a thread (to poll some hardware registers and log their changes) to a GPU driver, so there may be module-dependent reasons for the freeze on unload. However, this does not change my general question about how to best implement a thread that never sleeps.

sls
  • You need to let other things run on that CPU sometimes. Try `schedule()` in the `else` clause. As long as no other CPU-intensive processes are trying to run on this CPU, it will return quickly. – Peter Jul 16 '14 at 21:31
  • @Peter `else schedule()` does fix the system freeze on module unload :) It takes between 5 and 20 μs to execute on my otherwise idle system. However, the registers that I am polling may change a bit faster than that (say once every 3 μs) and I would ideally like to notice every single change. Could there be an even faster way? – sls Jul 17 '14 at 12:09
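
For reference, the loop with Peter's `schedule()` suggestion applied might look roughly like the sketch below. This is only an illustration under assumptions, not the actual driver code: `do_some_work()` is a placeholder body, and everything in the `#else` branch (`stop_after`, `yield_calls`, and the stub versions of `kthread_should_stop()`, `schedule()` and `msleep()`) is a hypothetical user-space stand-in added so the loop can be compiled and exercised outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/delay.h>
#else
/* Hypothetical user-space stand-ins; none of this exists in the real module. */
#include <stdbool.h>
static int stop_after = 3;   /* stub: number of loop passes before "stop" */
static int yield_calls;      /* stub: counts how often the loop yielded */
static bool kthread_should_stop(void) { return stop_after-- <= 0; }
static void schedule(void) { yield_calls++; }
static void msleep(unsigned int ms) { (void)ms; }
#endif

static bool sleeping_enabled;   /* same flag as in the question */
static int work_done;           /* placeholder state for this sketch */

/* Placeholder for the real register polling done by the driver. */
static void do_some_work(void)
{
    work_done++;
}

static int my_kthread(void *arg)
{
    (void)arg;
    while (!kthread_should_stop()) {
        do_some_work();
        if (sleeping_enabled) {
            msleep(1000);
        } else {
            /*
             * Give the scheduler a chance to run other runnable tasks
             * on this core; with nothing else runnable it returns
             * quickly, as described in the comment above.
             */
            schedule();
        }
    }
    return 0;
}
```

In the kernel build the thread stays runnable across the `schedule()` call, so it keeps the core whenever nothing else wants it. The cost is the per-iteration scheduler call measured above.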

1 Answer


I think this question is similar to your question "whole one core dedicated to single process"; please check the replies there.

KarimRaslan
  • I did stumble upon that question (it is not mine), but was looking for a more flexible solution, i.e. to only temporarily hog one core instead of isolating it at boot-time. Maybe there just isn't one... But it did kind of work when I never slept. It's only when I try to unload my module that things completely break. – sls Jul 17 '14 at 12:13
  • @Karim this should be a comment, not an answer – Peter Jul 17 '14 at 13:28
  • @Peter I don't have enough reputation to comment! – KarimRaslan Jul 17 '14 at 15:17