
I have an AT91SAM9G20 processor running a 2.6 kernel. The watchdog is enabled at the bootstrap level and configured for 16 seconds. The watchdog mode register can be written only once. When code hangs in the bootstrap, the bootloader, or the kernel, the board reboots. But once the kernel is up, even though none of our applications refresh the watchdog, the board resets after about 15 minutes instead of 16 seconds.

Who is refreshing the watchdog?

In our case, the watchdog should be controlled by the applications, so that the board resets if our application hangs.

These are the running processes:

1 root     init
2 root     [kthreadd]
3 root     [ksoftirqd/0]
4 root     [watchdog/0]
5 root     [events/0]
6 root     [khelper]
63 root     [kblockd/0]
72 root     [ksuspend_usbd]
78 root     [khubd]
85 root     [kmmcd]
107 root     [pdflush]
108 root     [pdflush]
109 root     [kswapd0]
110 root     [aio/0]
740 root     [mtdblockd]
828 root     [rpciod/0]
982 root     [jffs2_gcd_mtd10]
1003 root     /sbin/udevd -d
1145 daemon   portmap
1158 dbus     dbus-daemon --system
1178 root     /usr/sbin/ifplugd -i eth0 -fwI -u0 -d5 -l -q
1190 root     /usr/sbin/ifplugd -i eth1 -fwI -u0 -d5 -l -q
1221 default  avahi-daemon: running [SP14.local]
1226 root     /usr/sbin/dropbear
1246 root     /root/bin/host_app
1254 root     /root/bin/mini_httpd -c *.cgi -d /root/bin -u root -E /root/bin/
1256 root     -sh
1257 root     /sbin/syslogd -n -m 0
1258 root     /sbin/klogd -n
1259 root     /usr/bin/tail -f /var/log/messages
1265 root     ps -e

We are using the soft-lockup watchdog available in kernel-2.6.25-ts.at91sam9g20/kernel/softlockup.c.

Ender
Shashikiran

5 Answers


If you enabled the watchdog driver in your kernel, the driver sets up a kernel timer that is in charge of resetting the watchdog. The corresponding code is in linux/drivers/watchdog/at91sam9_wdt.c. It works like this:

If no application opens the /dev/watchdog file, the kernel takes care of resetting the watchdog. Since this is done by a timer, it does not appear as a dedicated kernel thread; it is handled by the soft IRQ thread. If an application opens this file, it becomes responsible for the watchdog and can reset it by writing to the file, as described in the documentation linked in Richard's post.
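The handover described above can be sketched from the application side. This is a minimal sketch, assuming a driver that follows the generic /dev/watchdog interface (any write refreshes the timer, and the character 'V' written just before close is the "magic close"). The helper names wdt_open, wdt_kick and wdt_release are hypothetical and not part of any kernel or library API.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the watchdog device. From this point on the application, not the
 * kernel timer, is expected to refresh the watchdog.
 * Returns a file descriptor, or -1 on error.
 */
int wdt_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Refresh the watchdog: writing any byte restarts the countdown. */
int wdt_kick(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Release the device. Drivers that support "magic close" expect the
 * character 'V' just before close(); on hardware whose mode register is
 * write-once the watchdog may keep running regardless.
 */
int wdt_release(int fd)
{
    int rc = write(fd, "V", 1) == 1 ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

An application would call wdt_open("/dev/watchdog") once at startup and then call wdt_kick() from its main loop at an interval comfortably below the 16-second hardware timeout, ideally only after its own health checks pass.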

Is the watchdog driver configured in your kernel? If not, you should configure it, and see if the reset still happens. If it still happens, it is likely that your reset comes from somewhere else.

If your kernel is too old to have a proper watchdog driver (it is not present in 2.6.25), you should backport it from 2.6.28. Alternatively, try disabling the watchdog in your bootloader and see whether the reset still occurs.

U. Windl
shodanex
  • The watchdog register is configured in the bootstrap, and the mode register is write-once. Where is the kernel refreshing the watchdog? The function static void at91_ping(unsigned long data) is not present in our kernel code. Is it possible to stop the kernel from refreshing the watchdog? – Shashikiran Jan 07 '10 at 14:09

In July 2016, commit 3fbfe92647 (watchdog: change watchdog_need_worker logic), a change to watchdog_dev.c in the 4.7 kernel, enabled the behavior described in shodanex's answer for all watchdog timer drivers. This doesn't seem to be documented anywhere other than this thread and the source code.

/*
 * A worker to generate heartbeat requests is needed if all of the
 * following conditions are true.
 * - Userspace activated the watchdog.
 * - The driver provided a value for the maximum hardware timeout, and
 *   thus is aware that the framework supports generating heartbeat
 *   requests.
 * - Userspace requests a longer timeout than the hardware can handle.
 *
 * Alternatively, if userspace has not opened the watchdog
 * device, we take care of feeding the watchdog if it is
 * running.
 */
return (hm && watchdog_active(wdd) && t > hm) ||
       (t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
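To make the condition in the excerpt easier to reason about, here is a standalone sketch of the same boolean logic with the kernel structures flattened into plain parameters. The function name need_worker and its parameter list are hypothetical. The assumption that hm is the maximum hardware timeout in milliseconds and t is the requested timeout converted to milliseconds is my reading of the surrounding kernel code, not something the excerpt itself shows.

```c
#include <stdbool.h>

/*
 * Mirror of the quoted return expression.
 *   hw_max_ms  - maximum hardware timeout in ms (0 if the driver gave none)
 *   timeout_s  - timeout requested for the watchdog, in seconds
 *   active     - true if userspace has opened and activated the watchdog
 *   hw_running - true if the hardware watchdog is already counting
 */
bool need_worker(unsigned int hw_max_ms, unsigned int timeout_s,
                 bool active, bool hw_running)
{
    unsigned int hm = hw_max_ms;
    unsigned int t = timeout_s * 1000u;   /* assumed unit conversion */

    return (hm && active && t > hm) ||
           (t && !active && hw_running);
}
```

With a 16-second hardware limit, need_worker(16000, 60, true, true) is true because userspace asked for more than the hardware can do, and need_worker(16000, 16, false, true) is true because nobody has opened the device while the hardware is running, which is the case the question describes.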
U. Windl
Nolan Hergert

This may give you a hint: http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt

It makes perfect sense to have a user space daemon handling the watchdog. It probably defaults to a 15 minute timeout.

Richard Pennington
  • No watchdog daemon is running in user space. – Shashikiran Jan 07 '10 at 13:23
  • So there is no message in the log when the system reboots after 15 minutes? That's strange. – Richard Pennington Jan 07 '10 at 13:26
  • The 9G20 is an embedded processor, and probably has a ramdisk root volume. It may not have a persistent log. – Mike D. Jan 07 '10 at 14:01
  • /var/log/messages contains no such information. Only kernel messages are present. – Shashikiran Jan 07 '10 at 14:18
  • The link to the API spec provides details on Magic Close and IOCTL that are not in the man pages, which is great. – Donal Lafferty Jan 25 '12 at 16:13
  • If the system is reset by the hardware watchdog, the kernel cannot write any log messages, by design. You can configure an early warning for the watchdog reset, which will be written to the logs. See the documentation for details: https://www.kernel.org/doc/html/latest/watchdog/watchdog-api.html – Mikko Rantalainen Aug 16 '22 at 11:28

We had a similar problem with the WDT on an AT91SAM9263. The problem was bit 29, WDIDLEHLT, of the WDT_MR register (address 0xFFFFFD44). This bit was set to 1, but it should be 0 for our application's needs.

Bit description from the datasheet:

• WDIDLEHLT: Watchdog Idle Halt

  • 0: The Watchdog runs when the system is in idle mode.
  • 1: The Watchdog stops when the system is in idle state.

This means that the WDT counter does not advance while the kernel is in the idle state, hence the delay of 15 minutes or more before the reset happens.

You can try "dd if=/dev/zero of=/dev/null", which prevents the kernel from entering the idle state; you should then get a reset within 16 seconds (or whatever period you have set in the WDT_MR register).

So the solution is to update the U-Boot code, or whatever other code sets the WDT_MR register. Remember that this register is write-once.
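The bit check described above can be sketched as a small decoder for a WDT_MR value. Only the position of WDIDLEHLT (bit 29) comes from this answer. The WDV field in bits 0 to 11 and the rate of 256 counter ticks per second (a 32.768 kHz slow clock divided by 128) are assumptions to verify against the datasheet for your part, and the function names are hypothetical. Reading the live register at 0xFFFFFD44 on the target is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDT_MR_WDV_MASK    0x00000FFFu  /* counter value (assumed: bits 0-11) */
#define WDT_MR_WDIDLEHLT   (1u << 29)   /* idle halt, bit 29 as stated above */
#define WDT_TICKS_PER_SEC  256u         /* assumed: 32.768 kHz slow clock / 128 */

/* True if the watchdog counter stops while the CPU is idle. */
bool wdt_idle_halt(uint32_t mr)
{
    return (mr & WDT_MR_WDIDLEHLT) != 0;
}

/* Timeout implied by the WDV field, in milliseconds of counting time. */
uint32_t wdt_timeout_ms(uint32_t mr)
{
    uint32_t ticks = (mr & WDT_MR_WDV_MASK) + 1u;

    return ticks * 1000u / WDT_TICKS_PER_SEC;
}
```

Under these assumptions a WDV of 0xFFF corresponds to the 16-second timeout mentioned in the question, and a value with bit 29 set only reaches that timeout after 16 seconds of non-idle time, which would explain a much longer wall-clock delay on a mostly idle board.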

prix

Wouldn't the kernel be refreshing the watchdog timer? The watchdog is designed to reset the board if the whole system hangs, not just a single application.

mdm
  • I searched the whole kernel source and did not find anything like that. If that is the case, why does it take 15 minutes to reset? – Shashikiran Jan 07 '10 at 13:17
  • The kernel watchdog interface is designed so that you have exactly one watchdog process, whose job is to write to the watchdog device at an interval shorter than the timer expiry time. (If the timer expiry time is longer than the hardware supports, the kernel resets the hardware timer enough times to allow a real hardware reset at the correct time.) You still need a user-mode watchdog to start the timer logic. The idea behind the user-mode watchdog process is that it can run whatever tests are needed before writing, and if executing those tests becomes too slow, that too will reset the system. – Mikko Rantalainen Aug 16 '22 at 11:26