I have a small program running on Linux (on an embedded PC, dual-core Intel Atom 1.6GHz with Debian 6 running Linux 2.6.32-5) which communicates with external hardware via an FTDI USB-to-serial converter (using the `ftdi_sio` kernel module and a `/dev/ttyUSB*` device). Essentially, in my main loop I run
- `clock_gettime()` using `CLOCK_MONOTONIC`
- `select()` with a timeout of 8 ms
- `clock_gettime()` as before
- Output the time difference of the two `clock_gettime()` calls
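For concreteness, one iteration of that loop can be sketched roughly as follows. This is a minimal sketch, not my actual code: the helper names `timespec_diff_ms` and `timed_select_ms` are made up for illustration, and the descriptor `fd` is assumed to be the already opened `/dev/ttyUSB*` device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <time.h>

/* Difference b - a in milliseconds (hypothetical helper). */
static double timespec_diff_ms(const struct timespec *a,
                               const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e3 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Wait for fd to become readable for at most timeout_ms and return how
 * long the select() call took, in milliseconds, measured with
 * CLOCK_MONOTONIC. Returns a negative value if a system call fails. */
static double timed_select_ms(int fd, long timeout_ms)
{
    struct timespec before, after;
    struct timeval tv;
    fd_set rfds;

    /* FD_SET is undefined for descriptors outside this range. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1.0;
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1.0;

    return timespec_diff_ms(&before, &after);
}
```

The main loop would then call something like `timed_select_ms(fd, 8)` and print the result. On older glibc versions `clock_gettime()` needs `-lrt` at link time.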
To have some level of "soft" real-time guarantees, this thread runs as `SCHED_FIFO` with maximum priority (showing up as "RT" in `top`). It is the only thread in the system running at this priority; no other process has such priorities. My process has one other `SCHED_FIFO` thread with a lower priority, while everything else is at `SCHED_OTHER`. The two "real-time" threads are not CPU bound and do very little apart from waiting for I/O and passing on data.
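The scheduling setup described above could look roughly like this. It is a sketch under assumptions: `make_thread_fifo` is a hypothetical helper name, and it uses `sched_setscheduler()` with a pid of 0, which on Linux applies to the calling thread. The call needs root, `CAP_SYS_NICE`, or a suitable `RLIMIT_RTPRIO`.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Switch the calling thread to SCHED_FIFO at (maximum - below_max).
 * below_max == 0 selects the maximum priority. Returns 0 on success,
 * otherwise an errno value (typically EPERM without privileges). */
static int make_thread_fifo(int below_max)
{
    struct sched_param sp;
    int max = sched_get_priority_max(SCHED_FIFO);
    int min = sched_get_priority_min(SCHED_FIFO);

    if (max < 0 || min < 0)
        return errno;
    if (below_max < 0 || max - below_max < min)
        return EINVAL;

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = max - below_max;
    assert(sp.sched_priority >= min && sp.sched_priority <= max);

    /* pid 0 means "the calling thread" on Linux. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

The timing thread would call `make_thread_fifo(0)`, and the second real-time thread would call it with a positive offset so that it ends up at a lower priority.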
The kernel I am using has no RT_PREEMPT patches (I might switch to that patch in the future). I know that if I want "proper" realtime, I need to switch to RT_PREEMPT or, better, Xenomai or the like. But nevertheless I would like to know what is behind the following timing anomalies on a "vanilla" kernel:
- Roughly 0.03% of all `select()` calls are timed at over 10 ms (remember, the timeout was 8 ms).
- The three worst cases (out of over 12 million calls) were 31.7 ms, 46.8 ms and 64.4 ms.
- All of the above happened within 20 seconds of each other, and I think some cron job may have been interfering (although the system logs are low on information apart from the fact that `cron.daily` was being executed at the time).
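To make clear how figures like these are derived, here is a minimal sketch of the bookkeeping, assuming one elapsed-time sample in milliseconds per `select()` call. The struct and function names are hypothetical and not taken from my program.

```c
#include <assert.h>
#include <stddef.h>

/* Running tally of select() durations (hypothetical structure). */
struct latency_stats {
    unsigned long total;          /* number of samples seen */
    unsigned long over_threshold; /* samples above the threshold */
    double worst_ms;              /* largest sample so far */
};

/* Record one sample and update the outlier count and the worst case. */
static void latency_stats_add(struct latency_stats *s, double elapsed_ms,
                              double threshold_ms)
{
    assert(s != NULL);
    s->total++;
    if (elapsed_ms > threshold_ms)
        s->over_threshold++;
    if (elapsed_ms > s->worst_ms)
        s->worst_ms = elapsed_ms;
}

/* Percentage of samples that exceeded the threshold. */
static double latency_stats_percent_over(const struct latency_stats *s)
{
    assert(s != NULL);
    if (s->total == 0)
        return 0.0;
    return 100.0 * (double)s->over_threshold / (double)s->total;
}
```

With a threshold of 10 ms, 0.03% of roughly 12 million calls corresponds to something on the order of 3600 late wake-ups.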
So, my question is: What factors can be involved in such extreme cases? Is this just something that can happen inside the Linux kernel itself, i.e. would I have to switch to RT_PREEMPT, or even a non-USB interface and Xenomai, to get more reliable guarantees? Could `/proc/sys/kernel/sched_rt_runtime_us` be biting me? Are there any other factors I may have missed?
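For reference, the throttling setting can be inspected along these lines. This is only a sketch of how to read the values, not evidence that they explain the outliers. It assumes the companion file `/proc/sys/kernel/sched_rt_period_us` is read as well; the helper names are hypothetical. A runtime of -1 means real-time throttling is disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read a single integer from a procfs file.
 * Returns 0 on success and -1 on failure. */
static int read_proc_long(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Fraction of each period that real-time tasks may consume.
 * Returns 1.0 if throttling is disabled (runtime < 0) and a negative
 * value if the period is not positive. */
static double rt_budget_fraction(long runtime_us, long period_us)
{
    if (period_us <= 0)
        return -1.0;
    if (runtime_us < 0)
        return 1.0;
    return (double)runtime_us / (double)period_us;
}
```

A caller would read both files with `read_proc_long()` and pass the two values to `rt_budget_fraction()`. With the usual defaults of 950000 and 1000000 this gives about 0.95, i.e. real-time tasks may use up to 95% of each one-second period.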
Another way to put this question is, what else can I do to reduce these latency anomalies without switching to a "harder" realtime environment?
Update: I have observed a new, "worse worst case" of about 118.4 ms (once over a total of around 25 million `select()` calls). Even when I am not using a kernel with any sort of realtime extension, I am somewhat worried by the fact that a deadline can apparently be missed by over a tenth of a second.