
In the company I work for, we build machines that are controlled by software running on a Windows OS. A C# application communicates with a bus controller (via a DLL). The bus controller runs on a tact time of 15 ms; that means we get updates of the actual sensors in the system with a heartbeat of 15 ms from the bus controller (which is real-time).

Now the machines are evolving into the next generation, with a new bus controller that runs on a tact of 1 ms. Since everybody realizes that Windows is not a real-time OS, the question arises: should we move the controlling part of the software to a real-time application (on a real-time OS, e.g. a (soft) PLC)?

If we stay on the Windows platform, we do not have guaranteed responsiveness. That in itself is not necessarily a problem; if we miss a few bus cycles (have a few hiccups), the machine will just produce slightly slower, which is acceptable.

The part that worries me is thread synchronization between the main machine-controlling thread and the updates we receive from the real-time controller (every millisecond).

Where can I learn more about how Windows / .NET C# behaves when it goes down the path of thread synchronization at millisecond intervals? I know that, for example, Thread.Sleep(1) can take up to 15 ms because Windows is preempting other tasks, so how does this play out when I synchronize between two threads with Monitor.PulseAll every millisecond? Can I expect the same unpredictable behavior? Am I asking for trouble by moving into soft real-time requirements of 1 ms in a Windows application?
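
To make the concern concrete, here is a minimal sketch of the kind of synchronization I mean (names and iteration counts are illustrative, this is not our production code): one thread pulses a monitor every millisecond, another waits on it and records how late it actually wakes up.

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class PulseLatencyProbe
{
    const int Iterations = 1000;
    static readonly object Gate = new object();
    static long _pulsedAt;                    // Stopwatch timestamp taken just before PulseAll
    static volatile bool _running = true;

    static void Main()
    {
        var latenciesMs = new double[Iterations];
        var index = 0;

        var worker = new Thread(() =>
        {
            lock (Gate)
            {
                while (_running && index < Iterations)
                {
                    Monitor.Wait(Gate);       // release the lock and block until pulsed
                    latenciesMs[index++] = (Stopwatch.GetTimestamp() - Interlocked.Read(ref _pulsedAt))
                                           * 1000.0 / Stopwatch.Frequency;
                }
            }
        });
        worker.Start();

        for (int i = 0; i < Iterations; i++)
        {
            Thread.Sleep(1);                  // stand-in for the 1 ms bus heartbeat
            lock (Gate)
            {
                Interlocked.Exchange(ref _pulsedAt, Stopwatch.GetTimestamp());
                Monitor.PulseAll(Gate);       // wake the measuring thread
            }
        }

        _running = false;
        lock (Gate) { Monitor.PulseAll(Gate); }   // let the worker exit its loop
        worker.Join();

        Console.WriteLine("Max wake-up latency: {0:F3} ms",
                          index > 0 ? latenciesMs.Take(index).Max() : 0.0);
    }
}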

I hope somebody with experience in these aspects of threading can shed some light on this. If I need to clarify more, by all means, shoot.

bas (question edited by Yuval Itzchakov)
  • Are you using a separate computer linked to your machine, or are you using a built-in processor (a "Windows computer") integrated into your machine? – ErstwhileIII Jul 19 '14 at 14:26
  • @ErstwhileIII The machine contains a bunch of computers, where 1 computer is the "systemPC" which is a PC that runs on Windows 7 embedded. This PC is connected to the bus controller (which effectively is just a PCI express card in that same computer) which runs a real time OS. – bas Jul 19 '14 at 14:29
  • There are also some "real time extensions" to Windows OS that may also be useful for you to explore. – ErstwhileIII Jul 19 '14 at 14:32
  • @ErstwhileIII, Yes, I have been told :). An example of that is the soft-PLC direction, where one processor core is dedicated to a real-time application (an RTOS, e.g.). Then still I am wondering how "real time" Windows can get (or how unreliable / unpredictable it will really be). – bas Jul 19 '14 at 14:40
  • http://stackoverflow.com/questions/6206305/why-is-windows-not-considered-suitable-for-real-time-systems-high-performance-se – Yuval Itzchakov Jul 19 '14 at 14:47
  • Do you have a "test" setup that you can drive with an emulated load ... so you can do engineering measurements with your current setup to see what responsiveness you can sustain? – ErstwhileIII Jul 19 '14 at 14:48
  • 1
    @YuvalItzchakov, already read that post thx though – bas Jul 19 '14 at 14:49
  • @ErstwhileIII , well that I already have, but the problem with that is that the "real time behavior" is emulated all the same. The only way to really measure it, is with a scope and connected to the new bus controller (but that machine only exists on paper). The tests we've done so far is that "now and then" we miss a bus cycle, so that "test" doesn't raise too many red flags. Hence, my mixed feelings and this question on SO :) – bas Jul 19 '14 at 15:00
  • 2
    I suspect your first bottleneck in your setup will be the GC and not the OS. I'm afraid ANY GC will be non-deterministic and may lead to issues with Real Time processing. http://stackoverflow.com/questions/1031512/recommended-net-soft-real-time?rq=1 – Aron Jul 19 '14 at 15:02
  • @bas hence the importance of building testable systems. Might I suggest you rig up an FPGA to emulate the output of your bus controller? – Aron Jul 19 '14 at 15:04
  • @Aron, I realize we'll never get a real time system on a windows installation, and also the GC is not helping (although, consider a C++ application where dynamic memory is allocated, also not a deterministic process). So I am not really scared of the 'incredible optimized GC'. There might be a fair amount of C/C++ applications that do not have memory management but still suffer from poor constructions which have larger impact than the GC on a well written C# application. The FPGA is a very good remark, thanks for that. We'll take that into consideration to achieve a decent test env. thx! – bas Jul 19 '14 at 15:10
  • @bas: Erm, dynamic memory allocation is 100% deterministic. – Billy ONeal Jul 19 '14 at 15:23
  • @dialer: Yes, certainly in terms of time to allocate memory. You can put a hard upper limit on the amount of time that takes to happen. Sometimes it will happen faster if the heap is able to optimize the allocation, but when doing real-time planning you just assume it'll take the maximum possible time and you have entirely deterministic behavior. – Billy ONeal Jul 19 '14 at 15:57
  • @dialer: In contrast, garbage collection happens in the background outside of programmer control, and can take an unbounded amount of time to complete. – Billy ONeal Jul 19 '14 at 15:59
  • @BillyONeal I stand corrected. What I had in my head is that the time to heap allocate isn't constant, which I agree is of no matter in this case (GC behavior was out of question anyway). – dialer Jul 19 '14 at 16:09
  • It sounds to me like you should be doing more of the lower-level protocol in the driver and not trying to signal threads for a few bytes every 15/1 ms. – Martin James Jul 20 '14 at 03:44
  • Even if you cannot move more of the protocol into a driver, making a thread ready/running from a driver, (eg. by signaling a semaphore), should take much less than 1ms on a box that is not overloaded. That is how the I/O system works. – Martin James Jul 20 '14 at 03:47

2 Answers


Your scenario sounds like a candidate for a kiosk-mode/dedicated application.

In the company I work for we build machines which are controlled by software running on Windows OS.

If so, you could rig the machines such that your low-latency I/O thread could run on a dedicated core with thread and process priorities maximized. Furthermore, ensure the machine has enough cores to handle a buffering thread as well as any others that process your data in transit. The buffer should allocate memory upfront if possible to avoid garbage collection bottlenecks.
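
As a rough sketch of that setup (the core mask and priorities are assumptions to adjust for your hardware), pinning the process to one core and raising the priorities could look like this:

using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main()
    {
        var process = Process.GetCurrentProcess();

        // Pin the whole process to core 3 (bitmask 0x8). Which core to pick, and whether
        // to pin the process or just the I/O thread, depends on your machine.
        process.ProcessorAffinity = (IntPtr)0x8;

        // RealTime priority class: use with care, a runaway loop can starve the OS.
        process.PriorityClass = ProcessPriorityClass.RealTime;

        // The low-latency I/O thread itself gets the highest scheduling priority.
        Thread.CurrentThread.Priority = ThreadPriority.Highest;

        // ... start the low-latency I/O loop here ...
    }
}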

@Aron's example is good for situations where data integrity can be compromised to a certain extent. In audio, latency matters a lot during recording for multiple reasons but for pure playback, data loss is acceptable to a certain degree. I am assuming this is not an option in your case.

Of course Windows is not designed to be a real-time OS, but if you are using it for a dedicated app, you have control over every aspect of it and can turn off all unrelated services and background processes.

I have had a reasonable amount of success writing software to monitor how well UPS units cope with power fluctuations by measuring their power compensation response times (disclaimer: not for commercial purposes though). Since the data to measure per sample was very small, the GC was not problematic and we cycled pre-allocated memory blocks for buffers.
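
For illustration, a minimal sketch of the kind of pre-allocated, cycled buffer scheme I mean (the pool size, buffer size and names are illustrative):

using System.Collections.Concurrent;

// A fixed set of byte[] buffers allocated once at startup; the I/O thread rents one per
// bus cycle and returns it after the data has been handed off, so steady-state operation
// allocates nothing and gives the GC no work to do.
sealed class BufferPool
{
    private readonly ConcurrentQueue<byte[]> _free = new ConcurrentQueue<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferCount, int bufferSize)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < bufferCount; i++)
            _free.Enqueue(new byte[bufferSize]);
    }

    public byte[] Rent()
    {
        byte[] buffer;
        // Size the pool so this fallback allocation never happens in steady state.
        return _free.TryDequeue(out buffer) ? buffer : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        _free.Enqueue(buffer);
    }
}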

Some micro-optimizations that came in handy:

  • Using immutable structs to poll I/O data.
  • Optimizing data structures to work well with memory allocation.
  • Optimizing processing algorithms to minimize CPU cache misses.
  • Using an optimized buffer class to hold data in transit.
  • Using the Monitor and Interlocked classes for synchronization.
  • Using unsafe code with (void*) to gain easy access to buffer arrays in various ways to decrease processing time. Minimal use of Marshal and Buffer.BlockCopy. (See the sketch after this list.)
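
As a rough illustration of that last point (compile with /unsafe; the frame layout of two floats followed by a status word is an assumption), reading sensor values straight out of a raw buffer through a pinned pointer instead of per-field BitConverter calls:

unsafe struct SensorFrame
{
    public float Position;
    public float Velocity;
    public ushort Status;

    public static SensorFrame FromBuffer(byte[] buffer, int offset)
    {
        // Pin the array so the GC cannot move it while we read through the pointer.
        fixed (byte* p = &buffer[offset])
        {
            var frame = new SensorFrame();
            frame.Position = *(float*)p;
            frame.Velocity = *(float*)(p + 4);
            frame.Status   = *(ushort*)(p + 8);
            return frame;
        }
    }
}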

Lastly, you could go the DDK way and write a small driver. Albeit off-topic, DFMirage is a good example of a video driver that provides both an event-based and a polling model for differential screen capture, such that the consumer application can choose on the fly based on system load.

As for Thread.Sleep, you should use it as sparingly as possible, within your energy-consumption boundaries. With redundant processes out of the way, Thread.Sleep(1) should not be as bad as you think. Try the following to see what you get. Note that this has been coded in the SO editor, so I may have made mistakes.

using System;
using System.Diagnostics;
using System.Threading;

class SleepResolutionProbe
{
    static void Main()
    {
        // Elevate scheduling priority; the RealTime priority class requires
        // administrative rights, otherwise Windows silently falls back to High.
        Thread.CurrentThread.Priority = ThreadPriority.Highest;
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;

        var ticks = 0L;
        var iteration = 0D;
        var timer = new Stopwatch();

        do
        {
            iteration++;
            timer.Restart();
            Thread.Sleep(1);        // measure how long a requested 1 ms sleep really takes
            timer.Stop();
            ticks += timer.Elapsed.Ticks;

            if (Console.KeyAvailable && Console.ReadKey(true).Key == ConsoleKey.Escape) { break; }

            Console.WriteLine(
                "Elapsed (ms): Last Iteration = {0:N2}, Average = {1:N2}.",
                timer.Elapsed.TotalMilliseconds,
                TimeSpan.FromTicks((long)(ticks / iteration)).TotalMilliseconds);
        }
        while (true);

        Console.WriteLine();
        Console.WriteLine();
        Console.Write("Press any key to continue...");
        Console.ReadKey(true);
    }
}
Raheel Khan
  • That's very helpful! Thx! "you could rig the machines such that your low-latency I/O thread could run on a dedicated core with thread and process priorities maximized". Are you referring to a soft-PLC like solution where an application in Windows 'steals' one core to use as in a hosted real time process? – bas Jul 19 '14 at 17:43
  • @bas: You're welcome. No, I did not use any simulators, emulators or third-party solutions. Setting thread and process priority to high and disabling all auto-update apps got me the millisecond accuracy I was looking for. Since it was a hobby project, `Thread.Sleep` was also sparingly used, allowing at least one core to run at 100% even when polling intervals could be relaxed. – Raheel Khan Jul 19 '14 at 18:01
  • Another important aspect was pre-allocating memory such that the GC had nothing to run after. As the polled data was transferred to lower priority threads (such as post-process and UI), latency did not matter any more. – Raheel Khan Jul 19 '14 at 18:03
  • PS: your "SO editor code" works like a charm :). I copied the `PriorityClass = ProcessPriorityClass.RealTime` to the controlling software, did not have that line yet. Good catch. Thx again, this post definitely helps, good advice – bas Jul 19 '14 at 18:40

Come to think of it, the actual problem itself, processing data at 1 ms, is pretty easy. When considering audio recording, as an analogous (pun not intended) problem, you might be able to find some inspiration in how to achieve your goals.

Bear in mind:

  • Even a modest setup can achieve a 44.1 kHz @ 16-bit per channel sampling rate (that is about 23 microseconds per sample, or roughly a fortieth of your 1 ms target).
  • Using ASIO you can achieve sub-10 ms latencies.
  • Most methods of achieving high sampling rates will work by increasing your buffer size and sending data to your system in batches
  • To achieve the best throughput, don't use threads. Use DMA and interrupts to call back into your processing loop.

Given that sound cards can routinely achieve your goals, you might have a chance.
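
As a rough sketch of that batching idea (the frame type, batch size and queue capacity are assumptions), the consumer could drain the 1 ms updates in batches instead of waking up for every single cycle:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// The bus callback enqueues one frame per 1 ms cycle, while the consumer thread wakes
// less often and drains everything that has accumulated, audio-card style.
class BatchedConsumer
{
    private readonly BlockingCollection<byte[]> _frames =
        new BlockingCollection<byte[]>(boundedCapacity: 1024);

    public void Enqueue(byte[] frame)
    {
        _frames.Add(frame);                   // called from the bus update callback
    }

    public void Run(CancellationToken token)
    {
        var batch = new List<byte[]>(64);
        while (!token.IsCancellationRequested)
        {
            batch.Clear();
            batch.Add(_frames.Take(token));   // block until at least one frame is available
            byte[] extra;
            while (batch.Count < 64 && _frames.TryTake(out extra))
                batch.Add(extra);             // then drain whatever else is already queued

            Process(batch);                   // one pass over up to ~64 ms worth of data
        }
    }

    private void Process(IList<byte[]> batch)
    {
        // hand the batch off to the machine-control logic here
    }
}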

Aron
  • That's not really a good example though imho. Audio processing is always buffered. The CPU is only involved when the buffer is full and has to react in time before the DMA or similar mechanism needs to reload again. Without ASIO or similar, that's typically in the 100ms area. With ASIO, you can go sub 10ms, but if the CPU happens to be busy and not be able to respond in time, you'll hear crackles. – dialer Jul 19 '14 at 15:52
  • Of course it's buffered, else the whole lot would get bogged down in continual system calls to shift a few bytes from some audio/video card to user buffers all the time and nothing would get done. – Martin James Jul 20 '14 at 03:40
  • @dialer repeat after me. Windows is not a real time OS. – Aron Jul 20 '14 at 05:02