
Here is the story.

It's a safety-critical project that needs to run a time-critical functional routine at 20 kHz. The current design puts the functional routine in a 20 kHz FIQ interrupt, and the safety interrupt is also an FIQ. Those are the only two FIQs in the system. (There are, of course, a couple of IRQs enabled in the MCU as well.)

I know it's not good practice to put task-level work in an ISR; the proper way is to set a flag and run the work in an OS task. But the current design seems to harm nobody.

The routine takes about 10 us (main clock 300 MHz), so it does not block IRQ/FIQ for an unacceptable time. It even saves the extra context switch compared with running the functional routine in an OS task. To me, the design feels like it goes against every principle written in the university textbooks, but I cannot find a reason to say no to it.
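
To make the comparison concrete, here is a minimal sketch of the two alternatives in C. All names (`functional_routine`, `fiq_handler_direct`, `fiq_handler_deferred`, `functional_task_step`) are hypothetical, and the "math" is a placeholder; the real routine and the real interrupt entry code are not shown in this post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ~10 us math routine (placeholder arithmetic). */
static uint32_t functional_routine(uint32_t sample)
{
    return sample * 3u + 1u;
}

static volatile uint32_t g_output;       /* result consumed by the rest of the system */
static volatile uint32_t g_sample;       /* latest sensor sample (option B only) */
static volatile bool     g_sample_ready; /* "mark" set by the ISR (option B only) */

/* Option A: do the math directly inside the 20 kHz FIQ handler. */
void fiq_handler_direct(uint32_t sample)
{
    g_output = functional_routine(sample);
}

/* Option B, part 1: the FIQ only stores the sample and sets a flag. */
void fiq_handler_deferred(uint32_t sample)
{
    g_sample = sample;
    g_sample_ready = true;
}

/* Option B, part 2: an OS task does the math later.
 * Returns true if a sample was processed. The flag is cleared before the
 * sample is read, so a FIQ arriving in between causes a repeat rather than
 * a lost sample. Overrun detection is deliberately left out of this sketch. */
bool functional_task_step(void)
{
    if (!g_sample_ready) {
        return false;
    }
    g_sample_ready = false;
    g_output = functional_routine(g_sample);
    return true;
}

uint32_t last_output(void)
{
    return g_output;
}
```

Option B adds shared state, a flag, and whatever latency the scheduler introduces before `functional_task_step` runs; option A has none of that but holds off other interrupts for the duration of the math.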

How could I convince myself to move the functional routine from the ISR to the OS? Should I?

Tian
  • If the FIQ/s is/are short enough to not impact the required functionality of the rest of the app, and do not use code/functions that are illegal/unwise in FIQ context, then fine. Sure, you have to be careful what you do/call, but if it is safe and works, great:) – Martin James Nov 10 '17 at 06:52
  • Example - I came unstuck when I pushed a buffer struct pointer onto a circular queue that, while safe when pushed from IRQ and pulled from thread, was not safe when it was also pushed from an FIQ, as the queue indices could get mixed up when the FIQ interrupted an IRQ:( I got round that by explicitly setting a software interrupt so its handler ran either after the FIQ returned or after any other IRQ that had been interrupted had returned. – Martin James Nov 10 '17 at 07:01
  • I have to ask, however, what it is that you are doing that takes 10us? – Martin James Nov 10 '17 at 07:03
  • @MartinJames Good point. Interrupt preemption truly causes problems when critical resources are not well protected. The 10 us is just some math work. Sensors send data back and the MCU needs to do some calculation and respond really fast. That math is the functional routine mentioned above. I figure there are no critical resources in this case. If I move to the OS, I get extra bloat. What benefit would I gain from that bloat? – Tian Nov 10 '17 at 07:15
  • You've labelled this safety-critical. If your application is safety-critical, then you must ensure that the interrupt is 100% cyclic and that your program is 100% deterministic. It might even be better to get rid of the interrupt, have the main program do all its work, then busy-wait for the interrupt source - this assuming that the program does nothing but process all the things needed between the 20 kHz cycles. It is kind of hard to say how you should design the system without knowing the spec. – Lundin Nov 10 '17 at 07:51
  • @Lundin The 20 kHz interrupt is triggered by a timer, so it's cyclic and deterministic. I was told to use as few interrupts as possible in a safety-critical system. However, I don't see any benefit in not using this interrupt, nor any harm in using it. The code is predictable with the interrupt, and the functional routine truly is the most important one in the application. Sorry, I can't say much about the spec... The main function is provided by the FIQ functional routine; all the rest is just diagnostics and communication. (Not heavy communication, just a few signals...) – Tian Nov 10 '17 at 09:07
  • @Tian stick with the FIQ – Martin James Nov 10 '17 at 09:35
  • Tian, if it is a timer that triggers the interrupt (and, of course, you have verified that the trigger period is much longer than 10 us), the only safety concerns are that this interrupt suppresses something the system must handle in the meantime, and that the HW timer providing the triggers malfunctions and breaks your system by launching the ISR every, say, 9 us. This must be taken into account in some of your FME(D)As - then it is a useful way to go. The conservative way proposed by Lundin is as good, though. – HelpingHand Apr 27 '20 at 20:59

3 Answers


Let's recap your situation:

  1. you are coding a safety-critical system
  2. the software architecture isn't specified, otherwise you wouldn't be asking the question at hand
  3. the system requirements weren't processed correctly, otherwise 2) wouldn't be in question
  4. someone told you to "use minimum interrupt if possible in safety critical system"
  5. you want to use the highest priority & non-interruptible code for "just some math work"

Sorry for being a bit harsh but I wouldn't want to use/be in your safety critical system.

For your actual problem: you have to make sure of two things

  • the code in the FIQ must be deterministic and WCET tested
  • the registers of the timer must be protected and supervised. Why? An unwanted/erroneous manipulation of the timer's registers by code of a lower safety level can congest the CPU so much that effectively nothing but the interrupt is processed.

All this under the assumption that your safe state depends entirely on an external hardware watchdog.
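
As one possible illustration of the "supervised" part, here is a minimal sketch in C of a period check that the FIQ could run on every entry. It assumes a free-running 32-bit counter clocked at the 300 MHz main clock and independent of the supervised timer; the type `period_monitor_t`, the function `period_monitor_check`, and the 2 us tolerance are hypothetical and would have to come out of the safety analysis, not from this answer.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICKS_PER_US        300u  /* assumption: counter runs at the 300 MHz main clock */
#define EXPECTED_PERIOD_US   50u  /* 20 kHz trigger rate */
#define TOLERANCE_US          2u  /* hypothetical value, to be derived from the safety analysis */

typedef struct {
    uint32_t last_ticks; /* counter value at the previous ISR entry */
    bool     primed;     /* false until the first timestamp has been stored */
} period_monitor_t;

/* Call at ISR entry with the current counter value.
 * Returns true if the interval since the previous call is within tolerance.
 * Unsigned subtraction keeps the delta correct across counter wrap-around. */
bool period_monitor_check(period_monitor_t *m, uint32_t now_ticks)
{
    bool ok = true;

    if (m->primed) {
        uint32_t delta = now_ticks - m->last_ticks;
        uint32_t lo = (EXPECTED_PERIOD_US - TOLERANCE_US) * TICKS_PER_US;
        uint32_t hi = (EXPECTED_PERIOD_US + TOLERANCE_US) * TICKS_PER_US;
        ok = (delta >= lo) && (delta <= hi);
    }

    m->last_ticks = now_ticks;
    m->primed = true;
    return ok;
}
```

A check like this only sees the spacing between two ISR entries, so it catches a timer that fires too fast or too slow but not one that has stopped entirely; that case still has to be covered from outside the ISR, for example by the watchdog.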

PS: What are the hazards for users of your system? Annoyance? Injury? Death? Are you in a SIL or ASIL context?

Vroomfondel
  • Indeed, the lack of detailed requirements and a software architecture spec is truly painful and is what caused this question. The project is in an ASIL D context, sorry to say. Luckily it's more for educational purposes than industrial ones. – Tian Nov 13 '17 at 02:48
  • ASIL D. You know that the only correct way to react on your side would be to dump all tasks and require an immediate roll up of the whole project? As it sounds, there is no safety manager and no process manager, so by definition it is no ASIL D, rather a test project aiming to deliver possibly lethal functionality. Please don't take that as an insult, it is just speaking out the truth - no accusations from my side. – Vroomfondel Nov 13 '17 at 10:27
  • I won't take it as an insult, and I truly appreciate your straightforwardness. It's great to have someone point out what's been left behind so I can make sure everything is covered. One question: an internal/external watchdog can make sure the CPU is not fully occupied by the interrupt, or at least reset the MCU once shit happens. Can I then say that CPU congestion is not a big issue at this point? (On the other hand, it seems a reduced timer ISR rate cannot be supervised by the watchdog... The monitor is indeed needed.) – Tian Nov 13 '17 at 12:21
  • @Vroomfondel - Some of your args are really helpful, but IMO, the drastic absoluteness with which you articulate the answer is unjustified. If the watchdog concept is sound (which also implies that watchdogs guarantee the safe state!), the system is safe with respect to timer errors. Period. This doesn't mean the result is of much use. Addressing another point - there may be phases in safety projects (with requirements and safety manager etc.) where developers are experimenting to find a feasible proposition that can be re-entered into arch/design, even before all requirements have been finished. – HelpingHand Apr 27 '20 at 21:09
  • @Vroomfondel - Could it be that it was you who had written a response until some hours ago? I remember I had a notification when I checked the board during a short break, but I couldn't write (to you (?)) a qualified answer in limited time, now the comment seems to be gone. I don't want to just quit this discussion if you disagree my comment... – HelpingHand Apr 30 '20 at 16:51
  • @HelpingHand yes, I wrote a comment but it seems gone now. *shrug* – Vroomfondel May 01 '20 at 12:11

The reason to move complex code out of an ISR is precisely to avoid lengthy processing in the ISR, and thus the timing jitter and delayed interrupt servicing that result from it.

You are stating that your processing is not lengthy, so do it in the ISR! Otherwise you are just adding bloat.

teroi
  • It's kinda hard for me to determine whether 10 us in an ISR is lengthy, as it's running in a 20 kHz interrupt! That means the functional routine in the FIQ ISR alone will keep interrupts closed for about 1/5 of the overall running time! I imagine there is a mathematical way to determine whether that's acceptable or not... – Tian Nov 10 '17 at 07:06
  • Can the implementation service the ISR and process each result in time? (does it have to?) If it can then you are good to go. If you separate the ISR and the processing, you will add more code to execute. – teroi Nov 10 '17 at 07:27
  • For now, it works just fine. But I'm at an early stage of development, so I'm not quite sure what hidden issues may show up in the future. – Tian Nov 10 '17 at 07:40
  • @Tian 10 us out of 50 us is one fifth of all available processing power. That's quite a lot of interrupt latency. The question is, does the CPU have anything more important to do than this task, or would a 10 us delay screw up timing elsewhere? If so, you need to minimize the ISR. If not, then all is well. – Lundin Nov 10 '17 at 07:47
  • In any case, if you're doing a time-critical system, and you need to spend 20 % CPU time for just periodic task interrupts, then you probably shouldn't be doing much of anything else... – Antti Haapala -- Слава Україні Nov 10 '17 at 08:19
  • @Lundin For now the system is not fully loaded, so I'm not sure whether other tasks will get screwed up... Other timing is not critical. One thing on my mind is that when running as an OS task, there may be plenty of statistics/protection tools available... Otherwise I don't see anything pushing me to do this... It makes me feel sick that textbooks and principles say "no task context in interrupt", "no interrupt in safety critical"... while I don't see any significant drawback in doing this. – Tian Nov 10 '17 at 09:14
  • @AnttiHaapala Are you saying that it's common in practice to put a periodic task in an interrupt to improve performance? – Tian Nov 10 '17 at 09:17
  • I'm not Antti but yes. If you have a hard real-time requirement and your processing is fast enough, then it is in fact easier to verify that you meet your task deadline. There are even systems without an RTOS. Then this is precisely the way to do it. – teroi Nov 10 '17 at 09:59
  • @Tian all sorts of stuff is written in books and taught in schools. Some of it is even correct. I know that when the fight is between 'you should do this', 'you must not do that', 'it's an anti-pattern' etc. and a requirement spec, the spec always wins:) – Martin James Nov 10 '17 at 09:59
  • @teroi yea.. I guess I will stick to the FIQ. Thanks for your response! – Tian Nov 10 '17 at 10:15
  • @Tian - It is good that you worry about this particular aspect (and even feel sick some times), but this doesn't mean you must never do this. Your feeling is fully correct, you are making a thick **compromise** because you have some good reason for it, e.g., increasing latency to increase performance (because you can afford this!). Don't describe this type of architecture in any Best Practice document you'll write in 5 years, but remember the reasons that forced you to take this ugly route. – HelpingHand Apr 27 '20 at 21:16

20 kHz means 50 us between interrupts. With 10 us of processing time, roughly 20% of CPU time goes to this "task" alone, and any other routine running on the CPU sees up to 10 us of jitter. It also adds 10 us of elapsed time to every 40 us of CPU time that any other task consumes. If that is OK for your project, and you keep your total CPU load below 70% (which is the common maximum acceptable for critical systems), IMHO it should work without any issue.
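
The arithmetic above can be written down as a small C sketch. The helper names (`period_us`, `isr_load_percent`, `stretched_time_us`) are made up for illustration, and the stretch estimate is an approximation that ignores interrupt entry/exit overhead and phase alignment.

```c
#include <stdint.h>

/* Interrupt period in microseconds for a given trigger rate in Hz. */
uint32_t period_us(uint32_t rate_hz)
{
    return 1000000u / rate_hz;
}

/* Share of CPU time consumed by the ISR, in percent (rounded down). */
uint32_t isr_load_percent(uint32_t rate_hz, uint32_t isr_us)
{
    return (isr_us * 100u) / period_us(rate_hz);
}

/* Approximate elapsed time for a lower-priority job that needs work_us of
 * CPU time, given that the ISR takes isr_us out of every period. */
uint32_t stretched_time_us(uint32_t rate_hz, uint32_t isr_us, uint32_t work_us)
{
    uint32_t period  = period_us(rate_hz);
    uint32_t free_us = period - isr_us; /* CPU time left per period */
    return (work_us * period) / free_us;
}
```

With the numbers from the question, `period_us(20000)` is 50, `isr_load_percent(20000, 10)` is 20, and `stretched_time_us(20000, 10, 40)` is 50, i.e. 40 us of other work takes about 50 us of elapsed time.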

Gustavo Laureano