I have a case where I need to ensure that an ISR cannot interact with a library when it is unloaded in an embedded system (ARM Cortex-M4 based).
The library can be loaded or deloaded at any time, as well as the interrupt / ISR firing off at any time. These library load / deload / process ISR calls are wrapped inside project specific routines, so I have full control over what is executed at the boundary.
For this, it seems like the natural way to protect against this is to add in a "is loaded" flag (and operate on using acquire / release orderings), and only call the process ISR library routine when I can be sure that the library is currently loaded.
However, the deload wrapper routine presents a problem - the release memory ordering (my initial thought) does not cause the "is loaded" flag to be seen before the library is actually deloaded, as according to the C++ release semantics, this only provides guarantees for no loads/stores before the atomic operation to be reordered, not load/stores after. I really want a "store with acquire ordering" - the flag must be cleared and visible to the ISR before the library is deloaded (no load/stores can be reordered to before the operation) - but the C++ standard seems to imply this isn't possible at least with normal acquire / release.
I have explored the idea of using SeqCst operations / fences due to additional total ordering property, but I am not confident that I am using it correctly. Inspecting the assembly of the fence version, it appears correct with DMB
appearing in the right places, however reading the standard about atomic_thread_fence
and Fence-fence synchronization, it seems like this is technically incorrect usage as "FA is not sequenced before X in thread A".
Is someone able to walk through the process of figuring out whether this usage is correct or not (using C++ standard terms, ie: https://en.cppreference.com/w/cpp/atomic/memory_order, https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence)?
Equivilant code link: https://godbolt.org/z/v9nzqvjxa - this uses SeqCst fences.
The internals of the library code are irrelevant and can be treated as an opaque box.
Edit:
Copy of equivilant code link above:
#include <atomic>
#include <thread>
#include <iostream>
#include <stdexcept>
#include <chrono>
#include <random>
/// Library code
bool non_atomic_state;
void load_library()
{
if (!non_atomic_state)
std::cout << "loaded" << std::endl;
non_atomic_state = true;
}
void deload_library()
{
if (non_atomic_state)
std::cout << "deloaded library" << std::endl;
non_atomic_state = false;
}
void library_isr_routine()
{
if (!non_atomic_state)
throw std::runtime_error("crash");
std::cout << "library routine" << std::endl;
}
/// MCU project code
std::atomic<bool> loaded;
void mcu_library_isr()
{
if (loaded.load(std::memory_order_relaxed))
{
std::atomic_thread_fence(std::memory_order_seq_cst);
library_isr_routine();
}
}
void mcu_load_library()
{
load_library();
std::atomic_thread_fence(std::memory_order_seq_cst);
loaded.store(true, std::memory_order_relaxed);
}
void mcu_deload_library()
{
loaded.store(false, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
deload_library();
}
/// Test harness code
void t1()
{
std::random_device rd;
for (int i = 0; i < 10000; i++)
{
auto sleep_duration = std::chrono::milliseconds(rd() % 10);
std::this_thread::sleep_for(sleep_duration);
mcu_library_isr();
}
}
void t2()
{
std::random_device rd;
for (int i = 0; i < 10000; i++)
{
auto random_value = rd();
auto sleep_duration = std::chrono::milliseconds(random_value % 10);
std::this_thread::sleep_for(sleep_duration);
if (random_value % 2 == 0)
mcu_load_library();
else
mcu_deload_library();
}
}
int main()
{
std::thread t1_handle(t1);
std::thread t2_handle(t2);
t1_handle.join();
t2_handle.join();
return 0;
}