
I am working on bare metal programming in Rust on a Raspberry Pi 3 running in 64-bit mode. I have implemented a spinlock as follows:

use core::{sync::atomic::{AtomicBool, Ordering}, cell::UnsafeCell, ops::{Deref, DerefMut}};

pub struct SpinMutex<T> {
    lock: AtomicBool,
    data: UnsafeCell<T>
}

impl<T> SpinMutex<T> {
    #[allow(dead_code)]
    pub const fn new(data: T) -> Self {
        Self {
            lock: AtomicBool::new(false),
            data: UnsafeCell::new(data)
        }
    }

    pub fn lock(&self) -> SpinMutexGuard<T> {
        while self.lock.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed).is_err() {}

        SpinMutexGuard {
            lock: &self.lock,
            data: unsafe { &mut *self.data.get() }
        }
    }
}

unsafe impl<T> Sync for SpinMutex<T> {}

pub struct SpinMutexGuard<'a, T> {
    lock: &'a AtomicBool,
    data: &'a mut T
}

impl<'a, T> Deref for SpinMutexGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.data
    }
}

impl<'a, T> DerefMut for SpinMutexGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.data
    }
}

impl<'a, T> Drop for SpinMutexGuard<'a, T> {
    /// The dropping of the MutexGuard will release the lock it was created from.
    fn drop(&mut self) {
        self.lock.store(false, Ordering::Release);
    }
}

#[cfg(test)]
mod tests {
    use super::SpinMutex;

    #[test]
    fn test_spin_mutex() {
        let state = SpinMutex::new(0);

        assert_eq!(*state.lock().data, 0);

        *state.lock().data = 9;

        assert_eq!(*state.lock().data, 9);
    }
}

When I run the tests on my local machine (64-bit Windows) the lock works. However, on the Raspberry Pi the lock method gets stuck in an infinite loop and never returns. Is there a reason why this happens?

Here is how Rust compiles `compare_exchange_weak` with inlining disabled:

   80ba8:   9100400a    add x10, x0, #0x10
   80bac:   085f7d48    ldxrb   w8, [x10]
   80bb0:   34000068    cbz w8, 80bbc
   80bb4:   d5033f5f    clrex
   80bb8:   14000004    b   80bc8
   80bbc:   52800029    mov w9, #0x1                    // #1
   80bc0:   080b7d49    stxrb   w11, w9, [x10]
   80bc4:   3400004b    cbz w11, 80bcc
   80bc8:   2a1f03e9    mov w9, wzr
   80bcc:   7100011f    cmp w8, #0x0
   80bd0:   52000120    eor w0, w9, #0x1
   80bd4:   1a9f07e1    cset    w1, ne
  • What is the purpose of your spin lock? This is a lot of work. Have you thought of using something else, or an approach where you don't have to worry about spin locks? I know it's not the question you're asking, but if I had to worry about spin locks I'd never get anything done. – Dan Chase Aug 14 '21 at 18:33
  • Does changing all the orderings to `Ordering::SeqCst` accomplish anything? – Aplet123 Aug 14 '21 at 18:39
  • My old Raspberry Pi only has one core. Probably more modern ones are multicore, but you are bare metal, so maybe you are not doing proper multitasking? Because I think that a spinlock with just one core makes no sense. – rodrigo Aug 14 '21 at 19:07
  • Do you have enough debugging infrastructure in place to be able to cause a trap while your lock is stuck, and inspect the state of the machine? Also, I presume that you're already able to successfully run other Rust code on the bare metal, so that it's not a generic setup problem? – Nate Eldredge Aug 14 '21 at 19:16
  • You said you were targeting `bare metal`, does that mean your RPi does not have a kernel which preempts your currently running thread? If so, the `compare_exchange` will spin forever because no other thread can run in case the lock is held by another thread. You'll need to put a call to `std::thread::yield_now()` into the loop to fix that. Also, unrelated to your problem, you might want to use `compare_exchange_weak` because you are not evaluating the original value in case `compare_exchange` fails. – user2722968 Aug 14 '21 at 22:30
  • But the test should still work, because here the lock is just locked and unlocked repeatedly in that order; there is never contention. – Nate Eldredge Aug 14 '21 at 22:33
  • @DanChase I want to have a spin lock to be able to safely use mutable statics from multiple threads. – Someone Aug 14 '21 at 23:30
  • @Aplet123 that doesn't help – Someone Aug 14 '21 at 23:30
  • @rodrigo: The RPi 3B+ has 4 cores. – Nate Eldredge Aug 14 '21 at 23:36
  • But you are using bare metal, so no OS; are you sure you are actually using all the cores? – rodrigo Aug 15 '21 at 01:13
  • @NateEldredge It seems that the problem is that the `compare_exchange_weak` method does not return. I'll try to get the disassembly of it – Someone Aug 15 '21 at 02:01
  • @NateEldredge I've added the compiled `compare_exchange_weak` function. – Someone Aug 15 '21 at 02:29
  • @rodrigo I am currently using a single core, however I want to be able to use multiple cores in the future – Someone Aug 15 '21 at 20:58
  • When you try a simple ldr/str exclusive test, what do you see on this platform (getting rid of all the other code)? – old_timer Aug 16 '21 at 11:07
  • isolate the problem by dividing it in half or fractions and then put it back together when the parts work. – old_timer Aug 16 '21 at 11:09
  • You are bare metal and haven't enabled caches nor the MMU? So `ldxrb` will always fail, as the exclusive lock can never be taken. – artless noise Aug 17 '21 at 19:06
  • @artlessnoise ok. That makes sense. Time to enable the cache and MMU! – Someone Aug 23 '21 at 19:05
  • If you are scheduling, you also need to issue a `clrex` during a context switch, so that one 'task/thread' does not carry over an exclusive reservation from another 'task/thread' across the switch. It would be the same if you test with co-routines... but I don't think Rust has them. – artless noise Aug 23 '21 at 21:00
  • Did caches and MMU help? – artless noise Dec 06 '21 at 19:31
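
A minimal standalone exclusive load/store probe along the lines of old_timer's suggestion could look like the sketch below. This is not code from the thread: the function name is made up, and it assumes Rust 1.59+ where `core::arch::asm` is stable for AArch64. If this probe never reports success, no LDXR/STXR-based atomic (including `compare_exchange`) can make progress on that address.

use core::arch::asm;

/// Sketch only: returns true if a store-exclusive to `*addr` succeeded,
/// i.e. the exclusive monitor is actually usable for this address.
/// `addr` must point to a valid byte that nothing else is touching.
pub unsafe fn exclusive_store_works(addr: *mut u8) -> bool {
    let status: u32;
    asm!(
        "ldxrb {val:w}, [{addr}]",         // load-exclusive: read the byte and arm the monitor
        "stxrb {st:w}, {val:w}, [{addr}]", // store-exclusive: write it back; {st} is 0 on success
        addr = in(reg) addr,
        val = out(reg) _,
        st = out(reg) status,
    );
    status == 0
}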

1 Answer


Here's a non-exhaustive list of conditions that I'm certain (I've tested them both ways) need to hold in order for atomics to work on an RPi 4 in AArch64 mode at EL1 (ARMv8). This will probably be very similar on the RPi 3, which is also ARMv8 (Cortex-A53).

  • MMU must be enabled (SCTLR_EL1 bit [0] set to 0b1)
  • Data caching must be enabled (SCTLR_EL1 bit [2] set to 0b1)
  • The page on which the lock resides has to be marked as normal, cacheable memory via MAIR (I've used 0xff - I'm not sure which bits are redundant with respect to atomics, but I don't think there's much reason to use anything else for normal memory).
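
For illustration, the register setup the list above implies might look roughly like this. This is a sketch under assumptions, not the answerer's code: it presumes EL1, and that the translation tables, TTBR0_EL1 and TCR_EL1 have already been configured so the lock's page uses MAIR attribute index 0.

use core::arch::asm;

/// Sketch only: program MAIR_EL1 and switch on the MMU and data cache at EL1.
/// Assumes translation tables, TTBR0_EL1 and TCR_EL1 are already set up.
pub unsafe fn enable_mmu_and_dcache() {
    // Attribute index 0 = 0xff: normal memory, inner/outer write-back cacheable.
    // (The other attribute indices are left as 0 here.)
    asm!("msr mair_el1, {}", in(reg) 0xffu64);

    // Set SCTLR_EL1.M (bit 0, MMU enable) and SCTLR_EL1.C (bit 2, data cache enable).
    let mut sctlr: u64;
    asm!("mrs {}, sctlr_el1", out(reg) sctlr);
    sctlr |= (1u64 << 0) | (1u64 << 2);
    asm!(
        "dsb sy",              // complete any outstanding memory accesses first
        "msr sctlr_el1, {}",
        "isb",                 // make the new translation/caching state visible
        in(reg) sctlr,
    );
}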
– Błażej Michalik