I'm deliberately not checking the correctness of your code:
A spinlock already acts as a "memory fence" (strictly speaking it only does a partial memory flush, so it is not a full memory fence): it already synchronizes reads and writes, otherwise it could not work. So if the spinlock is correct and working, you will never need an additional memory fence, which would just be a useless penalty.
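To illustrate the point, here is a minimal sketch of a spinlock, assuming C++11 atomics. The `SpinLock` class and `run_counter_demo` are hypothetical names made up for this example, not your code. The acquire/release ordering on the flag is what synchronizes the protected reads and writes, so no separate fence appears anywhere:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock() {
        // Acquire: reads/writes after lock() cannot be reordered before it.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Busy-wait until the current holder clears the flag.
        }
    }

    void unlock() {
        // Release: reads/writes before unlock() cannot be reordered after it.
        flag_.clear(std::memory_order_release);
    }
};

// Two threads increment a plain (non-atomic) int under the lock.
// No extra fence is used; the lock's own ordering is enough.
int run_counter_demo(int iterations_per_thread) {
    SpinLock lock;
    int counter = 0;

    auto work = [&] {
        for (int i = 0; i < iterations_per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

Calling `run_counter_demo(100000)` should return exactly 200000 if the lock is working. Adding a `std::atomic_thread_fence` inside the critical section would not make it any more correct, only slower.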
That's a conceptual issue: you should know the details of your architecture when implementing this kind of thing, especially the "memory contract" (the ordering guarantees) of individual assembly instructions.
Memory fences have other purposes, such as making sure a C++ object is fully initialized before asynchronous code starts using it.
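As a rough sketch of that use case, assuming C++11 and using made-up names (`Config`, `g_config`, `g_ready`, `producer`, `consumer`, `run_publish_demo`), a release fence can publish a fully initialized object and an acquire fence on the reader side makes sure the initialized fields are visible before they are used:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload that must be fully initialized before it is read.
struct Config {
    int a;
    int b;
};

Config g_config;                   // plain, non-atomic data
std::atomic<bool> g_ready{false};  // publication flag

void producer() {
    g_config.a = 1;
    g_config.b = 2;
    // Release fence: the writes above cannot be reordered after the
    // relaxed store below.
    std::atomic_thread_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    // Spin until the producer has published the object.
    while (!g_ready.load(std::memory_order_relaxed)) {
    }
    // Acquire fence: the reads below cannot be reordered before the
    // load that observed g_ready == true.
    std::atomic_thread_fence(std::memory_order_acquire);
    return g_config.a + g_config.b;
}

// Runs the producer on a second thread and consumes on the caller's thread.
int run_publish_demo() {
    g_ready.store(false, std::memory_order_relaxed);
    std::thread t(producer);
    int sum = consumer();
    t.join();
    return sum;
}
```

Here there is no lock at all, which is exactly why the fences are needed: they are the only thing ordering the initialization against the flag.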