Your "rule", as you call it, is a bit detailed and specific, but it applies to pretty much any processor that uses one stack for everything.
As a general rule you should move your stack pointer first to "allocate" the stack space you want; that is how you keep the next thing (an interrupt, a function call) from trashing it. Then move it back to de-allocate.
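On ARM, for example, the allocate/use/de-allocate pattern looks something like this (a hand-written sketch, not taken from the code in question; offsets and register choices are made up):

```asm
    sub  sp, sp, #16      @ allocate first: nothing below sp is safe,
                          @ so claim the space before storing into it
    str  r0, [sp, #0]     @ now the data lives in allocated space,
    str  r1, [sp, #4]     @ an exception handler cannot clobber it
    ldr  r0, [sp, #0]     @ use it as needed
    add  sp, sp, #16      @ de-allocate on the way out
```

Doing the `sub` after the `str`s would leave a window where the data sits below the stack pointer, unprotected.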
In the case of ARM, likely the case for the code you linked: the ARM has banked registers, covered in one of the first chapters of the Architectural Reference Manual (you need to read that before analyzing or writing assembly language, especially the one picture showing the registers). As that picture describes, the exception modes each have their own stack pointer. So when, for example, an interrupt occurs, a different stack pointer is used to save state, and your data won't be clobbered.
User and System share a stack pointer, but that is so that kernel code, etc., can have access without getting trapped in user mode. System mode is not used for exceptions, so your code isn't going to suddenly switch states and clobber the stack.
Now, ARM is like any other brand, Ford for example: they make big trucks, little trucks, SUVs, small cars, grandpa cars, etc. ARM likewise has a wide array of processor cores. The Cortex-M is suited for microcontrollers and other small, tight spaces. It has one stack, and when an exception occurs the hardware saves state on that stack for you, clobbering anything you left below the stack pointer. So the code you pointed out would be bad there (granted, why would you be using printf on a Cortex-M?).
Compilers can be configured to use or not use a second stack register; the x86 world is used to this idea (sp plus bp as a frame pointer), but it is not required. For a (data) stack to be useful there needs to be a stack pointer and instructions for referencing into the used part of the stack: stack-pointer-relative addressing. On some platforms you can read the stack pointer and make a copy in another register to access the stack frame, leaving the stack pointer free to roam about.

With or without a frame pointer, it is an incredibly bad idea to touch the stack pointer in inline assembly. You need to know your toolchain well, and code like that requires constant maintenance: with every new release of the compiler, or every new system you compile the code on, you have to hand-examine the produced output to ensure your manipulation is still safe. If you are going to that level of effort, why use inline asm and burn all those man-hours (job security?) when you could use real assembly and make something safe and reliable the first time?

If you just want some more data for that function, make a local variable; it changes the subtraction on the sp, done, no inline assembly required. If you have a desire to look past the end of the stack, use assembly, not inline assembly. If you want to modify past the stack pointer, or quickly allocate for some reason without using local variables, then again use assembly, and move the stack pointer first on systems where you have to, to avoid corruption of the data you are playing with.
Other than crashing the system, it doesn't make much sense to mess with the stack pointer in inline assembly. That has nothing to do with ARM or x86 or fill-in-the-blank.
What they have done there is write the entire function in assembly using inline assembly. That may just be a consequence of their build system choices: you can feed assembly into the GNU C compiler and produce an object just like you can with C (and if you are using inline assembly you have to write compiler-specific code anyway, so you already know which compiler you are using). The point is that there are other, less ugly ways they could have done it; unfortunately it is not an uncommon sight. If running on a not-Cortex-M, that code is safe-ish as is, but you can't add a function call in the middle of it or you will trash your data; they move the stack pointer just before the call rather than up front like a normal solution would. You would have to track down the author to ask the "why did they do that" question.
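By way of illustration, the same idea as a standalone .S file fed to the same gcc driver, with the stack pointer moved up front the normal way (file name, symbol names, and the `helper` call are all made up for the sketch):

```asm
@ scratch.S -- assemble with the same driver used for the C objects:
@   arm-none-eabi-gcc -c scratch.S -o scratch.o
    .text
    .global do_work
do_work:
    push {r4, lr}          @ prologue: save what we will clobber
    sub  sp, sp, #16       @ allocate scratch space up front
    str  r0, [sp, #0]      @ the argument now lives in owned stack space
    bl   helper            @ a call here is safe: sp has already moved
    ldr  r0, [sp, #0]
    add  sp, sp, #16       @ de-allocate
    pop  {r4, pc}          @ epilogue: restore and return
```

Because the allocation happens in the prologue, you can add or reorder calls in the body without re-auditing the stack discipline each time.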