What is the most efficient way to implement
d=u / v
r=u mod v
for the ARMv7-M instruction set, where u is an unsigned 64-bit value and v is an unsigned 32-bit value?
I'm particularly interested in the special case that v is "normalized" so that its high bit is set.
I've seen various options in Knuth's "The Art of Computer Programming" (Vol. 2), but I'm having difficulty seeing the best way to implement this using the available ARMv7-M instructions such as UMULL.
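For concreteness, here is a minimal C sketch of the kind of baseline I have in mind. It is not a tuned ARMv7-M implementation. It uses the schoolbook "two digits by one digit" division with 16-bit half-digits, in the spirit of Knuth's Algorithm D, and assumes v is already normalized (high bit set). The names udiv64_32 and div_2by1_norm are made up for illustration. The hope is that each 32-bit division maps onto UDIV, but I have not checked what a compiler actually generates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: divide the 64-bit value (hi:lo) by v.
 * Assumes hi < v (so the quotient fits in 32 bits) and that v is
 * normalized, i.e. its top bit is set. Several intermediate values
 * deliberately rely on uint32_t wrapping modulo 2^32.
 */
static uint32_t div_2by1_norm(uint32_t hi, uint32_t lo, uint32_t v,
                              uint32_t *rem)
{
    const uint32_t b = 65536u;            /* half-digit base, 2^16 */
    uint32_t vn1 = v >> 16;               /* high half of divisor */
    uint32_t vn0 = v & 0xFFFFu;           /* low half of divisor */
    uint32_t un1 = lo >> 16;              /* high half of low word */
    uint32_t un0 = lo & 0xFFFFu;          /* low half of low word */

    /* Estimate the high quotient half-digit, then correct it. */
    uint32_t q1 = hi / vn1;
    uint32_t rhat = hi - q1 * vn1;
    while (q1 >= b || q1 * vn0 > b * rhat + un1) {
        q1--;
        rhat += vn1;
        if (rhat >= b)
            break;
    }

    /* Partial remainder after the first step (fits in 32 bits). */
    uint32_t un21 = hi * b + un1 - q1 * v;

    /* Estimate the low quotient half-digit, then correct it. */
    uint32_t q0 = un21 / vn1;
    rhat = un21 - q0 * vn1;
    while (q0 >= b || q0 * vn0 > b * rhat + un0) {
        q0--;
        rhat += vn1;
        if (rhat >= b)
            break;
    }

    *rem = un21 * b + un0 - q0 * v;
    return q1 * b + q0;
}

/*
 * Hypothetical entry point: d = u / v, r = u mod v for 64-bit u and
 * normalized 32-bit v.
 */
void udiv64_32(uint64_t u, uint32_t v, uint64_t *d, uint32_t *r)
{
    assert(v & 0x80000000u);              /* sketch handles normalized v only */

    uint32_t hi = (uint32_t)(u >> 32);
    uint32_t lo = (uint32_t)u;

    /* With v normalized, the top quotient word can only be 0 or 1. */
    uint32_t qhi = (hi >= v) ? 1u : 0u;
    uint32_t rhi = qhi ? hi - v : hi;     /* now rhi < v */

    uint32_t qlo = div_2by1_norm(rhi, lo, v, r);
    *d = ((uint64_t)qhi << 32) | qlo;
}
```

Called as udiv64_32(u, v, &d, &r), this should agree with the compiler's own u / v and u % v for any normalized v. What I can't tell is whether a UMULL-based reciprocal method would beat the two UDIVs plus correction steps here.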