I am working on a 32-bit Forth implementation and need double-cell (64-bit) arithmetic, including double-cell multiplication and division/remainder. However, the architecture I am targeting, ARM Cortex-M4, has no 64x64 multiplication or 64/64 division/remainder instructions; it only provides 32x32 multiplication, 32/32 division, and 32x32+64 multiply-accumulate instructions.
While I would be fine with 32x64 multiplication (since 64x64 multiplication can be emulated with it in cases that would not overflow anyway), and 64/32 division/remainder would be sufficient for some things, I would like to have at least a full 64/64 division/remainder in addition to 32x64 multiplication, so that I can implement double-cell arithmetic completely.
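To make the emulation idea concrete, here is a rough C sketch of how a 64x64 multiply (truncated to 64 bits) could be built from two 64x32 multiplies, which in turn use only 32x32 multiplies. The function names `mul_64x32` and `mul_64x64` are hypothetical, and I am assuming the compiler lowers the widening 32x32 product to a single long-multiply instruction; the actual Forth words would be written in assembly, so treat this only as a model of the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: 64x32 multiply, result truncated to 64 bits. */
static uint64_t mul_64x32(uint64_t a, uint32_t b)
{
    uint32_t a_lo = (uint32_t)a;
    uint32_t a_hi = (uint32_t)(a >> 32);

    /* Low cell times b: a full 32x32 -> 64 product. */
    uint64_t lo = (uint64_t)a_lo * b;

    /* High cell times b: only the low 32 bits survive truncation. */
    uint32_t hi = a_hi * b;

    return lo + ((uint64_t)hi << 32);
}

/* Hypothetical helper: 64x64 multiply, result truncated to 64 bits.
   Built from two 64x32 multiplies, one per cell of b. */
static uint64_t mul_64x64(uint64_t a, uint64_t b)
{
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);

    uint64_t low_part  = mul_64x32(a, b_lo);
    uint64_t high_part = mul_64x32(a, b_hi) << 32;

    return low_part + high_part;
}
```

Because the result is truncated to 64 bits, the same routine gives the right answer for two's-complement signed operands as long as the true product fits in a double cell, which is the "would not overflow" case mentioned above.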