No, MOVZX is zero extension, not sign. And CWD sign-extends AX into DX:AX (like you want before IDIV, not DIV).
movSx eax, word [wNum2] is a more efficient way to do mov ax,mem + CWDE, not CWD. (If your inputs are known to be non-negative when treated as signed, sign and zero extension do the same thing.)
The Q&A "What does cltq do in assembly?" has a table of cbw/cwde/cdqe and the equivalent movsx instruction, and of what cwd/cdq/cqo do (and the equivalent mov/sar).
None of these things are what you want before unsigned div: use xor edx,edx to zero DX, the high-half input for 32/16 => 16-bit division.
See also "When and why do we sign extend and use cdq with mul/div?"
To avoid false dependencies from writing partial registers, the most efficient thing on most recent CPUs is a movzx load: it gets your 16-bit value into AX without merging into the previous value of RAX/EAX. Similarly, xor-zeroing isn't (usually?) recognized as a zeroing idiom on partial registers, so you want 32-bit operand-size even if you're only going to read the low half of the register:
movzx eax, word [wNum2] ; zero-extend only to avoid a false dep from merging into EAX
xor edx, edx ; high half dividend = DX = 0
div word [wNum3]
mov [wAns16], dx ; store remainder from DX, not EDX
Your code storing 32-bit EDX into [wAns16] is presumably a bug, assuming there's only 2 bytes of space there before you step on whatever comes after it.