
How would I do this without changing any other register (i.e. keeping ecx and edx the same as before)?

In C++, it would be this:

int ecx = 3;
int edx = 1;
int ebx = ecx - edx;

So far, I've done this:

mov ecx, 1
mov edx, 3
sub ecx, edx
mov ebx, ecx
Peter Cordes
  • What have you tried doing? – UnholySheep Jul 04 '18 at 21:25
  • Just subtracting edx from ecx then making ebx equal to ecx but then I would have to change the value of ecx back – Panda Assassin Jul 04 '18 at 21:27
  • Have you considered first copying the value of `ecx` into `ebx` before doing the `sub` directly on `ebx`? (That's actually what most C++ compilers would do as well) – UnholySheep Jul 04 '18 at 21:30
  • There is no "`lea` but subtract instead of add" if that's what you mean, you need other tricks – harold Jul 04 '18 at 21:34
  • Yeah, but I was just wondering if there was a way to do it without doing that – Panda Assassin Jul 04 '18 at 21:34
  • `mov ebx,ecx` before `sub ebx,edx` is most optimal. You can achieve the same by many other ways, like for example `neg edx` `lea ebx,[ecx+edx]` `neg edx`, or using `push/pop` like in the answer, but all of those variants are artificially complex to avoid the most straightforward one and come with extra performance penalty. – Ped7g Jul 04 '18 at 22:02

2 Answers


With x86-style 2-operand instructions that destroy their destination, you can always simulate a non-destructive 3-operand instruction: use mov to copy one operand to the destination, then run the destructive instruction on that destination.

; with ecx and edx holding your inputs (which I'm calling C and D).

mov  ebx, ecx      ; ebx = C
sub  ebx, edx      ; ebx = C - D

That's the best you can do for this case, where you need to not destroy the values in ECX and EDX.

If you're running low on available registers, saving ECX on the stack and then producing the C - D result in ECX instead of a new register can be a good option.

Often you can keep using the same register for the same variable throughout a function, but this is not required, and sometimes not optimal. Use comments to keep track of things.

Compilers are usually pretty good at register allocation, but their code can be hard to read because they don't even try to be consistent with register use. For non-destructive operations they'll often put the result in a new register for no reason. Still, compiler output is often a good starting point for optimization. (Write a tiny function that does something, and see how it compiles. Or write your whole thing in C with function args instead of constants as inputs, and compile it.)
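As a concrete version of that suggestion, here is a minimal sketch of such a tiny function. The name `sub_keep` is made up for illustration; the point is only that the inputs are function arguments, so the compiler can't fold the subtraction into a constant.

```cpp
// Hypothetical function name, not from the question.
// The inputs are arguments rather than constants, so the compiler
// has to emit a real subtraction instead of a precomputed result.
int sub_keep(int c, int d) {
    int b = c - d;   // c and d keep their values; only b is new
    return b;
}
```

Compiling this with something like `g++ -O2 -S -masm=intel` and reading the generated assembly should typically show a `mov` into the result register followed by a `sub` on that register, i.e. the same copy-then-operate pattern as above.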


x86 has some copy-and-operate instructions for other operations (not sub), most notably LEA.

lea  ebx, [ecx + ecx*4]     ; ebx = C * 5
lea  ebx, [ecx + edx - 2]   ; ebx = C + D - 2

x86 addressing modes can add or subtract constants, but can only left-shift and add registers.


The immediate-operand form of imul is also 3-operand, for use with multipliers that you can't do with 1 or 2 LEAs:

imul   ebx,  ecx,  0x01010101     ;  ebx = cl repeated 4 times, if upper bytes were zero

Unlike most immediate-operand instructions, imul doesn't overload the /r field in the ModRM byte as extra opcode bits. So it has room to encode a register destination and a reg/mem source, because 186 dedicated a whole opcode byte to it.
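To see why the 0x01010101 multiplier repeats the low byte, here is a rough C++ sketch of the same arithmetic. The helper name `broadcast_byte` is an assumption for illustration, and taking a `uint8_t` models the "upper bytes were zero" condition.

```cpp
#include <cstdint>

// Hypothetical helper modelling the imul-by-0x01010101 trick:
// b * (1 + 2^8 + 2^16 + 2^24) places a copy of b in each byte.
// No partial product carries into the next byte because b < 256.
constexpr std::uint32_t broadcast_byte(std::uint8_t b) {
    return static_cast<std::uint32_t>(b) * 0x01010101u;
}
```

For example, `broadcast_byte(0xAB)` evaluates to `0xABABABAB`.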


ISA extensions like BMI1 and BMI2 have added some new 3-operand integer instructions, like ANDN and SHRX.

andn   ebx,  ecx, edx             ; ebx = (~C) & D   ; BMI1

shrx   ebx,  edx, ecx             ; ebx = D >> C     ; BMI2

But they're not universally available: only on Haswell and later, and on Ryzen. (And the Pentium/Celeron versions of Haswell/Skylake are still sold without them, further delaying the point at which they become baseline. Thanks, Intel.)
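If it helps to check the operand order, the two instructions above can be modelled in C++ roughly like this. The function names are made up; the models assume 32-bit operands, and that SHRX masks its shift count to the low 5 bits for that operand size.

```cpp
#include <cstdint>

// Assumed model of BMI1 ANDN: result = (~first source) & second source.
constexpr std::uint32_t andn_model(std::uint32_t c, std::uint32_t d) {
    return ~c & d;
}

// Assumed model of BMI2 SHRX with 32-bit operands: logical right shift.
// The count is masked to 5 bits here, which also avoids undefined
// behaviour in C++ for shift counts of 32 or more.
constexpr std::uint32_t shrx_model(std::uint32_t d, std::uint32_t c) {
    return d >> (c & 31u);
}
```

Neither model touches its inputs, which mirrors the non-destructive 3-operand form.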

And of course for vector instructions, AVX provides non-destructive versions of all the SSE instructions.

movaps    xmm2, xmm0         ; copy a whole register
subsd     xmm2, xmm1         ; scalar double-precision FP subtract: xmm0-xmm1

vsubsd    xmm3, xmm0, xmm1

Or a less obvious use case:

xorps     xmm0, xmm0    ; zero the register and break any false dependencies
cvtsi2sd  xmm0, eax     ; convert to double-precision FP, with the upper element = 0

xorps     xmm1, xmm1
cvtsi2sd  xmm1, edx

vs. AVX:

vxorps    xmm1,  xmm1,xmm1   ; xmm1 = all-zero

vcvtsi2sd  xmm0, xmm1, eax
vcvtsi2sd  xmm1, xmm1, edx

This reuses the same zeroed register as the merge source for both conversions, which avoids false dependencies and leaves the upper 64 bits of each 128-bit result zero.

ecm
Peter Cordes
  • *you can always ...* - That was a slight overstatement. You can't do `x = y - x` with MOV + SUB, although you can do `neg eax` / `add eax, edx`. Still 2 instructions, but neither of them is `mov` so mov-elimination isn't keeping the latency down to 1 cycle. (Like it would be on a 3-operand machine like AArch64 with `sub w0, w1, w0`.) – Peter Cordes Jun 21 '21 at 00:28

You can always use the stack to preserve registers:

    push ecx
    push edx
    mov ecx, 1
    mov edx, 3
    sub ecx, edx
    mov ebx, ecx
    pop edx
    pop ecx
OregonJim
  • This gives me a random number for ecx & edx – Panda Assassin Jul 04 '18 at 21:49
  • @PandaAssassin: Your question says you want to keep ECX and EDX the same as before. ECX in this code will be whatever was in ECX before the `push ecx`. I think your question may have been wrong? – Michael Petch Jul 04 '18 at 21:52
  • @MichaelPetch I get it now. I was just trying to retain `ecx` as 3. For me, I would just put `push` after I `mov` `ecx` and `edx` – Panda Assassin Jul 04 '18 at 21:57
  • This is correct but way over-complicated. I'm tempted to downvote this because cluttering your code with a boatload of push/pop makes it hard to read. – Peter Cordes Jul 04 '18 at 23:07
  • @PeterCordes, two is hardly a boatload. Hard to read? Have you never seen a typical interrupt routine? – OregonJim Jul 04 '18 at 23:36
  • I meant to say that encouraging people to use push/pop when it's not necessary leads to unreadable beginner code that saves all the registers around every function, or even inside single functions. I'm not saying that 2 is a boatload. BTW, the OP's C++ source leaves EDX and ECX modified, so if you have to `mov reg,imm` inside the push/pop, you might as well just optimize it to `mov ebx, 1-3`. (And for the record, I wasn't the downvoter on this answer. Someone else downvoted right after I said I was *tempted* to. You probably thought that was me and were extra annoyed.) – Peter Cordes Jul 05 '18 at 00:31