compute -2xy - 3z with MSVC inline-assembly

Question

Can u guys tell me where am i mistaken?

I need to compute this in inline-assembly -2xy - 3z.

int solution(int x, int y, int z)
{ 
    x=4;
    y=5;
    z=2;
    int result;
    __asm
    {   
       mov eax, -2 
       imul [x]
       imul [y]
       mov ebx, eax 
       mov eax, -3
       imul [z]
       sub eax, ebx 


        mov [result], eax       ; 
    }
       assert(result == -2*x*y – 3*z);
       printf("solution_for_grade_6(%d, %d,**strong text** %d) = %d\n", x, y, z, result);
 return result;

What processor are you writing the asm for? Also, what kind of error do you get when you run the code? Some more information will help others to get to the answer faster. — dey.shin, Oct 12 '17 at 17:48
Its for IA-32. I don't get an error, it runs but nothing happens on the window. — konstantin97, Oct 12 '17 at 17:52
The only error I see is that `sub eax, ebx` should be `add eax, ebx` since you multiplied by `-3` and did the negation already. You could multiple by `3` instead of `-3` and keep the `sub eax, ebx` as is. — Michael Petch, Oct 12 '17 at 18:30
Maybe what is happening is that you are running the program from the MSVC GUI and the program opens a console, writes the output and then closes the window before you can see it. Either step through with the debugger and watch the output window or you could include `#include` and then before your program exits do a `system("pause");` so your program pauses for a keystroke before it exits. — Michael Petch, Oct 12 '17 at 18:32
Your code is inefficient, using only the one-operand form of imul. This should be pretty efficient: `mov eax, [x]` / `neg eax` / `imul eax, [y]` / `mov ecx, [z]` / `imul ecx, [y], -3` / `lea eax, [ecx + eax*2]`. (You could replace the imul by `-3` with LEA, but it can't negate, and it can't use a memory source.) Of course if you cared about efficiency, your first mistake would be using inline-asm in the first place instead of letting the C++ compiler do that for you. (See also https://stackoverflow.com/tags/inline-assembly/info) — Peter Cordes, Oct 12 '17 at 20:16

Paolo Fassin · Answer 1 · 2017-10-17T17:05:59.590

1

This is my fifth solution (return a 32 Bit value; I've removed fourth and third mistaken/UN-optimized solutions); it uses only one IMUL; note that both abs(X) && abs(Y) they couldn't be > (1<<14)-1, abs(Z) must be < (1<<29):

__asm
{
 /*  INPUT: EAX= X               */
 /*         EDX= Y               */
 /*         EBX= Z               */
 /*   TEMP: EDX                  */
 /* OUTPUT: EAX= -2*xy-3*z)      */

 NEG   EAX

 IMUL  EDX            /* EDX:EAX=-x*y */

 ADD   EAX,EAX        /* EAX=-2*x*y */

 LEA   EDX,[EBX+EBX]
 ADD   EDX,EBX
 SUB   EAX,EDX       /* EAX=-2*x*y-z*3 */

 MOV   [RESULT],EAX
}

This is my second solution (return a 64 Bit value); it uses only one IMUL; now is optimized. Note that both abs(X) && abs(Y) they couldn't be > (1<<30)-1. I've supposed that RESULT, in this case, is a 64 Bit integer:

__asm
{
 /*  INPUT: EAX= X               */
 /*         EDX= Y               */
 /*         EBX= Z               */
 /*   TEMP: ESI, EDI             */
 /* OUTPUT: EDX:EAX= -(2*xy+3*z) */

 IMUL  EDX

 ADD   EAX,EAX
 ADC   EDX,EDX /* EDX:EAX=2*xy */

 MOV   ESI,EBX
 ADD   ESI,ESI
 SBB   EDI,EDI /* EDI:ESI=2*z */

 ADD   ESI,EBX
 ADC   EDI,EDI /* EDI:ESI=3*z */

 ADD   EAX,ESI
 ADC   EDX,EDI /* EDX:EAX=2*xy+z*3 */

 NOT   EAX
 NOT   EDX

 ADD   EAX,1
 ADC   EDX,0   /* EDX:EAX=-(2*xy+z*3) */

 LEA   ESI,[RESULT]
 MOV   [ESI],EAX
 MOV   [ESI+4],EDX
}

But remember that each of two product has a result of more then 64 Bit; in assembly (return a 96 Bit value):

 ; -2xy-3z=-(2*xy+3*z)

 ; INPUT: EAX= X
 ;        EDX= Y
 ;        ECX= Z

 ; TEMP: EDI, ESI, EBP

 ; OUTPUT: EBX:EDX:EAX= -(2*xy+3*z)

 IMUL  EDX

 ADD   EAX,EAX
 RCL   EDX,1
 SBB   EBX,EBX ;EBX:EDX:EAX=2*xy

 MOV   ESI,ECX
 ADD   ESI,ESI
 SBB   EDI,EDI 
 MOV   EBP,EDI ;EBP:EDI:ESI=z*2

 ADD   ESI,ECX
 ADC   EDI,EDI
 ADC   EBP,EBP ;EBP:EDI:ESI=z*3

 ADD   EAX,ESI
 ADC   EDX,EDI
 ADC   EBX,EBP ;EBX:EDX:EAX=2*xy+z*3

 NOT   EAX
 NOT   EDX
 NOT   EBX

 ADD   EAX,1
 ADC   EDX,0
 ADC   EBX,0   ;EBX:EDX:EAX=-(2*xy+z*3)

edited Oct 17 '17 at 17:05

answered Oct 16 '17 at 19:42

Paolo Fassin

361
2
11

However, you have to save / restore the EBX, ESI, EDI and EBP registers before / after this code if you use it in a standalone function ... – Paolo Fassin Oct 16 '17 at 19:49
1

Don't use push/pop to save/restore inside inline-asm statements. Let the compiler do that for you if it needs to. (It might not after inlining into a function which already saves/restores those regs, but doesn't have anything valuable in them when this code runs.) – Peter Cordes Oct 17 '17 at 03:29
`add edi,edi` is slightly more efficient than `shl edi,1`, and sets CF exactly the same. – Peter Cordes Oct 17 '17 at 03:30
Compilers make somewhat better code for the 32-bit result case. clang5.0 does a particularly good job: https://godbolt.org/g/P8xxgW. You forgot to use `lea` for z*3. That block that starts with `mov` should just be one instruction. Also, the 2-operand form of `imul` is more efficient, so use it when you don't need the high half. And you can use either form with a memory operand. – Peter Cordes Oct 17 '17 at 10:40
Your 64-bit output version is actually pretty good, better than what gcc does (https://godbolt.org/g/ueQuAJ). (But clang5.0 uses a `imul` to multiply by `-3`, and is probably faster than yours on modern CPUs where `imul` is efficient.) `rcl edx,1` should be replaced with `adc edx,edx`. `rcl r,1` is 3 uops on Skylake, but `adc` is only 1. (They're equal on AMD). Nice idea with `sbb edi,edi` to sign-extend the upper half, although it has a false dependency on the old value of `edi` on everything except AMD Bulldozer-family. – Peter Cordes Oct 17 '17 at 12:57
Nice try, but x86 addressing modes can only add, not subtract. `[eax-ebx]` won't assemble. Also, `lea eax, [eax+eax]` has no advantage over `add eax,eax`. It runs on fewer ports and is larger. Only use LEA if it saves at least a `mov`. Look at the compiler output from my last comment for how to use `lea` to multiply by 3 before a sub (instead of have 3 LEAs on the critical path from x to result. related: gcc missed-optimization bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82582). – Peter Cordes Oct 17 '17 at 16:22
Ok, you fixed the bug, but why don't you just do `LEA EDX,[EBX+EBX*2]` like a normal person to multiply by 3, instead of wasting an ADD instruction? – Peter Cordes Oct 17 '17 at 17:17
I'm glad to be criticized, because the criticism is always constructive; it is true: "LEA EDX, [EBX + 2 * EBX]" is much better, but I leave this implementation to other users, because to update my post could be excessive. – Paolo Fassin Oct 17 '17 at 17:39

compute -2xy - 3z with MSVC inline-assembly

1 Answers1