Finding an efficient shift/add/LEA instruction sequence to multiply by a given constant (avoiding MUL/IMUL)

Question

I'm trying to write a C program mult.c that has a main function that receives 1 int argument (parsed with atoi(argv[1])), that is some constant k we want to multiply by.

This program will generate an assembly file mult.s that implements

int mult(int x) {
   return x * k;
}

for that constant k. (This is a followup to Efficient Assembly multiplication)

For example: if main() in mult.c gets 14 as argument it may generate (though it is not minimal as later emphasized):

.section .text
.globl mult
mult:              # by 14
    movl %edi,%eax         # arg passed in EDI
    shl $3,%eax
    movl %edi,%ecx
    shl $2,%ecx
    addl %ecx,%eax
    movl %edi,%ecx
    shl $1,%ecx
    addl %ecx,%eax
    ret

The assembly function recieves its argument in the %rdi register and returns its result in the %rax register.

I need to consider these following rules:

0) The assembly program should be minimal in terms of lines of code. (Dynamic instruction count). I'll worry about critical path latency later.

if k is 0,1,-1 no need to multiply (obviously) - xor %eax,%eax, mov %edi, %eax, or neg
if k is 3,5,9 I then need to replace the multiplication command in LEA.
if k is 3*2^n, 5*2^n, 9*2^n then I need to replace the multiplication command in LEA and shifting.
all other cases suppose to work also of course.

I'm having trouble with cases 3 and 4.

how can I identify a number is of this structure in my c program?
what do I do with negative numbers?

And a general cases that relates to both, Am I just suppose to mask, add and shift my constant 32 times (the number of bits in an integer)? Is there anything better?

x86 doesn't have a MULT instruction? Are you thinking of [IMUL](https://www.felixcloutier.com/x86/imul), or maybe widening MUL? Maybe you're thinking of MIPS's MULT mnemonic. — Peter Cordes, Dec 11 '19 at 13:40
It does but the whole point of this exercise is to avoid using it so we get a feel of what happens behind the scenes... — Blur, Dec 11 '19 at 13:42
Yes, but if you only need to avoid an instruction that doesn't exist, the question would be trivial :P The answer would be to use `imul` because it's not called `mult`. Anyway, fixed your title for you along with the formatting mess that was the rest of your question. (Also added some fixes, have a look at the diff. e.g. `main` gets string args in argv, not `int`.) — Peter Cordes, Dec 11 '19 at 13:57
re: negative numbers: you can just treat `k` as unsigned. For 2's complement, the only difference between signed and unsigned multiplication is in the high 32 bits of a full 32x32 => 64-bit multiply. (e.g. on x86, `mul` and `imul` give the same EAX, potentially different EDX. This is why only `imul` has new efficient versions like `imul r, r/m32`) — Peter Cordes, Dec 11 '19 at 14:01
re: the general case: in a real compiler you *would* just fall back to `imul eax, edi, 12345` for cases with a lot of set bits, where it would take more than some threshold of shift / add / sub instructions. Also, IIRC, my answer on [Efficient Assembly multiplication](//stackoverflow.com/q/59227227) pointed out a specific technique for stripping off an arbitrary power of 2 so you can check what's left. `shiftcount = __builtin_ctz` then right shift by that, removing all the leading zeros. So you could do that for case 3. — Peter Cordes, Dec 11 '19 at 14:04
Note that there aren't many numbers of the form for case 3. I would build a look up table for all numbers of the form and simply check if it's in there. — fuz, Dec 11 '19 at 14:19
How is this question different to the question you say it is a follow-up to? Seems to me that @PeterCordes already addressed the algorithm for determining if a multiplier has exactly two bits set, as well as figuring out the power-of-two factor, in his [excellent answer](https://stackoverflow.com/a/59229397/1566221). — rici, Dec 11 '19 at 15:04
@rici: I added the fact that it was a followup while tidying up this question, the OP neglected to mention it at all. I think this question is looking for more detailed help with implementation details, and as such is more like a beginner / homework question where we need the OP to make an attempt in actual code that might be buggy or have one *small* missing piece. (I was surprised this got asked; I thought I already pretty much covered this in sufficient detail to implement, like you said.) — Peter Cordes, Dec 11 '19 at 15:10
Hmm, correct me if I'm wrong but this reminds of a rather important task of finding an optimal sequence for a "modular exponentiation using variable size sliding windows technique". As far as I can recall (although it's been a while) the solution of this task is not trivial, so - good luck with that. )) Important bit related to the question, though - does your *k* has any limits? Min, max? — tum_, Dec 11 '19 at 15:27

Finding an efficient shift/add/LEA instruction sequence to multiply by a given constant (avoiding MUL/IMUL)

0 Answers0

Linked