I am writing a program, and at this point I need to make it efficient. I am using a Haswell microarchitecture (64-bit) and g++. The goal is to use the ADC instruction on every iteration until the loop ends.

// I removed all of the carry handling from this preview to keep it simple
size_t anum = ap[0], bnum = bp[0];
unsigned carry;

// The carry flag is set here by an ordinary addition
anum += bnum;
cnum[0]= anum;
carry = check_Carry(anum, bnum);

for (int i=1; i<n; i++){

    anum  = ap[i];
    bnum = bp[i];

    //I want to remove this line and insert the __asm__ block
    anum += (bnum + carry);
    carry = check_Carry(anum, bnum);

    //This block is not working
    __asm__(
            "movq   -64(%rbp), %rcx;"
            "adcq   %rdx, %rcx;"
            "movq   %rsi, -88(%rbp);"
    );

    cnum[i] = anum;
}

Is CF set only by the first addition, or every time I execute an ADC instruction?

I think the problem is that CF is lost on every iteration of the loop. If that is the problem, how can I solve it?

  • That will never work like that, the carry flag can be changed by the time you get there. If you insist on optimizing this, you should code the entire loop in pure asm so you don't have to bother with gcc inline asm which is a complicated beast. – Jester Feb 05 '16 at 15:28
  • Not all of my code is shown here, only the core of it. In my code I handle the carry with another local variable, but I want to remove that. – Hélder Gonçalves Feb 05 '16 at 15:32
  • What exactly is the code supposed to do? For each iteration of the loop, do you want to use the same CF value set by line 2? If so, you need to use a `setc` instruction or something to preserve the state of the flag. Remember that *most* x86 instructions set flags, which clobbers the old state of the flags. – Cody Gray - on strike Feb 05 '16 at 15:32
  • If you don't want to save the flag yourself, the only way is to put the whole thing into asm so you can make sure CF is not clobbered. And the best way to do that is to put it into a separate asm module. – Jester Feb 05 '16 at 15:39
  • This code adds very large numbers. In every iteration I want to use the result of the previous addition, to know whether it overflowed. If there are instructions that can clobber the old state of the flags, how can I ensure that this does not happen? Isn't `CF` set every time I execute the `ADC` instruction? If so, I understand what is going wrong. – Hélder Gonçalves Feb 05 '16 at 15:53
  • My goal is to use the `CF` until the loop ends. – Hélder Gonçalves Feb 05 '16 at 15:54
  • It's not completely clear what you are wanting, but: yes `ADC` takes CF as an input and produces it as an output. However, MANY other instructions muck with the flags, so you may need to use some instructions to save/restore the CF. – John Hascall Feb 05 '16 at 17:30
  • See the [GNU multiprecision library](https://gmplib.org/). It's LGPL, so you can just use it from pretty much anything, or if your code is GPL you can have a look at GMP's `adc` loop. Your loop can't work, because flags aren't preserved from the end of one `asm` statement to the start of the next. Almost all instructions set `CF`, so `ja` / `jb` can test it, so the loop outside your asm clobbers it. http://stackoverflow.com/questions/32084204/problems-with-adc-sbb-and-inc-dec-in-tight-loops-on-some-cpus/32087095#32087095 explains some issues you'll see in doing the whole thing in asm. – Peter Cordes Feb 07 '16 at 16:45
  • If the array size is fixed at compile time, you might try: http://stackoverflow.com/a/35306367/2189500 – David Wohlferd Mar 20 '16 at 06:58

1 Answer

You use asm like this in the gcc family of compilers:

 int src = 1;
 int dst;

 asm ("mov %1, %0\n\t"
     "add $1, %0"
     : "=r" (dst)
     : "r" (src));

 printf("%d\n", dst);

That is, you can refer to variables, rather than guessing where they might be in memory/registers.
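
As a rough sketch of how the same constraint syntax applies to `adc`: the carry flag only needs to survive between instructions inside a single asm statement, since the compiler does not insert its own code there. The example assumes x86-64 and a GCC-compatible compiler and falls back to plain C++ elsewhere. `add128` is a hypothetical name chosen for illustration; it does not come from the question.

```cpp
#include <cstdint>

// Hypothetical helper: adds the 128-bit value (b_hi:b_lo) into (a_hi:a_lo).
inline void add128(std::uint64_t& a_lo, std::uint64_t& a_hi,
                   std::uint64_t b_lo, std::uint64_t b_hi) {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("addq %[blo], %[alo]\n\t"   // a_lo += b_lo, sets CF
            "adcq %[bhi], %[ahi]"       // a_hi += b_hi + CF
            : [alo] "+&r"(a_lo),        // early-clobber: written before b_hi is read
              [ahi] "+r"(a_hi)
            : [blo] "r"(b_lo), [bhi] "r"(b_hi)
            : "cc");                    // the flags are modified
#else
    // Portable fallback: detect the carry with an unsigned comparison.
    a_lo += b_lo;
    a_hi += b_hi + (a_lo < b_lo ? 1u : 0u);
#endif
}
```

The operands are named with `%[name]` and bound to C++ variables, so nothing depends on where the compiler happens to keep them on the stack. This only covers a fixed number of words; it does not by itself keep CF alive across iterations of a C++ loop.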

[Edit] On the subject of carries: It's not completely clear what you want, but ADC takes CF as an input and produces it as an output. However, many other instructions modify the flags (likely including those the compiler uses to implement your for loop), so you probably need to use some instructions to save and restore CF (perhaps LAHF/SAHF).
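
One way to avoid relying on CF between statements is to keep the carry in an ordinary variable and let the compiler choose the instructions. The following is a minimal sketch, assuming GCC 5 or later (or Clang) for `__builtin_add_overflow`; `add_limbs` and its parameter names are hypothetical and not taken from the question. Whether the compiler turns this into an `adc` chain depends on the compiler version and optimization level.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: c = a + b over n 64-bit limbs, least significant first.
// Returns the carry out of the most significant limb (0 or 1).
inline unsigned add_limbs(std::uint64_t* c, const std::uint64_t* a,
                          const std::uint64_t* b, std::size_t n) {
    unsigned carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t sum;
        // Carry out of a[i] + b[i].
        bool c1 = __builtin_add_overflow(a[i], b[i], &sum);
        // Carry out of adding the incoming carry; c1 and c2 are never both true.
        bool c2 = __builtin_add_overflow(sum, static_cast<std::uint64_t>(carry), &sum);
        c[i] = sum;
        carry = (c1 || c2) ? 1u : 0u;
    }
    return carry;
}
```

On x86, the `_addcarry_u64` intrinsic from `<immintrin.h>` expresses the same carry-in, carry-out step directly, at the cost of portability.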

John Hascall
  • Why the down-votes, I wonder ? Seems like a useful answer (apart from missing the need for `adc`, which is easily fixed) ?... – Paul R Feb 05 '16 at 15:35
  • While it does point in the right direction, it doesn't answer the question and over-simplifies things. Could have been just a comment saying that gcc asm blocks use constraints to specify input-output, with a link to said docs. – Jester Feb 05 '16 at 15:38
  • So, including an example == a down vote. Makes sense. Not. – John Hascall Feb 05 '16 at 15:39
  • Including a general example of how inline asm looks, while not addressing any of the specific issues such as CF being clobbered by other compiler generated code, does earn a -1 from me. PS: we don't need to explain downvotes here on SO, and I can't know why the other person downvoted. – Jester Feb 05 '16 at 15:41
  • It addresses exactly the specific issue: using `-88(%rbp)` is doomed to failure. – John Hascall Feb 05 '16 at 15:42
  • I've downvoted because the answer only addresses one part of the question regarding passing variables into the template but it doesn't in any way address the bigger elephant in the code - the question regarding the CF flag. The latter being far more involved and not covered in your answer. – Michael Petch Feb 05 '16 at 16:20
  • People don't like partial answers, even if one explains part of the answer much more clearly than the other full answers. It has to fit in a comment or not be said at all. – QuentinUK Feb 07 '16 at 12:51
  • This inline asm fragment is ugly. Why would you put a `mov` in the inline asm instead of using constraints to let the compiler save the old value or not? e.g. `asm ("inc %0": "=r" (dst) : "0" (src) );` Then the compiler knows to put the input in the same register that will hold the output, and from that it can figure out that it needs to use a `mov` if `src` is used after the asm statement. Otherwise it can skip it. See the x86 tag wiki info page for some more inline asm links. I'm downvoting for suggesting slow code on a question that asks about high performance. – Peter Cordes Feb 07 '16 at 16:39
  • @PeterCordes as I pointed out in the 2nd comment, this example is taken directly from the gcc manual... Its point is to illustrate that using `-88(%rbp)` is not wise; you should use the input/output/clobbers features of gcc's asm – John Hascall Feb 07 '16 at 18:50
  • @JohnHascall: The manual even says it's not a good example of how you might *actually* use inline asm. It does address one of the OP's problems. I'm probably over-reacting. I think I've left worse-answers than this alone instead of downvoting, and might just be piling on the bandwagon. But I'm still convinced that it's a terrible example to pick out of the manual and show someone. IMO, the manual should have a better example. Maybe I'll submit a patch. You always want the compiler to emit any necessary `mov` instructions, so it has as much choice in the matter as possible. – Peter Cordes Feb 07 '16 at 18:59
  • I think they just wanted something simple that people could understand. – John Hascall Feb 07 '16 at 19:20