0

A simple implementation of the popcnt function in C:

int popcnt(uint64_t x) {
  int s = 0; 
  for (int i = 0; i < 64; i++) {
    if ((x << i) & 1 == 1) s++;
  }
  return s;
}

I am using inline assembly language (x86-64) to implement popcnt,

int asm_popcnt(uint64_t x) {
  int i = 0, sum = 0;
  uint64_t tmp = 0;
  asm ( ".Pct:               \n\t"
        "movq   %[xx],  %[tm]\n\t"
        "andq    $0x1,  %[tm]\n\t"
        "test   %[tm],  %[tm]\n\t"
        "je      .Grt        \n\t"
        "incl   %[ss]        \n\t"
        ".Grt:               \n\t"
        "shrq    $0x1,  %[xx]\n\t"
        "incl   %[ii]        \n\t"
        "cmpl   $0x3f,  %[ii]\n\t"
        "jle     .Pct        \n\t"
        : [ss] "+r"(sum)
        : [xx] "r"(x)  , [ii] "r"(i), 
          [tm] "r"(tmp)
  );
  return sum;
}

but received WA (online judge)

I tested all powers of 2 (from 0x1 to (0x1 << 63)) on my computer and it returned 1, which indicates that my asm_popcnt can identify all bits of any 64_bits integer since all other integers are just combinations of 0x1, 0x2, 0x4, etc.(for example, 0x11a = 0x2 "or" 0x8 "or" 0x10 "or" 0x100). Therefore there shouldn't be cases for OJ to return a "WA". Is there anything wrong in my code? The jump instruction?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Your inline asm modifies some of the operands you told the compiler were read-only inputs. It probably breaks after inlining with optimization enabled by stepping on some registers the compiler expected to stay untouched. Also, what does WA mean? – Peter Cordes Nov 04 '21 at 09:10
  • To be clear, you are not interested in the [intrisic](https://stackoverflow.com/questions/3849337/msvc-equivalent-to-builtin-popcount), are you? – Bob__ Nov 04 '21 at 09:11
  • And BTW, your pure C code is buggy; looks like you want `>>` instead of `<<` for that if condition. Your inline asm correctly uses `shrq`. But anyway, you could just `s += x & 1;` / `x >>= 1;`. Or in asm, shift the bit out into CF and `adc $0, %[ss]`. Of course naive 1-bit at a time loops are not efficient; use `__builtin_popcountll()` to have GCC call an efficient helper function using bit-manipulation, or to inline a `popcnt` instruction if supported by the target. – Peter Cordes Nov 04 '21 at 09:11
  • You use Shift Left (`<<`) in you C code and Shift Right (`shrq`) in your ASM code. `>>` != `<<`. It should be Shift Right (`>>`). – Sani Huttunen Nov 04 '21 at 09:12
  • Thank you guys. I didn't pay too much attention to the C code. WA is short for "Wrong Answer" in the Online Judge I use. – dont know who Nov 04 '21 at 09:53
  • Furthermore there is the [popcnt](https://www.felixcloutier.com/x86/popcnt) instruction, that you could use in the inline assembly, except if you really have to implement this loop – JCWasmx86 Nov 04 '21 at 10:07

0 Answers0