4

I am writing a C++ program and require a function that sets all 9 bits to 1 after all existing '1's.

That is, I'm going to write a function void set10BitsFull(int64_t& n) that for an integer "int64_t n = 0b...1000000000...", set10BitsFull(n) transforms n to "0b...1111111111...".

(Update) Bits of the input integer are sparsely set to 1, and there are at least 10 bits distance between two 1. For an sample input 0x20000200, the expected output is 0x3FF003FF. There will be at least 9 bits 0 after the last 1. The leftmost 10 bits will always be zero.

Here is my implementation of this function

/**
 * Inline function that set 10 bits to 1 after each set 1
 * i.e.,
 * ......1000000000...... -> ......1111111111.......
 *
 * @param n
 *      pointer of input number
 */
inline void set10BitFull(int_fast64_t *n) {
    // n = 1000000000
    *n |= (*n >> 1); // n = 1100000000
    *n |= (*n >> 2) | (*n >> 4) | (*n >> 6) | (*n >> 8); // n = 1111111111
}

In main loop of the program these two lines of code will be frequently called, and in previous testing, the computation cost is extremely high. Hence I would like to seek an approach that takes less computation overhead (less cpu cycles to compute), and possible solutions might include:

  • Use precomputed masks
  • Inline assembly
  • x86/gcc builtin intrinsic ...
Sander De Dycker
  • 16,053
  • 1
  • 35
  • 40
Vito Wu
  • 51
  • 4
  • 3
    Your questions needs some improvement about what the expected outputs are compared to inputs. "set 9 bits after all 1s" doesn't make a lot of sense. Can you clarify? Otherwise, I'm left wondering why your implementation isn't `n = n | 0x1ff` – selbie Jul 12 '19 at 06:33
  • Do you mean for example 0001000000000000 (12 zeros after the rigthmost one) should become 0001111111111000? – Lukas-T Jul 12 '19 at 06:35
  • 1
    You want every bit to be duplicated 9 times to the right ? What if input is `0b100` ? What if input is `0b1001001001001001001` ? Etc. As mentioned by others, you need to better define what you're trying to achieve. – Sander De Dycker Jul 12 '19 at 06:37
  • 2
    Also : be careful when bit shifting on a signed integer type. [Implementation defined and even undefined behavior are just around the corner](https://stackoverflow.com/questions/4009885/arithmetic-bit-shift-on-a-signed-integer). You should probably use a unsigned integer type. – Sander De Dycker Jul 12 '19 at 06:39
  • 1
    @selbie: "Inline function that set 10 bits to 1 after each set 1". That's pretty clear to me. – geza Jul 12 '19 at 06:39
  • What should happen if the value is 2 (`0b10`)? How many bits after the existing `1` bit should be set? What about any before? Suppose the value on input is `0b1000'0001'0000'0000'0000` — if the algorithm works from MSB to LSB, it will set 9 bits after the leading `1`, overwriting the pre-existing `1` with one of the 9 bits. Should it also have 9 set bits after the second `1`? – Jonathan Leffler Jul 12 '19 at 06:40
  • guys, he said " after each 1 i want 9 times 1", i don't see so much clarification needing. – Federico Jul 12 '19 at 06:40
  • @selbie Okay, I will try. Bits of the input integer are sparsely set to 1, and there are at least 10 bits distance between two 1. For an sample input 0x20000200, the expected output is 0x3FF003FF – Vito Wu Jul 12 '19 at 06:43
  • 1
    @Federico : the op is asking for an optimized method of doing something. Understanding exactly what the expected behavior is will greatly help with that. For example, maybe some of the possible inputs mentioned are just never possible. Maybe there's only ever one bit set. Maybe the behavior is not exactly how you understood it. Any of this knowledge could provide a faster way of doing this. – Sander De Dycker Jul 12 '19 at 06:45
  • @VitoWu : will there also at least be 9 0 bits after the last 1 ? Please update your question with this extra information. – Sander De Dycker Jul 12 '19 at 06:47
  • @VitoWu : will the MSB (leftmost bit) ever be set ? – Sander De Dycker Jul 12 '19 at 07:24
  • @SanderDeDycker No, in fact the leftmost 10 bits will always be zero. – Vito Wu Jul 12 '19 at 08:21

2 Answers2

6

You could do something like :

constexpr uint_fast64_t set10BitFull(uint_fast64_t n) {
    return (n << 1) - (n >> 9);
}

That should work with all inputs you described, where there are at least 9 0 bits after every 1 bit.

Sander De Dycker
  • 16,053
  • 1
  • 35
  • 40
2

First of all, you need to get rid of pointers, accessing memory is slowest operation processor will do. Second, you can decrease number of operations by constantly duplicating number of 1s.

I.e. something like this:

 n |= n >> 1;  // will porduce 1100000000
 n |= n >> 2;  // will produce 1111000000
 n |= n >> 4;  // will produce 1111111100
 n |= n >> 2;  // will produce 1111111111
sklott
  • 2,634
  • 6
  • 17
  • 1
    Optimizing compilers will put `*n` into a register, so the original code won't access memory for all `*n`. Decreasing operation count may help, of course. – geza Jul 12 '19 at 07:04
  • Godbolt disagrees about there being any performance difference (hint: it makes use of registers!) https://godbolt.org/z/lDXS7v – robthebloke Jul 12 '19 at 07:42
  • But it still read value at start of function from memory and writes to memory at end of function, so I guess if this is called often refactoring this to something like `inline int_fast64_t set10BitFull(int_fast64_t n)` will be much faster. – sklott Jul 12 '19 at 07:45
  • @sklott That's not entirely true. you may save a couple of movs in the function body itself, so that *may* be true if you are DSO exporting the method. However, in pretty much every place where you could possibly use the method, the compiler will modify both so that they are identical. https://godbolt.org/z/bd21vI – robthebloke Jul 12 '19 at 07:51
  • 1
    If the data is in L1 cache, it is very fast (definitely not "slowest operation processor will do". An integer division is much slower.). Besides, optimizing compilers will remove extraneous memory accesses (this is inline function, so it will definitely be removed, if it is possible). Not to mention, that with several ABIs, passing `n` as a parameter will access memory as well (if it is passed through stack, not registers). – geza Jul 12 '19 at 07:59