Algorithm for copying N bits at arbitrary position from one int to another

Question

An interesting problem I've been pondering the past few days is how to copy one integer's bits into another integer at a given position in the destination integer. So, for example, given the destination integer 0xdeadbeef and the source integer 0xabcd, the idea would be to get a result of 0xabcdbeef (given a destination position of 16 bits) or 0xdeabcdef (given a destination position of 8 bits).

With the arbitrary limitation of avoiding conditionals or loops (allowing myself to use just mathematical/bitwise operations), I developed the following function (C++)

int setbits(int destination, int source, int at, int numbits)
{
    int ones = ((1<<(numbits))-1)<<at;
    return (ones|destination)^((~source<<at)&ones);
}

where at is the place where the source bits should be copied into the destination number (0-31) and numbits is the number of bits being copied from source (1-32). As far as I can tell, this algorithm works for all values except for at = 0 and numbits = 32 (the case when the entire destination integer is being overwritten by the source integer) due to the fact that 1<<32 results in 1 (since the shift wraps around) as opposed to 0.

My questions are:

How is this normally done? Are there any particularly notable algorithms used (by notable, I'm asking if there are any particularly efficient tricks that can be used to do this)?
Does my algorithm work as well as I think it does (that is, works for all values except at = 0 and numbits = 32)?
Related to 1), is there any way to do this only using mathematical/bitwise operators? The algorithm for all values is trivial using conditions or loops, so I'm not interested in that.

Algorithm design is usually a weak point for me, so I have no idea whether or not my algorithm is 'as good as it gets' when only using mathematical/bitwise operations. Thanks

That's pretty slick! I occasionally need to manipulate oddly-sized strings of bits in an environment where every cycle counts -- I'll be sure to add this to my bag of tricks. — Jim Lewis, Aug 16 '09 at 03:53
I should caution that I've only done some superficial testing, so I can't guarantee that this method works 100% — GRB, Aug 16 '09 at 04:07
Might be safer to use unsigned int to avoid any funny business with sign bits. — Craig McQueen, Aug 16 '09 at 06:12

Fernando N. · Answer 1 · 2009-08-20T08:08:29.260

8

I don't think it can be done more efficient unless you write assembler.

You can improve the readability and solve your overflow problem changing some little things:

int setbits2(int destination, int source, int at, int numbits)
{
    // int mask = ((1LL<<numbits)-1)<<at; // 1st aproach
    int mask = ((~0u)>>(sizeof(int)*8-numbits))<<at; // 2nd aproach
    return (destination&~mask)|((source<<at)&mask);
}

More efficient assembler version (VC++):

// 3rd aproach
#define INT_SIZE 32;
int setbits3(int destination, int source, int at, int numbits)
{ __asm {
    mov ecx, INT_SIZE
    sub ecx, numbits
    or  eax, -1
    shr eax, cl
    mov ecx, at
    shl eax, cl // mask == eax
    mov ebx, eax
    not eax
    and eax, destination
    mov edx, source
    shl edx, cl
    and edx, ebx
    or  eax, edx
}}

1st aproach: Slower on 32bit architecture
2nd aproach: (~0u) and (sizeof(int)*8) are calculated at compile time, so they don't charge any cost.
3rd aproach: You save 3 ops (memory accesses) writing it in assembler but you will need to write ifdefs if you want to make it portable.

edited Aug 20 '09 at 08:08

answered Aug 16 '09 at 09:31

Fernando N.

6,369
4
27
30

On a 32 bit architecture this comes with a performance hit. He didn't ask for a code beautyfier or for solving "the overflow problem" which might not even be there if the funtion is only used for numbits<32. He did ask for a faster version. – Gunther Piez Aug 16 '09 at 10:24
First, did you even read my first sentence?? Second, where did you read the function is only used for numbits<32? "numbits is the number of bits being copied from source (1-32)". And third, readability is fundamental when you write algorithms, and was just an advise. It's true that on a 32bit architecture has a little perfomance hit, but contrary to accepted solution it is architecture portable. – Fernando N. Aug 16 '09 at 19:35
@fnieto: sizeof(int)/sizeof(destination)/sizeof(source) would also work and not require the ACE header – GRB Aug 17 '09 at 00:31
@GRB: you are right, in this case I think sizeof will always evaluate at compile-time. So I edit to change ACE_SIZEOF_INT to sizeof(int). Anyway would be good if somebody confirm this is not compiler dependant. In a first look to the standard I don't find it. – Fernando N. Aug 17 '09 at 07:08
1

5.19/1 guarantees sizeof is a constant expression (and therefore I think that implies it is evaluated at compile-time, see 3.6.2/1 for that) – GRB Aug 17 '09 at 15:21
Yes, only in C99 there is a case (VLA) where sizeof does not evaluate at compile time. Have a look to litb answer here: http://stackoverflow.com/questions/671790/how-does-sizeofarray-work – Fernando N. Aug 17 '09 at 18:44

score 3 · Accepted Answer · answered Aug 16 '09 at 06:21

3

I don't think it's the case that 1<<32 wraps (otherwise, why doesn't 2<<31 also wrap?), instead I think that internally modulus 32 is applied to the second operator, so that 1<<32 is actually equivalent to 1<<0. Also, consider changing the parameters types from "int" to "unsigned int". To get the value of "ones" without running into the "1<<32" problem, you can do this:

unsigned int ones = (0xffffffff >> (32-numbits)) << at;

I don't believe there are any "standard" methods for this kind of operation. I'm sure there are other ways of using bitwise operators in different ways to achieve the same outcome, but your algorithm is as good as any.

Having said that, though, maintainability and documentation is also important. Your function would benefit from the algorithm being documented with a comment, especially to explain how you use the bitwise XOR -- which is clever, but not easy to understand at first glance.

answered Aug 16 '09 at 06:21

Todd Owen

15,650
7
54
52

That substitutes a problem where `numbits == 0` for the one where `numbits == 32`, but since that doesn't make a whole lot of sense anyway it could be excluded from the allowed domain of the function. – caf Aug 16 '09 at 10:20
You're right, "wrap" was probably the wrong choice of words. I was meaning to refer to that internal modulus operation that is performed (that you mention). – GRB Aug 16 '09 at 14:21
2

This solution is architecture dependant. Replace 0xffffffff by ~0 (no cost, compile time) and 32 by a macro INT_SIZE defined for the architectures you want to support. – Fernando N. Aug 16 '09 at 22:46
1

~0 is indeed slick. Here's a calcuation for "ones" which achieves portability without even using INT_SIZE: ~(~0 << numbits) << at; – Todd Owen Aug 16 '09 at 23:30
I'm afraid not @Todd, when you do (~0 << 32) you get 0xffffffff – Fernando N. Aug 16 '09 at 23:54
Oh right, that is so annoying! Is this behaviour of the bitwise shift operations standardized for C, or does it depend on SHL/SHR operations on the underlying architecture?? This will affect the boundary cases for not only *numbits*, but for *at* too (although that's probably more acceptable, since *at* > 31 really doesn't make any sense). – Todd Owen Aug 17 '09 at 05:33
@Todd: Haha, that's funny to see you make exactly the same mistake as you corrected me on (shows how unintuitive it is!). As far as the standard goes, 5.8/1 says the result is undefined if the right operand of a byteshift is negative or greater than the number of bits in the left operand. – GRB Aug 17 '09 at 15:30
@ToddOwen: Officially, shifting signed values equal to the number of bits is undefined, and many just don't shift, in which case the underlying architecture isn't even relevant. – Mooing Duck Apr 05 '16 at 02:45

score 2 · Answer 3 · answered Aug 16 '09 at 06:06

It pretty good: I tried this alternate version, but yours was about 30% faster in testing:

    int[] bits = new int[] {0,1,3,7,15,31,63,127,255,511,1023
        ,2047,4095,8192,16383,32767,65535,131071,262143,524287
        ,1048575,2097151,4194303,8388607,16777215,33554431,67108863
        ,134217727,268435455,536870911,1073741823,2147483647,-1};

    public int setbits2(int destination, int source, int at, int numbits)
    {
        int ones = bits[numbits + at] & ~bits[at];
        return (destination & ~ones) | ((source << at) & ones);
    }

That table would probably make more sense in hex – Mooing Duck Apr 05 '16 at 02:47 — Mooing Duck, Apr 05 '16 at 02:47

score 2 · Answer 4 · answered Dec 06 '10 at 17:38

2

Generalized GRB-fnieto form...

template <typename T>
T setbits4(T destination, T source, int at, int numbits)
{
    T mask = (((T)-1)>>(sizeof(T)*8-numbits))<<at; // 4th aproach
    return (destination&~mask)|((source<<at)&mask);
}

answered Dec 06 '10 at 17:38

Sandeep Datta

28,607
15
70
90

score 0 · Answer 5 · answered Aug 16 '09 at 01:10

0

I think it hardly could be more efficient. Moreover, bitwise operations are much faster than any algebraic operations.

answered Aug 16 '09 at 01:10

Havenard

27,022
5
36
62

I think you are wrong on both points. It can be more efficient (see my answer) and at least addition and subtraction are performed with the same speed as bitwise operations on every arch I know. – Gunther Piez Aug 16 '09 at 10:15

Rao · Answer 6 · 2016-04-05T03:35:25.770

uint32_t copy_bits(uint32_t dst, uint32_t src, uint8_t end_bit, uint8_t start_bit)

{

uint32_t left, right, mask, result;

if (end_bit <= start_bit)
{
    printf("%s: end_bit:%d shall be greater than start_bit: %d\n", __FUNCTION__, end_bit, start_bit);
    return 0;
}

left   = ~0; // All Fs
right  = ~0;
result = 0;
left  >>= ((sizeof(uint32_t)*8) - end_bit); // Create left half of mask
right <<= start_bit; // Create right half of mask
mask   =  (left & right); // Now you have the mask for specific bits
result = (dst & (~mask)) | (src & (mask));
printf("%s, dst: 0x%08x, src: 0x%08x, end_bit: %d, start_bit: %d, mask: 0x%08x, result: 0x%08x\n",
      __FUNCTION__, dst, src, end_bit, start_bit, mask, result);

return result;

}

your answer would be more meaningful if you explained it. – devlin carnate Apr 05 '16 at 02:43 — devlin carnate, Apr 05 '16 at 02:43

Anr · Answer 7 · 2014-10-01T23:56:49.367

// SET OF FUNCTIONS

//##########    BIT - BIT   
template < typename var_t >     inline  var_t       bit_V           ( uint8_t b )                                               { return var_t(1) << b; }           // Same as usual macros, but this one converts de variable type, so that you can use it in uint8_t to uint64_t for example.
template < typename var_t >     inline  var_t       bit_get         ( const var_t & V , uint8_t b )                             { return V &    bit_V<var_t>(b); }  // Can be used as bool or to get the mask of the bit.
template < typename var_t >     inline  var_t       bit_settled     ( const var_t & V , uint8_t b )                             { return V |    bit_V<var_t>(b); } 
template < typename var_t >     inline  var_t       bit_unsettled   ( const var_t & V , uint8_t b )                             { return V &~   bit_V<var_t>(b); } 
template < typename var_t >     inline  void        bit_set         ( var_t & V , uint8_t b )                                   {        V |=   bit_V<var_t>(b); } 
template < typename var_t >     inline  void        bit_unset       ( var_t & V , uint8_t b )                                   {        V &=  ~bit_V<var_t>(b); } 
template < typename var_t >     inline  void        bit_mod         ( var_t & V , uint8_t b , bool set )                        { if (set) bit_set(V,b); else bit_unset(V,b); } //  compiler will optimize depending on if 'set' is constant.
template < typename var_t >     inline  void        bit_cpy         ( var_t & V , const var_t & S , uint8_t b )                 { var_t t = bit_get(S,b); V |= t; V &~ t; } 
template < typename var_t >     inline  void        bit_cpy         ( var_t & V , const var_t & S , uint8_t bV , uint8_t bM )   { bit_mod(V,bV,bit_get(S,bM)); } 
/// MULTIPLE BITS:
template < typename var_t >     inline  void        bits_set        ( var_t & V , const var_t & S )                                     { V |=  S;  }
template < typename var_t >     inline  void        bits_unset      ( var_t & V , const var_t & S )                                     { V &= ~S; }
/// ONLY WITH UNSIGNED INTS:    'at' parameters are refered to the less significant bit (lsb), starting at 0 index ( a byte would have 7 to 0 bits ).
template < typename var_t >             void        bits_cpy        ( var_t & V , const var_t & S , uint8_t numBits , uint8_t atlsb = 0  )  {   //  I choosed not to make this one inline
                                                                        var_t       mask = (~var_t(0)>>(sizeof(var_t)*8 - numBits))<<atlsb;
                                                                        bits_unset  ( V , mask ) ; 
                                                                        bits_set    ( V , S & mask  ) ; 
                                                                    }
template < typename var_t >             void        bits_cpy        ( var_t & V , const var_t & S , uint8_t numBits , uint8_t atVlsb , uint8_t atSlsb ) {   //  I choosed not to make this one inline
                                                                        bits_cpy ( V , (atVlsb>atSlsb)?(S<<(atVlsb-atSlsb)):(S>>(atSlsb-atVlsb)) , numBits , atVlsb ) ; 
                                                                    }
template < typename var_t >             var_t       bits_cpyd       ( const var_t & V , const var_t & S , uint8_t numBits , uint8_t atlsb = 0  )    { 
                                                                        var_t r = V;
                                                                        bits_cpy    (r,S,numBits,atlsb); 
                                                                        return r;
                                                                    }
template < typename var_t >             var_t       bits_cpyd       ( const var_t & V , const var_t & S , uint8_t numBits , uint8_t atVlsb , uint8_t atSlsb )   {   
                                                                        var_t r = V;
                                                                        bits_cpy    (r,S,numBits,atVlsb,atSlsb); 
                                                                        return r;
                                                                    }

//##########    BIT - BIT   - EXAMPLE OF USE WITH THE MOST RELEVANT FUNCTIONS:
// I used them inside functions, to get/set two variables inside a class, u and c

                void                u_set               ( edrfu_t u )       {           bits_cpy    <uint32_t>  ( CFG       , u         , 8     , 2             ,0              );}
                edrfu_t             u_get               ()                  { return    bits_cpyd   <uint32_t>  ( 0         , CFG       , 8     , 0             ,2              );}
                void                c_set               ( edrfc_t c )       {           bits_cpy    <uint32_t>  ( CFG       , c         , 2     );}
                edrfc_t             c_get               ()                  { return    bits_cpyd   <uint32_t>  ( 0         , CFG       , 2     );}

It is tested, but any improvements are welcome. Any sugestion about inlining or not the 'big' functions at the end? — Anr, Oct 01 '14 at 22:17
About performance, I think it is equivalent to the proposed previously. Maybe bits_cpy is a bit faster because it only takes four operations (aside the mask), instead of five. — Anr, Oct 01 '14 at 22:26

Algorithm for copying N bits at arbitrary position from one int to another

7 Answers7

Linked