2

My application requires that I store the value in a 16 bit counter but due to pcb issues it requires that the lower 8 bits of the counter be reversed (01001110 to 01110010). The code is being written in C (GCC) and the counter register is "int" type (16 bits). My application is using an Atmel ATtiny 8 bit MCU. I understand that if I declare the counter register to be an "int" type the compiler will allocate 2 RAM locations. Do I just extract the lower byte with a mask, then rearrange the bits and then paste them back in with something like;

counter = counter & 0x00       clear lower byte value
counter = counter + (register with the reversed 8 bits)   

// Then, Replace lower byte value with new value

Should this work? Thanks

Michael Dorgan
  • 12,453
  • 3
  • 31
  • 61
  • That's how I would do it. If your processor has a reverse bits operation, that is probably the fastest way to do that, otherwise, the reverse operation is not too bad. There are multiple examples on SO on how to do that. – Michael Dorgan Apr 30 '19 at 17:52
  • 1
    FYI, your bitwise and statement will clear all the bits. It should be something like `counter &= 0xFF00` if it is a 16-bit value. – Michael Dorgan Apr 30 '19 at 17:54
  • 1
    and with 0xFF00. plus works but being so much bit related an OR feels better. plus is not wrong... – old_timer Apr 30 '19 at 18:44
  • Don't rely on any particular size of `int` - or use a signed type, use a `uint16_t`. – Clifford Apr 30 '19 at 22:04
  • Don't post "thank you" answers (or even comments) on SO is a Q&A not a discussion forum. You show appreciation on SO by accepting and/or up-voting answers. – Clifford May 01 '19 at 23:03

4 Answers4

6

Your word description of the process is correct, but your pseudo-code illustration inaccurate and incomplete.

You need to copy the LSB of counter before you clear it; otherwise you'll have lost the bits you need to reverse. You need to clear the LSB correctly, and you can reverse the LSB bits directly back into the counter LSB as follows:

// Copy counter LSB
uint8_t lsb = (uint8_t)(counter & 0xFFu) ;

// Clear counter LSB
counter &= 0xff00u ;

// Reverse LSB bits and mask into counter LSB
for( uint8_t mask = 0x80u;  
     mask != 0; 
     lsb >>= 1, mask >>= 1 )
{
    counter |= ((lsb & 0x01u) != 0) ? mask : 0 ;
}

You should also use the stdint.h types uint16_t and uint8_t for this operation rather than relying on int being any particular size - it will make the code more portable and testable on a system where int is not 16 bits. And generally you should use unsigned types when performing bit-wise operations.

A somewhat faster method, though possibly requiring a little more ROM space is to use a look-up table. A 256 byte lookup table is rather cumbersome to generate and on an ATtiny rather prohibitive in terms of memory usage. Rather it can be done almost as efficiently using a 16 byte lookup as follows:

// Copy counter LSB
uint8_t lsb = (uint8_t)(counter & 0xFFu) ;

// Clear counter LSB
counter &= 0xff00u ;

static const uint8_t lookup[] = { 0x0, 0x8, 0x4, 0xC, 
                                  0x2, 0xA, 0x6, 0xE, 
                                  0x1, 0x9, 0x5, 0xD, 
                                  0x3, 0xB, 0x7, 0xF } ;

counter |= lookup[lsb & 0xf] << 4 | lookup[lsb >> 4] ;

You could even pack-the lookup table and use just 8 bytes (0x80, 0xC4 etc):

static const uint8_t lookup[] = { 0x80, 0xC4, 
                                  0xA2, 0xE6,  
                                  0x91, 0xD5,  
                                  0xB3, 0xF7 } ;

uint8_t msnib = ( lsb & 0x01 ) ? lookup[(lsb & 0xf) >> 1] >> 4 : 
                                 lookup[(lsb & 0xf) >> 1] & 0xf ;

uint8_t lsnib = ( lsb & 0x10 ) ? lookup[(lsb & 0xf0) >> 5] >> 4 : 
                                 lookup[(lsb & 0xf0) >> 5] & 0xf ;
counter |= (lsnib | msnib << 4) ;

But the reduction in look-up table size is not likely to be justified by the increase in code size for the resulting additional bit manipulation - and its just a bit "too clever" - it took be a while to get it right!

The first method has the advantage that it can be applied to an arbitrary number of bits. Both look-up table solutions can be extended to any word size that is a multiple of 4 bits without changing the look-up table size, so scales well.


Benchmarking

I tested each implementation at https://godbolt.org/ set to AVR GCC 4.6.4 using three different optimisation settings. The instruction count excludes function entry/exit code added to make it compilable, and represents just the instructions generated from the source code in this answer.

|          |  Instruction Count |               |
|Algorithm | No Opt | -O3 | -Os | + Data (bytes)|
|----------|:------:|:---:|:---:|:-------------:|
| Loop     |   38   |  88 |  23 |       0       |
| LookUp16 |   59   |  38 |  37 |      16       |
| LookUp8  |  137   |  65 |  62 |       8       |

The test says little about execution time, but if code size is critical the loop algorithm with space optimisation (-Os) is probably the best choice.

The look-up table is no doubt faster regardless of optimisation level, and the the 16-byte look-up table with either optimisation may be a reasonable balance. For -O3 it is overall, smaller and faster than the 88-instruction unrolled loop. It also has the distinct advantage that the code size is far less variable with optimisation settings which can minimise surprises when switching between debug and release builds.

The 8-byte look-up has little merit perhaps other then being quite interesting.

Clifford
  • 88,407
  • 13
  • 85
  • 165
  • I agree that a LUT is probably not the right approach. It's rarely an appropriate general-purpose optimization on a large microcontroller, so it's almost certainly a bad idea on a severely memory-constrained MCU. Check to make sure that your compiler will optimize the bit-reversal loop. It might, but no guarantees. If not, you might be better off writing the bit-manipulation code yourself using some sort of SWAR algorithm. – Cody Gray - on strike May 01 '19 at 18:49
  • @CodyGray said: _"I agree that a LUT is probably not the right approach."_, but I am pretty sure I did not say that, so that is an opinion rather than agreement. I would not be quite so emphatic. It will no doubt be quicker than than the "walking-one" algorithm, and if the compiler optimised the loop by unrolling it, the look-up table solution might even be smaller. Each has its possible advantages, and the solution should take into account the space/time constraints and requirements of the application, The 256 element lookup suggested by Richard is certainly prohibitive. – Clifford May 01 '19 at 21:34
  • @CodyGray : I have added some code size metrics that suggest that the LUT is not without merit. – Clifford May 01 '19 at 22:46
1

You have a typo:

counter = counter & 0x00       clear lower byte value

should be

counter = counter & 0xFF00;

or

counter &= 0xFF00;

to clear the lower byte. You can reverse the bits by rotating a bit out at a time to another variable. If timing is critical, you need to do this in assembly as C does not have a rotate operator and the feature must be simulated, e.g.

new_byte = 0;

if (orig_byte & 0x80)
    new_byte |= 0x01;

if (orig_byte & 0x40)
    new_byte |= 0x02;
...

etc. is probably one of the fastest ways in C. or if you can spare 256 bytes of flash, then just use a table, e.g.

__flash unsigned char rotated_bytes[] = { 0x00, 0x80, 0x40, 0xC0, 0x20, ... };

new_byte = rotated_byte[orig_byte];

(Replace __flash with your compiler's extended keyword to mean "program memory")

Clifford
  • 88,407
  • 13
  • 85
  • 165
  • Your look-up table values are incorrect, and the required operation is not a "rotation" in any case. You have also emphasised performance when no requirement was mentioned in the question - that is a premature optimisation. – Clifford Apr 30 '19 at 22:01
  • You are correct that I had a brainbleep on writing the table. I was starting to write 0b10000000, 0b01000000, 0b11000000, etc. and somehow I said, "let's do it in decimals and made mess of it. Oops. ATTiny has 2K to 16K of flash rom, so I am sure optimizations will eventually become important, even if not immediately. Besides, it is good to think about space vs. code tradeoff as "bag of the tricks" in embedded programming. And yes, I meant to write "shifting out" the bits. – Richard at ImageCraft May 01 '19 at 06:09
  • Perhaps, but the memory constraints suggest _space_ rather then _time_ optimisation are more likely and neither your unrolled loop or look-up are code-space efficient. You might have corrected the look-up - I've fixed it for you. – Clifford May 01 '19 at 10:08
  • A good compromise would be to have a 16 byte lookup of nibble (4 bit) values. – Clifford May 01 '19 at 10:13
  • 1
    The naive approach with eight `if`s is the best. Atmel comes with nice bit manipulation instructions. My gcc translates `if (orig_byte & 0x40) new_byte |= 0x02;` into just two assembler instructions! (SBRC: Skip next instruction if bit in register is cleared, followed by the OR instruction) – user5329483 May 01 '19 at 11:40
  • @user5329483 : "best" is subjective and a matter of opinion and local requirements and constraints, and only possible to claim if you have evaluated all other options against those requirements. It is probably on balance the best of the two options presented in this answer for this specific target. – Clifford May 01 '19 at 12:56
  • > if (x & 0x01) y |= 0x80; > if (x & 0x02) y |= 0x40; > if (x & 0x04) y |= 0x20; > if (x & 0x08) y |= 0x10; > if (x & 0x10) y |= 0x08; > if (x & 0x20) y |= 0x04; > if (x & 0x40) y |= 0x02; > if (x & 0x80) y |= 0x01; is just 8x 4 bytes, or 32 bytes, and at most 16 cycles. So it's pretty hard to beat. – Richard at ImageCraft May 02 '19 at 08:29
1

Here my approach:

uint16_t Flipper(uint8_t hi, uint8_t reversed_lo)
{
    uint8_t lo=0;
    if (reversed_lo & 0x01) lo |= 0x80;
    if (reversed_lo & 0x02) lo |= 0x40;
    if (reversed_lo & 0x04) lo |= 0x20;
    if (reversed_lo & 0x08) lo |= 0x10;
    if (reversed_lo & 0x10) lo |= 0x08;
    if (reversed_lo & 0x20) lo |= 0x04;
    if (reversed_lo & 0x40) lo |= 0x02;
    if (reversed_lo & 0x80) lo |= 0x01;
    return (hi<<8) | lo;
}

My compiler produces 25 instructions at the cost of 50 bytes for this function:

;reversed_lo sits in R22
;hi          sits in R21
;lo          goes to R18
000007DF 60.fd                SBRC R22,0        Skip if bit in register cleared 
000007E0 02.c0                RJMP PC+0x0003        Relative jump 
000007E1 20.e0                LDI R18,0x00      Load immediate 
000007E2 01.c0                RJMP PC+0x0002        Relative jump 
000007E3 20.e8                LDI R18,0x80      Load immediate 
000007E4 61.fd                SBRC R22,1        Skip if bit in register cleared 
000007E5 20.64                ORI R18,0x40      Logical OR with immediate 
000007E6 62.fd                SBRC R22,2        Skip if bit in register cleared 
000007E7 20.62                ORI R18,0x20      Logical OR with immediate 
000007E8 63.fd                SBRC R22,3        Skip if bit in register cleared 
000007E9 20.61                ORI R18,0x10      Logical OR with immediate 
000007EA 64.fd                SBRC R22,4        Skip if bit in register cleared 
000007EB 28.60                ORI R18,0x08      Logical OR with immediate 
000007EC 65.fd                SBRC R22,5        Skip if bit in register cleared 
000007ED 24.60                ORI R18,0x04      Logical OR with immediate 
000007EE 66.fd                SBRC R22,6        Skip if bit in register cleared 
000007EF 22.60                ORI R18,0x02      Logical OR with immediate 
000007F0 66.23                TST R22       Test for Zero or Minus 
000007F1 0c.f4                BRGE PC+0x02      Branch if greater or equal, signed 
000007F2 21.60                ORI R18,0x01      Logical OR with immediate 
000007F3 30.e0                LDI R19,0x00      Load immediate 
000007F4 a9.01                MOVW R20,R18      Copy register pair 
000007F5 58.2b                OR R21,R24        Logical OR 
000007F6 ca.01                MOVW R24,R20      Copy register pair 
000007F7 08.95                RET       Subroutine return 
user5329483
  • 1,260
  • 7
  • 11
0

The easy way to reverse bits in a Byte is using unions and bitfields as below.

> struct ST_BYTE {
    unsigned char b0    :   1;
    unsigned char b1    :   1;
    unsigned char b2    :   1;
    unsigned char b3    :   1;
    unsigned char b4    :   1;
    unsigned char b5    :   1;
    unsigned char b6    :   1;
    unsigned char b7    :   1;

} ;

union  U_BYTE
{
    struct ST_BYTE byteflag;
    unsigned char charflag;
};
unsigned char Reverse_Bits_in_Byte(U_BYTE local_byte)
{
    U_BYTE Byte_Var2;

    Byte_Var2.byteflag.b0 =local_byte.byteflag.b7;
    Byte_Var2.byteflag.b1 =local_byte.byteflag.b6;
    Byte_Var2.byteflag.b2 =local_byte.byteflag.b5;
    Byte_Var2.byteflag.b3 =local_byte.byteflag.b4;
    Byte_Var2.byteflag.b4 =local_byte.byteflag.b3;
    Byte_Var2.byteflag.b5 =local_byte.byteflag.b2;
    Byte_Var2.byteflag.b6 =local_byte.byteflag.b1;
    Byte_Var2.byteflag.b7 =local_byte.byteflag.b0;

    return (Byte_Var2.charflag);

}
void main()
{

    int i;
    for(i=0;i<8;i++)
    {
        Byte_Var1.charflag = pow(2,i);
        printf("\nBefore Reverse %02X\n", Byte_Var1.charflag);
        Byte_Var1.charflag = Reverse_Bits_in_Byte(Byte_Var1);
        printf("\nAfter Reverse %02X\n", Byte_Var1.charflag);
    }


}

Note that even though it is easy,not a recommended way for its own reasons.

It is the choice of programmer whether to adopt it or not.

Babajan
  • 364
  • 3
  • 5
  • The major problem with using bitfield is that the bit ordering is implementation defined. Luckily in this particular case, as the OP asks to reverse the bits, it does not matter whether the bit ordering is big or little endian :-) – Richard at ImageCraft May 07 '19 at 21:48
  • Even bit access is also implementation defined.some may access just a bit and some may access a byte to modify a bit.i proposed this logic for simple basic codes.author has to take a call. – Babajan May 08 '19 at 00:57