Reversing bits in a byte with AVR

Question

I am currently working on a problem that wants me to build a subroutine that will reverse the bits in R16.

00000011 => 11000000
or
10101000 => 00010101

For the class we are using the AVR subset and the subroutine needs to work in norfair.

This is what I have so far, any help would be appreciated!

ldi r16,3 ;00000011

Can you use a lookup table? e.g. for 4-bit halves. That's usually fastest on ISAs without a bit-reverse instruction. — Peter Cordes, Apr 13 '20 at 00:48
Do you have any ideas of what algorithm to do? How about for starters, I suggest you right-shift R16, figure out what bit got shifted off the end of it, and then left-shift a temporary register and place that bit in the least-significant position. Can you see what to do after that to finish the program? — David Grayson, Apr 13 '20 at 00:52
@phuclv: The more explicit title is better, but the old title was fine IMO. "reversing [a] byte" (singular) implies reversing the bits, because a single byte is made of bits. Perhaps you assumed that "reversing byte" was a typo or grammatical error for "reversing byteS" (plural)? That would have been an error. Omitting the indefinite article "a" in a phrase like "reversing byte" is common in newspaper headline style, which question titles can use. But if it gave you the wrong impression at first glance, then that's a valid problem that was fixed by this edit. — Peter Cordes, Apr 13 '20 at 01:51

phuclv · Answer 1 · 2020-04-27T13:41:09.143

The naive solution is to loop through the bits with the shift operator and check. But be aware that AVR doesn't have a barrel shifter so it can only shift by 1, any other shift counts need more than 1 instruction. Here's one obvious solution from the famous bithacks page

uint8_t reverse_obvious(uint8_t v)
{
    uint8_t r = v & 1; // r will be reversed bits of v; first get LSB of v
    uint8_t s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

    for (v >>= 1; v; v >>= 1)
    {   
        r <<= 1;
        r |= v & 1;
        s--;
    }
    r <<= s; // shift when v's highest bits are zero
    return r;
}

The above snippet uses only shifts by 1 except the last r <<= s which needs a loop in AVR. You can avoid that by always running 8 loops

uint8_t reverse(uint8_t x)
{
    uint8_t mask_up = 0x01;
    uint8_t mask_down = 0x80;
    uint8_t result = 0;
    for (; mask_down; mask_down >>= 1, mask_up <<= 1)
    {
        if (x & mask_up)
            result |= mask_down;
    }
    return result;
}

Another alternative that has shift by 2, but I guess it's the best way you can do without a lookup table. AVR has plenty of available ROM for the table so it should be a lot more efficient that way

uint8_t reverse(uint8_t x)
{
    x = (((x & 0xaaU) >> 1) | ((x & 0x55U) << 1));
    x = (((x & 0xccU) >> 2) | ((x & 0x33U) << 2));
    x = (((x & 0xf0U) >> 4) | ((x & 0x0fU) << 4));
    return x;
}

Some compilers also have built-ins to reverse bits. For example Clang has __builtin_bitreverse8() and GCC has __builtin_avr_insert_bits() which can be used to reverse bits:

// reverse the bit order of bits
__builtin_avr_insert_bits (0x01234567, bits, 0)

Unfortunately their outputs are terrible

There are also lots of questions with good answers on SO about reversing bits. Try converting the C code to assembly and compare with the result on compiler explorer

Note that AVR shift instructions only go by 1 bit at a time. So a 256-byte lookup table for the whole byte would be good, but a 16-byte LUT of nibble halves would require shifts to use. I guess with 2x 16-byte tables for low and high halves, you only need 4 shifts, 2 loads, and an OR to use, and that's better than 16 operations + loop overhead to do 8x [lsr](https://onlinedocs.microchip.com/pr/GUID-0B644D8F-67E7-49E6-82C9-1B2B9ABE6A0D-en-US-1/index.html) into CF + 8x ADC same,same aka ROL reg. But that means the version using `(x & 0xf0U) >> 4` and so on is probably terrible on AVR. — Peter Cordes, Apr 14 '20 at 18:17
IDK if you can convince a compiler to make code that shifts into CF and shifts back in to another register. But that's obviously at least twice as good as actually doing AND with 1 and OR into another register between shifting. Maybe 3x if a MOV is also needed to copy a reg. — Peter Cordes, Apr 14 '20 at 18:20
@PeterCordes AVR has a `swap` instruction so it can be used as a fast shift-by-4, and the compiler does that if you open the godbolt link in [my other answer](https://stackoverflow.com/a/17908514/995714) https://gcc.godbolt.org/z/TzKr53. For the first solution above the compiler doesn't actually use CF but a pair of SBRC/ORI which is just as equally fast — phuclv, Apr 15 '20 at 00:37
It uses `mov` before/after each SBRC/ORI, though, for the one with `mask_down` and `up` https://gcc.godbolt.org/z/rn7g2o. That might be a missed optimization; I haven't grokked what it's doing. The ideal would be `mov r18, r24` then 8x `lsr r18` / `rol r24` / `ret` to implement the function. Thanks for pointing out `swap`; makes sense to have an instruction for that. — Peter Cordes, Apr 15 '20 at 00:49

score 2 · Answer 2 · edited Oct 05 '20 at 10:13

2

AVR has swap instruction which allows to swap 4bit nibbles in a register. You can use it to make the code shorter.

mov r17, r16
ror r16  // rotate right, pushing bit 0 out to C flag
rol r17  // rotate left, moving C flag into bit 0 and pushing bit 7 out to C flag
ror r16
rol r17
ror r16
rol r17
ror r16
rol r17
ror r16
andi r16, 0xF0 // isolating bit 4 to 7
andi r17, 0x0F // isolating bit 0 to 3
or r16, r17    // combining
swap r16       // swapping the nibbles

edited Oct 05 '20 at 10:13

doom

3,276
4
30
41

answered Aug 26 '20 at 21:06

AterLux

4,566
2
10
13

Harder to understand, but seems to work. Win 3 cycles also ( 14 instead of 17). – doom Oct 05 '20 at 16:14

score 1 · Answer 3 · answered Aug 26 '20 at 14:36

You can do that using rol and ror opcodes (rotates left and right through carry).

ldi  r16,0b10101000   // value to reverse
push r17
rol  r16        // this push r16 msb bit in carry in left direction
ror  r17        // this push carry to r17 in right direction
rol  r16
ror  r17
rol  r16
ror  r17
rol  r16
ror  r17
rol  r16
ror  r17
rol  r16
ror  r17
rol  r16
ror  r17
rol  r16
ror  r17       // now r17 is reversed value
mov  r16,r17   // now r16 is reversed value
pop  r17

** not tested, but should work ;-) **

You can also improve this with a loop !

Reversing bits in a byte with AVR

3 Answers3

Linked