Bit-reverse a byte on 68HC12

Question

I'm in a microprocessors class and we are using assembly language in Freescale CodeWarrior to program a 68HCS12 micro controller. Our assignment this week is to revers a byte, so if the byte was 00000001, the output would be 10000000, or 00101011 to 11010100. We have to use assembly language, and were told we could use rotates and shifts (but not limited to!) to accomplish this task. I'm really at a loss as to where I should start.

score 8 · Answer 1 · answered Feb 07 '11 at 17:11

8

Hints: If you do a shift, one bit gets shifted out and a zero (probably) gets shifted in. Where does that shifted out bit go to? You need to shift that in to the other end of the destination register or memory address.

I'm sure that 25 years ago I could do this in Z80 machine code without an assembler :)

answered Feb 07 '11 at 17:11

Spacedman

92,590
12
140
224

2

There are hackier methods: http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious (in C, but could be done in assembly...) – Spacedman Feb 07 '11 at 17:25
It's actually *easier* in assembly than C because you have rotate-through-carry. Shift a bit into the carry flag then do something like `adc same,same` to shift the carry flag into the bottom of a register. – Peter Cordes Dec 27 '19 at 09:46

score 6 · Answer 2 · answered Feb 07 '11 at 17:08

6

Consider two registers as stacks of bits. What happens if you move one bit at a time from one to another?

answered Feb 07 '11 at 17:08

bdonlan

224,562
31
268
324

score 6 · Accepted Answer · answered Feb 07 '11 at 17:19

6

If you can spare the 256 bytes extra code size, a lookup table is probably the most efficient way to reverse a byte on a 68HCS12. But I am pretty sure this is not what your instructor is expecting.

For the "normal" solution, consider the data bits individually. Rotates and shifts allow you to move bits around. For a first solution, isolate the eight bits (with bitwise "and" operations), move them to their destination positions (shifts, rotates...), then combine them together again (with bitwise "or" operations). This will not be the most efficient or simplest implementation, but you should first concentrate on getting a correct result -- optimization can wait.

answered Feb 07 '11 at 17:19

Thomas Pornin

72,986
14
147
189

Ok, I figured out how to do it using shifts and rotates. However, we can get extra credit for having the most efficient code. He doesn't care how we do it. I honestly don't know how to do a lookup table. I was kind of reading about them when researching an answer to this problem before, but I didn't really understand how to implement them. – dohlfhauldhagen Feb 07 '11 at 17:31
Suppose your program in its lifetime is going to be doing zillions of bit reversals. When you start the program you just do all 256 possibilities by bit rotates and store the results in 256 bytes of contiguous memory, starting at, say, BASE. Now, whenever you need to flip the bits of a register, you just look at (BASE+value). That's a lookup table (LUT). You can even hardcode the LUT into the assembly as a chunk of 256 constants if you can precompute them. Then there's no init needed. Win. – Spacedman Feb 07 '11 at 18:07
1

To save space over the 256 byte table you can have a 16 byte table containing the values for four bits (nibbles) at a time. The algorithm then would be "revval=revdigit[inval&0x0f]<<4|revdigit[inval>>4]". If I were a prof I'd like the two parts where one shift is in the indexing and the other outside. – Olof Forshell Feb 08 '11 at 21:55

score 4 · Answer 4 · answered Feb 07 '11 at 17:10

4

When you do a right shift, what was the least significant bit goes into the carry flag.

When you do a rotate, the carry flag is used to fill in the vacated bit of the result (LSB for a ROL, MSB for a ROR).

answered Feb 07 '11 at 17:10

Jerry Coffin

476,176
80
629
1,111

score 3 · Answer 5 · edited Nov 21 '17 at 07:07

3

For example, if you have in al the byte number the easiest way is

mov al, 10101110
mov ecx, 8

we put 8 in ecx for loop

mov ebx, 0

In bl we will havee the result, we will make ebx, only to see what happens better

loop1:
sal al, 1;

In carry flag now you have the last bit from left

rcr bl, 1;

now you add in bl what you have in carry

loop loop1

and that's all

edited Nov 21 '17 at 07:07

Rob

26,989
16
82
98

answered Nov 21 '17 at 07:03

Adi Bârsan

41
3

score 3 · Answer 6 · answered Feb 07 '11 at 17:08

FIrst of all work out the algorithm for doing what you need to do. Express it as pseudo code or C or plain English or diagrams or whatever you are comfortable with. Once you have cleared this conceptual hurdle the actual implementation should be quite simple.

Your CPU probably has instructions which let you shift and/or rotate a register, perhaps including the carry flag as an additional bit. These instructions will be very useful.

score 1 · Answer 7 · answered Feb 09 '11 at 07:45

This was a comment but I thought WTH!

To save space over the 256 byte table you can have a 16 byte table containing the values for four bits (nibbles) at a time. The algorithm then would be

revval=(revdigit[inval&0x0f]<<4)|
        revdigit[inval>>4];

If I were a prof I'd certainly like the two parts where one shift is in the indexing and the other outside.

score 0 · Answer 8 · answered Apr 03 '15 at 14:17

I had also to program this bit reverse for the university (for 8 bits). Here is how I did:

MOV AL, 10001011B ;set the value to test
MOV CL, 7
MOV DH, 1
MOV DL, 0

loop1: PUSH AX
AND AL, DH 
PUSH CX
MOV CL, DL
SHR AL, CL
POP CX
MOV BH, AL
SHL BH,CL
OR CH,BH
DEC CL
INC DL
SHL DH, 1
POP AX
CMP DL, 8
JE END
JMP LOOP1

END:

I didn't commented it so here is how it works: DH is a 1 which travels in the byte like first time: 00000001; second time 00000010and so on. When you make an AND with the AL you get 0 or something like 100or 10000 you have to shift that to the right to get it as 0 or 1. Then, put it in BH and shift to the desired position which is 7for byte 0, 6for byte 1and so on. Then OR to our final result and INC and DEC what is necessary. Do not forget the conditional jumps and to pop AX for the next loop :)

Result will be in CH.

Antonin GAVREL · Answer 9 · 2020-04-10T01:06:19.893

0

The following code make use of rotations and shift. I use Intel x86 syntax, see explanations on the right side:

    mov cx, 8           ; we will reverse the 8 bits contained in one byte
loop:                   ; while loop
    ror di              ; rotate `di` (containing value of the first argument of callee function) to the Right in a non-destructive manner
    adc ax, ax          ; shift `ax` left and add the carry, the carry is equal to 1 if one bit was rotated from 0b1 to MSB from previous operation
    dec cx              ; Decrement cx
    jnz short loop      ; Jump if cx register Not equal to Zero else end loop and return ax

I use dec instruction instead of sub, because it takes one byte only while sub takes 3 bytes. On top of that compilers seem to always optimize by choosing dec over sub.

edit: Also note that rcl ax (3 bytes), while equivalent of adc ax, 0 (2 bytes) followed by shl ax (2 bytes) is less efficient. See comments below, many thanks to Peter Cordes for his insights.

edited Apr 10 '20 at 01:06

answered Apr 09 '20 at 23:54

Antonin GAVREL

9,682
8
54
81

You don't need a slow `rcl`; you can use `adc ax, ax`. Both are 2 bytes (in 16-bit mode) and `adc` is faster on modern CPUs. Your breakdown in terms of shl eax,1 and adc al,0 is wrong: it would be `shl ax,1` and `adc ax,0`, both with the same operand-size as the `rcl`, and shifting *before* adding in the carry. Also, `dec cl` is 2 bytes; maybe you were thinking of `dec cx`? The one-byte inc/dec opcodes are only for 16 or 32-bit operand-size, not 8-bit. (If you were really optimizing for speed over size, you'd use the slow `loop` instruction. But don't do that) – Peter Cordes Apr 10 '20 at 00:05
Also you could make it non-destructive by using `ror di, 1` instead of `shr`. Oh also, you only set CX=8, but DI and AX are 16-bit registers. – Peter Cordes Apr 10 '20 at 00:09
By the way I am puzzled by your comment about slow loop, may you elaborate ? I also checked for ror vs shr https://stackoverflow.com/questions/4976636/whats-the-purpose-of-the-rotate-instructions-rol-rcl-on-x86, but what the point to make it non-destructive? – Antonin GAVREL Apr 10 '20 at 00:38
[Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](https://stackoverflow.com/q/35742570). And re: non-destructive: if you used `ror di,1` 16 times in a loop, the final value of DI would be the same as the initial. But your `shr` loop leaves DI=0 (if you did 16 iterations). So you have a choice of which one is more useful: a zeroed register or the original value. – Peter Cordes Apr 10 '20 at 00:47
Thank you very interesting, I edited the answer! Also in this very specific situation do you see any other point to have ror over shr since we do not really care of keeping di value? – Antonin GAVREL Apr 10 '20 at 00:56
How do you know whether someone using this code wants the original value as well? Sometimes you do want to keep the original for later use when bit-reversing. But anyway, why did you waste an instruction by using `shl` and `adc ax,0` separately instead of `adc ax,ax` like I initially recommended? Adding a number to itself is a left-shift by 1, and using ADC shifts in CF. So it's identical to RCL except for avoiding partial-flag stuff (adc writes all the flags). – Peter Cordes Apr 10 '20 at 01:00
Good point, and sorry I missed that, edited. Also about the loop instruction, it is not worthwhile to use at all it seems? – Antonin GAVREL Apr 10 '20 at 01:04
Also, you broke your explanation in the last paragraph: it's shift *then* adc so the new bit is in the bottom position. And rcl ax, 1 is a 2-byte instruction while `adc ax, 0` is 3. RCL by 1 is often less efficient than `adc ax,ax`, but on some CPUs it's break-even with separate shift and adc. See https://www.uops.info/table.html?search=rcl%20(r16&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_CON=on&cb_HSW=on&cb_SKX=on&cb_ICL=on&cb_ZEN%2B=on&cb_ZEN2=on&cb_measurements=on&cb_base=on. On AMD those are all 1-uop instructions so separate shift-and-add are worse than a single RCL by 1. – Peter Cordes Apr 10 '20 at 01:06
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211330/discussion-between-antonin-gavrel-and-peter-cordes). – Antonin GAVREL Apr 10 '20 at 01:07

Bit-reverse a byte on 68HC12

9 Answers9

Linked