0

I am getting an 8 bit binary number and want to represent that as a decimal number on a LCD screen.

So for example if I get 01000011 as input, which is 67 in decimal I first have to get the 8 bit ascii code for 6, then the 8 bit ascii code for 7 and send those to the LCD.

Any idea on how this could be done in AVR Assembler?

Gripen
  • 31
  • 3
  • 1
    So you want to convert an integer in a register to a decimal number? Or is your input an ASCII string of binary digits? Either way, the integer -> decimal-string is just a divide by 10 (once or twice), like in [How do I print an integer in Assembly Level Programming without printf from the c library?](https://stackoverflow.com/a/46301894) which shows C and x86 assembly. – Peter Cordes Feb 26 '22 at 10:49
  • 1
    AVR doesn't have a divide instruction, and maybe not multiply either for a fixed-point inverse, so you'd want to look into other techniques for dividing by 10 if there are any. The helper function from libgcc if you compile that for AVR probably takes a variable divisor so it'd be slow. – Peter Cordes Feb 26 '22 at 10:50
  • 2
    You could use a look-up-table with 256 entries and 3 bytes each taking 768 bytes. That should normally be inside the flash memory budget. You could also store each decimal digit within 4 bits and using 3 bytes to store 2 entries, reducing the memory requirement to 384 bytes, but needing some shifts and logical operations to decode. – Sebastian Feb 26 '22 at 11:23
  • 1
    @Sebastian: Neat idea. You could use 1-byte entries for the low 2 digits, and work out the leading digit manually as empty, 1, or 2. with simple branching. Spending a little bit of extra code size and time to save significant table size. (Probably about the same amount of extra code as unpacking a 3:2 table-packing scheme according to the entry being odd/even. Although AVR does have a `swap` instruction which, with an AND-immediate, can give you a 4-bit shift, so it's quite helpful for unpacking BCD vs. if you needed 4 separate shift instructions. https://godbolt.org/z/M55no7cTG.) – Peter Cordes Feb 26 '22 at 12:55
  • @PeterCordes One other possibility would be a 8bits x 16bits multiplication with throwing away the lower 24 bits for calculating the upper two digits. Factors in the range 6554 to 6579 should work. (Basically 65536/10) – Sebastian Feb 26 '22 at 13:45
  • @Sebastian: How would you do 8x16-bit multiplication efficiently on an AVR? In terms of performance and maybe code size, that would probably cost similar to just doing `digit = x % 10;` / `x /= 10;` https://godbolt.org/z/Ev1r14cc8 (includes the example from my previous comment because a `.` after the URL becomes part of the URL in comments, apparently, unlike answers.) – Peter Cordes Feb 26 '22 at 13:48
  • @PeterCordes It can do 8x8 in 2 cycles, so it would need 2 multiplications and one addition. – Sebastian Feb 26 '22 at 13:55
  • @Sebastian: Oh, right, some variants of AVR have HW multiply built-in, like megaAVR (http://ww1.microchip.com/downloads/en/Appnotes/Atmel-1631-Using-the-AVR-Hardware-Multiplier_ApplicationNote_AVR201.pdf). Baseline AVR doesn't have that. Yeah, if you have HW multiply, a multiplicative inverse becomes much more attractive, but buying a more expensive microcontroller is probably unnecessary vs. spending a bit more code and/or data size on a tiny AVR. – Peter Cordes Feb 26 '22 at 14:01
  • @Sebastian: But if you have it, then yeah a 16-bit constant would let you use those multiplicative inverse hacks that *aren't* exact over the full range (but would be over the 0..255 part of the full 16-bit range), to avoid shifting, as in [Divide by 10 using bit shifts?](https://stackoverflow.com/a/5558614) I think that's what you were saying. You'd discard the low *16* bits, though, keeping the top 8. – Peter Cordes Feb 26 '22 at 14:02
  • @PeterCordes How about calculating the first digit by comparison and branching and the second by multiplying [0; 99] with 102 and shifting the upper byte right by two. Or multiply [0; 255] with 205 and rightshift the upper byte by 3. This trick increases accuracy so that 8x8 bit multiplication is enough. – Sebastian Feb 26 '22 at 17:23
  • @Sebastian: xiver77's answer uses `(uint16_t)x * 205 >> 11`; that's what GCC uses on x86-64, so they ported it back to C and compiled for AVR (without enabling megaAVR extensions like multiply). If they did, GCC should use it. – Peter Cordes Feb 27 '22 at 01:33
  • I've added some explanation if you'd be interested. – xiver77 Feb 27 '22 at 15:04

1 Answers1

2

This is an algorithm from @Sebastian which computes the quotient after dividing by 10, correctly in the range of [0, 99].

typedef unsigned char byte;

byte div10(byte x) {
    x >>= 1;
    return (byte)((byte)(x * 3) + (x >> 2)) >> 4;
}

The casts to byte is necessary because the C standard requires to promote any intermediate result to an int which is at least 16-bit, and such conversion results to inefficient code for 8-bit processors like AVR.

GCC translates to this.

div10:
        mov r18,r24
        lsr r18
        mov r25,r24
        andi r25,lo8(-2)
        add r25,r18
        lsr r24
        lsr r24
        lsr r24
        add r24,r25
        swap r24
        andi r24,lo8(15)
        ret

You can always calculate the remainder by dividend - quotient * 10.


Here is an explanation on how the above method works based on @Sebastian's comments. Read the comments if you'd like a mathematically sophisticated explanation, but this is what I can vaguely grasp with some basic math.

Basically, you can divide a number by multiplying the divisor's inverse. n / 3 = n * 0.33...

To calculate the integer quotient of n / 3, you can use these expressions, derived from 1 / 3 = 0.33...

n * (3 + 1) / 10^1 ; n <= 4
n * (33 + 1) / 10^2 ; n <= 49
n * (333 + 1) / 10^3 ; n <= 499
n * (3333 + 1) / 10^4 ; n <= 4999
...

With a larger multiplier, you get higher precision, so the result will be accurate for a bigger dividend.

Same with binary numbers. You can calculate the integer quotient of n / 5 by these expressions, derived from the binary point expression of 1 / 5 = 0.001100110011..(2).

n * (11(2) + 1) / 2^4 ; n <= 3
n * (110(2) + 1) / 2^5 ; n <= 13
n * (1100(2) + 1) / 2^6 ; n <= 63
n * (11001(2) + 1) / 2^7 ; n <= 63
n * (110011(2) + 1) / 2^8 ; n <= 63
n * (1100110(2) + 1) / 2^9 ; n <= 173
n * (11001100(2) + 1) / 2^10 ; n <= 1023

The required size of the multiplier for a certain precision looks kind of irregular, and I still haven't figured out how it works, but for our purpose, we need to divide a number N in [0, 99] by 10, which is N / 2 / 5. N / 2 <= 49, so n * (1100(2) + 1) / 2^6 which works up to n <= 63 suffices.

We can thus transform N / 10 to N / 2 * 13 >> 6. Let h = N / 2. h * 13 overflows in 8 bits, but since >> 6 will discard some of the lower bits after the multiplication, it's okay to do some shifts beforehand.

h * 13 >> 6
= h * 12 + h >> 6
= h * 6 + (h >> 1) >> 5
= h * 3 + (h >> 2) >> 4

Since h <= 49, h * 3 + (h >> 2) fits in 8 bits, and this is represented in the C code that we've seen before.

byte div10(byte x) {
    x >>= 1;
    return (byte)((byte)(x * 3) + (x >> 2)) >> 4;
}

GCC thinks a different way of calculation is better. The assembly output of GCC can be rewritten in C as follows.

byte div10(byte x) {
    return (byte)((x & 0b11111110) + (x >> 1) + (x >> 3)) >> 4;
}

/*
div10:
        mov r18,r24
        lsr r18
        mov r25,r24
        andi r25,lo8(-2)
        add r25,r18
        lsr r24
        lsr r24
        lsr r24
        add r24,r25
        swap r24
        andi r24,lo8(15)
        ret
*/

old answer

If you are looking for an algorithm to calculate the quotient and remainder when an 8-bit number is divided by 10, in AVR assembler, this code does the trick.

But don't ask me how it works. It is the optimized output of AVR GCC translating a C function that I wrote by reverse-engineering the optimized output of x86 Clang. So I basically stole the work of two compilers.

From this C code.

#include <stdint.h>

typedef uint8_t byte;

typedef struct {
    byte _0;
    byte _1;
} bytepair;

bytepair divmod10(byte x) {
    return (bytepair){x / 10, x % 10};
}

x86 Clang produced this.

imul    ecx, edi, 205
shr     ecx, 11
lea     eax, [rcx + rcx]
lea     eax, [rax + 4*rax]
sub     dil, al
movzx   eax, dil
shl     eax, 8
or      eax, ecx
ret

which I translated to C.

bytepair divmod10(byte x) {
    byte y = (uint16_t)x * 205 >> 11;
    return (bytepair){y, x - y * 10};
}

which then I put into AVR GCC.

mov r20,r24
ldi r25,0
mov r19,r25
mov r18,r24
lsl r18
rol r19
lsl r18
rol r19
add r18,r24
adc r19,r25
lsl r18
rol r19
lsl r18
rol r19
lsl r18
rol r19
add r18,r24
adc r19,r25
mov r25,r19
mov r24,r18
lsl r24
rol r25
lsl r24
rol r25
add r18,r24
adc r19,r25
mov r24,r19
lsr r24
lsr r24
lsr r24
mov r25,r24
swap r25
lsl r25
andi r25,lo8(-32)
sub r25,r24
lsl r25
lsl r25
sub r25,r24
lsl r25
add r25,r20
ret

It seems AVR is a very simple 8-bit machine without even variable shifts. Well, still it will do the job probably faster than GCC's software division built-in.


xiver77
  • 2,162
  • 1
  • 2
  • 12
  • 1
    AVR is an 8-bit RISC microcontroller with 32 register; yes it's simple, but an interesting ISA. (The registers are memory mapped, into the low 32 bytes of memory address space. Which is normally part of on-chip SRAM of 128 bytes or so.) https://en.wikipedia.org/wiki/AVR_microcontrollers#Device_architecture – Peter Cordes Feb 26 '22 at 13:35
  • 2
    10 is divisible by 2. We can throw away the lowest bit in the beginning. Would `(x >> 1) * 103 >> 9` be a bit faster/shorter? And for numbers between 0 and 99 a factor of `(x >> 1) * 13 >> 6` would be enough. It could be rewritten as `byte a = x >> 1; byte b = a * 3; byte c = a >> 2; byte d = b + c; return d >> 4;`, which should keep the calculations to 8 bit (instead of 16 bit). The divmod10 could be done in about 12 instructions (+1 for ret). – Sebastian Feb 27 '22 at 05:03
  • @Sebastian I updated the answer with your algorithm. Could you explain a bit how you came up with such optimization for the range `[0, 99]`? – xiver77 Feb 27 '22 at 07:13
  • 2
    First you reduce the range of the values by doing binary shifts, if the divisor has factors of 2. Dividing by one factor of the divisor (2) first and the other (5) later is compatible with integer division. That is the `x >>= 1` part. The remaining range is `2 * x ∈ [0, 98]` with only the even numbers. Now we need a factor ≥ 0.1 (e.g. to get 10 to a 1 and 90 to a 9), but 98 be a 9 (98 is the most difficult number as it is the largest, which should still stay below the next integer). The factor must be < 10/98 = .10204... – Sebastian Feb 27 '22 at 09:28
  • 1
    Now we look for the smallest potency of 2 called `p = 2^i`, where `⌊0.1 * p⌋ < ⌊0.10204 * p⌋ < 0.10204 * p`. The `⌊ ⌋` sign takes the integer part (floor function). So we try out and find `p = 128`, in this case `⌊12.8⌋ = 12 < 13 = ⌊13.3...⌋ = ⌊0.10204 * 128⌋`. Actually we are looking for the whole number `d` with `12.8 ≤ d < 13.3...`. `d =13`. The factor is `13/128`. We have divided x by 2 once in the beginning. So `((x >> 1) * 13) >> 6` is the intermediate result. – Sebastian Feb 27 '22 at 09:52
  • 1
    To save operations on the AVR every calculation should fit into a byte. Not sure how smart the compiler/optomizer is (at least it does not know the range of the input parameter, have to wait for C++ contracts, or would an assert work with gcc?). `n * 13` is `n * 8 + n * 4 + n`. As we shift the result to the right the lowest bits of the smallest summand are ignored anyway. We get `n * 2 + n + n / 4` by applying 2 shifts preliminarily. For a maximum of 48 this expression is 156, which fits nicely into a byte. – Sebastian Feb 27 '22 at 10:05
  • 1
    Afterwards we shift right by 4 (from the 6 right shifts of the intermediate result, we have done 2 in the previous step). The resulting formula actually works for [0..127]. For doing div and mod it is advised to put them into the same function to reuse intermediate results. The ATMega supports hardware multiplication (13 would not have to be split). – Sebastian Feb 27 '22 at 10:09
  • 1
    Correction of the previous to last comment: Maximum is `98/2 = 49`, which results in 159. – Sebastian Feb 27 '22 at 10:13
  • @Sebastian It's taking me a while to understand your explanation. I'll update my answer when I do understand. – xiver77 Feb 27 '22 at 12:22
  • Okay, some pointers, without the first step, we would have a stricter boundary of `10/99 = 0.101010...` AND larger values more difficult to put into a byte. For the second step try to multiply a few numbers with a factor of `13/128` or e.g. `1/8`, `3/32`, `7/64`, especially for numbers, where the result should increase, e.g. from 19 to 20 or from 89 to 90 or the maximum value 99. The fractional part will be removed, so just look at the integral part of the result. – Sebastian Feb 27 '22 at 12:30
  • The denominator should be a power of 2, so the division can be made with rightshifts. For the third step think how an addition of three binary numbers (arranged vertically) works. When the result is right-shifted the rightmost two bits are removed, which just come from the smallest summand. Now all summands and the sum itself can be done inside a byte, when we rightshift every summand by two (the two larger summands are left-shifted anyway, the smallest can loose the two bits). I hope that helps for understanding, otherwise ask. – Sebastian Feb 27 '22 at 12:35
  • 1
    @Sebastian The current post is as far as I can get. Do you think it's alright? – xiver77 Feb 27 '22 at 14:47
  • Looks good. Especially the +1 for the divisor chains. If a 0 is concatenated at the right, n is not increased. The simple increase of n for 1/3 was because of the easy decimal period of size 1. Dropping the bits for the addition only works for the rightmost summand, so the second-least must not be shifted to the right, otherwise those dropped bits could add up and overflow to the left digits. The AND with -2 of gcc is a combination of shift right and shift left again (as needed for the *3). One difficulty of AVR is that you cannot specify another target for shifts. You need mov to reuse values – Sebastian Feb 27 '22 at 15:32
  • And the [0; 127] interval, which is possible in the end, comes from 63 as max. n as you calculated: `127 / 2 = 63`. Every number above 127 gives larger n. Thank you for writing up the explanation in a nice way! – Sebastian Feb 27 '22 at 15:38
  • @Sebastian I opened a [code golf challenge](https://codegolf.stackexchange.com/questions/243840/find-the-magic-numbers-to-divide-a-number-without-division) related to this that *might* be interesting to you. It is disguised as a challenge but actually a question, so that I can get a hint by looking at someone's solution. – xiver77 Mar 09 '22 at 12:27