1

I want to convert unsigned numbers to an ascii string in 68k asm. I can do this for something in the range of words:

move.w #12345,d0    ;example number to convert (unsigned word)
lea   text,a0       ;adress of output string 
;first digit    
and.l #$ffff,d0     ;mask lower word    
divu  #10000,d0     ;calc 10k digit -> d0: remainder.result
add.w #$30,d0       ;convert d0 lowerword to ascii
move.b d0,(a0)+     ;write ascii byte to mem
;next digit
swap  d0            ;continue with remainder
and.l #$ffff,d0
divu  #1000,d0      ;calc 1k digit
add.w #$30,d0
move.b d0,(a0)+
;...and so on

The code above works. It is unrolled and probably could be packed into a loop. However, what I'd really like to do, is convert unsigned longs to their ascii representation. But, of course, divu will only allow a 16-bit word as divisor, so div by #10000 is possible, but div by #100000 is not. How can the conversion be done for the unsigned long 32bit number range?

Steve_I
  • 51
  • 7
  • Why not do repeated divisions by 10 instead? You'll have to construct the string backwards, but that doesn't seem like a big issue. I.e. 123456 / 10 == 12345 with remainder 6. 12345 / 10 == 1234 with remainder 5. etc. – Michael Mar 03 '23 at 18:15
  • 1
    even like that, you can't directly divide a big 32 bit longword by 10, it overflows because DIVU yields its results in the 16 lower bits (16 upper bits are reserved for remainder) – Jean-François Fabre Mar 04 '23 at 17:32
  • I think you'd be better off having your numbers stored as binary-coded decimal, they're much easier to convert to ASCII and back since the same algorithms used for hexadecimal to ASCII will work. – puppydrum64 Jun 29 '23 at 12:01
  • You'r correct where that's possible @puppydrum64. But sometimes you just have an unsigned long. In my case that value is the result of an algorithm. – Steve_I Jun 30 '23 at 13:06

2 Answers2

2

I've written a long to ascii integer converter for an arcade remake (Pacman on Commodore Amiga).

Actually, you can use DIVU up to (1<<16*10)-1 as the result of 655359/10 still fits in 16 bits (and is 65535)

To go further, just divide a copy of your long number by 10000, it cannot be > 655359 either as 6553590000 > 1<<32 so the result will fit in 16 bits. Write the result as ASCII.

Then proceed with the remainder of the division you just obtained, and do the same for the lower part.

So if num = (1<<32)-1 (which is max unsigned value aka $ffffffff), dividing num by 10000 is 429496 which is < 65536*10-1 so can be processed. And the remainder can also be processed the same way.

Note: actually it works up only to 655360000-1 ($2710.0000-1) because that's the last number where the stage one divu #10000 will result in <65536, for higher numbers, a third division stage by 1 million must be performed.

Here's some code I adapted from my game & tested. Zero padding seems a bit off but you probably don't need it.

; main test code
main:
    move.l  #$12345678,d0
    lea     outbuf(pc),a1
    move.w  #0,d1
    bsr     write_decimal_number
    ; A1 contains "305419896"
    move.l  #2000001,d0
    lea     outbuf(pc),a1
    move.w  #0,d1
    bsr     write_decimal_number
    ; A1 contains "2000001"

    rts
    
outbuf
    ds.b    100,0
    
; the conversion routine
; < A1: buffer to write into
; < D0: 32 bit unsigned number
; < D1: number of padding zeroes
    
write_decimal_number:
    movem.l A0-A1/D2-d5,-(a7)
    move.l  d1,d3   ; quickly adapted, old code needed D0 and D1 for coordinates
    move.l  d0,d2
    cmp.w   #18,d3
    bcs.b   .padok
    move.w  #18,d3
.padok
    cmp.l   #655360,d2
    bcs.b   .one
    sub.l   #4,d3
    move.w  d0,d5
    ; first write high part    
    divu    #10000,d2
    swap    d2
    moveq.l #0,d4
    move.w   d2,d4
    clr.w   d2
    swap    d2
    bsr     .write_num
    subq.w  #1,a1   ; cancel last zero
    move.l  d4,d2
    moveq   #4,d3   ; pad to 4
.one
    bsr     .write_num
    clr.b   (a1)
    movem.l (a7)+,A0-A1/D2-d5
    rts
.write_num
    lea .buf+20(pc),a0
    tst.w   d2
    beq.b   .zero
.loop
    divu    #10,d2
    swap    d2
    add.b   #'0',d2
    subq    #1,d3
    move.b  d2,-(a0)
    clr.w   d2
    swap    d2
    tst.w   d2          ; probably unnecessary, swap sets Z flag already
    beq.b   .write
    bra.b   .loop
.zero
    subq    #1,d3
    move.b  #'0',-(a0)
.write
    tst.b   d3
    beq.b   .w
    bmi.b   .w
    subq    #1,d3
.pad
    move.b  #'0',-(a0)
    dbf d3,.pad
.w
    move.b  (a0)+,(a1)+
    bne.b   .w
    rts
.buf
    ds.b    20
    dc.b    0
    even
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
  • That algorithm is nice. One div for each digit. And using a compare to skip the first div by 10000 when unnecessary.. – Erik Eidt Mar 03 '23 at 20:52
  • 1
    thanks. I'm using it to display the scores and on a 68000 you have to care for performance (those are the only "div" of the code except maybe for random function, everything else uses pre-computed tables). I also fixed a bug while writing this answer so it's win-win :) – Jean-François Fabre Mar 03 '23 at 22:39
  • 1
    @Jean-FrançoisFabre: Does 68000 have a multiplier that's faster than its divider? If so, consider a multiplicative inverse where you use the high half of a 32x32 => 64-bit multiply. (or 16x16 => 32). [Integer-to-ASCII algorithm (x86 assembly)](https://codereview.stackexchange.com/q/142842) is an x86 version of that algorithm, using magic constant `0xCCCCCCCD`. (Getting the remainder with `num - num/10*10`.) – Peter Cordes Mar 04 '23 at 01:31
  • https://gmplib.org/~tege/divcnst-pldi94.pdf reports a speedup for 32-bit division on 68020 from 76-78 cycles using a hardware division instruction to 41-44 cycles using a multiplicative inverse. But I forget if 68020 had new instructions that help, or how cycle times for div/mul differ on 68000. – Peter Cordes Mar 04 '23 at 01:35
  • 1
    @PeterCordes yes, multiplication is cheaper than division on any 680x0 cpu. I didn't care to optimize further, it was cheap enough for me in the context of a score display once in a while. Agreed, if this has to be cpu-intensive, DIV has to go. If one has enough memory, one could even consider a pre-computed table but that would eat several MBs with the involved values :) – Jean-François Fabre Mar 04 '23 at 17:30
  • Nice algorithm Jean-Francois, thank you. Actually it works up to 655.360.000-1 ($2710.0000-1) because thats the last number where the stage one divu #10000 will result in <65536 (testing confirmed this). – Steve_I Mar 13 '23 at 13:49
  • ouch! very true. I'm editing the answer, while maybe some fix can be found. Suppose that if the value is > to 655360000-1 we can divide by 1 million and do the same method. – Jean-François Fabre Mar 13 '23 at 16:03
  • Just out of curiosity, what hardware is this game running on? – puppydrum64 Jun 29 '23 at 12:02
  • Amiga500 (edited answer) – Jean-François Fabre Jun 29 '23 at 12:23
1

The algorithm you're showing extracts the digits in forward direction, e.g. starting from higher order decimal digits and going to the lower order.

The DIVU instruction divides a 32-bit dividend value by a 16-bit divisor value, producing a 16-bit quotient and 16-bit remainder.

So, the key is to use that feature.  In part, this means extracting the low order digits first.  This can be seen in a standard itoa

Instead of using ever smaller powers of 10 going leftmost to rightmost, this uses division and modulus by 10 for each digit going right to left.

Since the digits are being produced in reverse order, there are several approaches to reversing the string, such as:

  • reverse the digits after producing them (as the above example shows);
  • start at the end of the buffer and add them backwards (e.g. using -(a0) instead of (a0)+).
    • Then you can fill the remaining part of the buffer with ascii 0 (zero) digits, or,
    • do equivalent of movmem to slide it to the beginning of the buffer, or,
    • best if you can tolerate it: return the pointer within the buffer where the last digit was placed to the caller, so that they have the start of the string.

An advantage of this approach is that you can stop early, when the quotient is 0, since all remaining digits will be zero.

The problem with DIVU, of course is that large numbers will produce overflow in division by 10, since only 16-bit result is captured.  In an overflow situation, you get neither the quotient nor the remainder as the destination register remains unmodified..

So, there is a trick to this, involving two division operations.  See here for the x86 version: Trying to Display a 32bit number in Assembly 8086 32bit

The algorithm goes something like this:

#include <stdio.h>

// using simple global variable to hold the resulting string
// but of course this can be edited to be parameterized instead
static char buf[20];

char *div2(unsigned long dividend) {
    unsigned short dvUpper16 = dividend >> 16;
    unsigned short dvLower16 = (unsigned short) dividend;
    char *bufPtr = buf + sizeof(buf);
    *--bufPtr = 0;

    for (;;) {
        unsigned long 
        dvTemp32  = dvUpper16;       // zero extending upper16 to 32 bits
        dvUpper16 = dvTemp32 / 10;   // first DIVU
        unsigned short 
        remainOne = dvTemp32 % 10;   // get mod from first DIVU
        
        dvTemp32 = (remainOne << 16) + dvLower16;
        dvLower16 = dvTemp32 / 10;  // second DIVU
        unsigned short 
        remainTwo = dvTemp32 % 10;  // get mod from second DIVU

        // print or capture remainTwo
        printf("%d\n", remainTwo);
        *--bufPtr = remainTwo + '0';
        
        if ( (dvUpper16 | dvLower16) == 0 )
            break;
    }
    
    return bufPtr;
}

int main()
{
    char *ptr = div2(543217689);
    printf("%s\n", ptr);
}

This arrangement of dual divisions works with 32-bit dividend and 16-bit divisor, and will never overflow a 16-bit result.

This will translate to 68k pretty easily, as all the needed operators are there.

Since DIVU produces both quotient and remainder and both are needed together, in 68k this will only require two DIVU's in total.

Erik Eidt
  • 23,049
  • 2
  • 29
  • 53