15

I have to split a number into its digits in order to display it on an LCD. Right now I use the following method:

pos = 7;

do
{
    LCD_Display(pos, val % 10);
    val /= 10;
    pos--;
} while (pos >= 0 && val);

The problem with this method is that division and modulo operations are extremely slow on an MSP430 microcontroller. Is there any alternative to this method, something that either does not involve division or that reduces the number of operations?

A note: I can't use any library functions, such as itoa. The libraries are big and the functions themselves are rather resource hungry (both in terms of number of cycles, and RAM usage).

alex
  • 3,710
  • 32
  • 44
  • This might seem like a micro-optimization, but the sample code I wrote takes roughly 2.5ms, which is ages if you're running on battery power. – alex Feb 10 '12 at 08:25
  • 3
    Some compilers are able to optimize the division and modulus into a multiplication and shift. What compiler and what optimizations are you using? – Basile Starynkevitch Feb 10 '12 at 08:28
  • Basile: exactly. If you use some recent gcc with optimization turned on (something like -O2 flag), this should be optimized. – dbrank0 Feb 10 '12 at 08:31
  • I'm using IAR with optimizations set to low right now (although it has no influence on this part of the code). Division takes a really long time, especially when val is an uint32_t. It takes around 450 cycles for each operation. – alex Feb 10 '12 at 08:32
  • Also see this: http://stackoverflow.com/questions/5558492/divide-by-10-using-bit-shifts – dbrank0 Feb 10 '12 at 08:36
  • And this: http://stackoverflow.com/questions/2033210/c-fast-division-mod-by-10x – dbrank0 Feb 10 '12 at 08:38
  • @dbrank0 Multiplication isn't incredibly efficient either, although it's still better than division. Thanks for the links! – alex Feb 10 '12 at 08:40
  • Google for binary to BCD conversion. There are some tricky ways of doing it. Atmel has an app note for particular conversions, optimized up the wazoo. – Mark Lakata Nov 27 '12 at 22:12

5 Answers5

13

You could do subtractions in a loop with predefined base 10 values.

My C is a bit rusty, but something like this:

int num[] = { 10000000,1000000,100000,10000,1000,100,10,1 };

for (pos = 0; pos < 8; pos++) {
  int cnt = 0;
  while (val >= num[pos]) {
    cnt++;
    val -= num[pos];
  }
  LCD_Display(pos, cnt);
}
pmg
  • 106,608
  • 13
  • 126
  • 198
Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • Interesting idea. I'll look into it, see if there's a difference in execution time. Thanks! – alex Feb 10 '12 at 08:33
  • And it works. Execution time has gone down from ~2.1ms to less than 700us. – alex Feb 10 '12 at 09:37
  • 2
    You should declare num as const, as it is a constant lookup table. Beside the benefits of "const-correctness", it may also make the code more efficient on some platforms. – Lundin Feb 10 '12 at 10:26
  • @Lundin definitely! The num array should be a `static const` in the file where it's used. – alex Feb 10 '12 at 10:43
  • @alex Using the static keyword together with const is a practice that can be debated. Since it makes most sense to declare const table variables at file scope, they will get static storage duration by default. Also, it all depends on whether the const should be accessible from several files or if it should only be used by the current file. const variables is one of the few cases in C where it is ok to have a global variable, since no external caller can change its contents to achieve spaghetti code. – Lundin Feb 10 '12 at 12:36
  • Nice to see that it works. :) You can also try copying the value from the array to a local variable outside the `while` loop. It might shave off some microseconds more. – Guffa Feb 10 '12 at 12:43
  • Thanks! I've been needing something like this! – Kenneth Feb 10 '12 at 14:38
  • @Kenneth I'm somewhat surprized that no one has asked this before. I was certain that it was something that people have tried to optimize before. – alex Feb 10 '12 at 15:14
6

Yes, there's another way, originally invented (at least AFAIK) by Terje Mathiesen. Instead of dividing by 10, you (sort of) multiply by the reciprocal. The trick, of course, is that in integers you can't represent the reciprocal directly. To make up for that, you work with scaled integers. If we had floating point, we could extract digits with something like:

input = 123

first digit = integer(10 * (fraction(input * .1))
second digit = integer(100 * (fraction(input * .01))

...and so on for as many digits as needed. To do this with integers, we basically just scale those by 232 (and round each up, since we'll use truncating math). In C, the algorithm looks like this:

#include <stdio.h>

// here are our scaled factors
static const unsigned long long factors[] = { 
    3435973837,  // ceil((0.1 * 2**32)<<3)
    2748779070,  // ceil((0.01 * 2**32)<<6)
    2199023256,  // etc.
    3518437209,
    2814749768,
    2251799814,
    3602879702,
    2882303762,
    2305843010
};

static const char shifts[] = {
    3, // the shift value used for each factor above
    6,
    9,
    13,
    16,
    19,
    23,
    26,
    29
};

int main() { 
    unsigned input = 13754;

    for (int i=8; i!=-1; i--) {
        unsigned long long inter = input * factors[i];
        inter >>= shifts[i];
        inter &= (unsigned)-1;
        inter *= 10;
        inter >>= 32;
        printf("%u", inter);
    }
    return 0;
}

The operations in the loop will map directly to instructions on most 32-bit processors. Your typical multiply instruction will take 2 32-bit inputs, and produce a 64-bit result, which is exactly what we need here. It'll typically be quite a bit faster than a division instruction as well. In a typical case, some of the operations will (or at least with some care, can) disappear in assembly language. For example, where I've done the inter &= (unsigned)-1;, in assembly language you'll normally be able to just use the lower 32-bit register where the result was stored, and just ignore whatever holds the upper 32 bits. Likewise, the inter >>= 32; just means we use the value in the upper 32-bit register, and ignore the lower 32-bit register.

For example, in x86 assembly language, this comes out something like:

    mov ebx, 9 ; maximum digits we can deal with.
    mov esi, offset output_buffer
next_digit:
    mov eax, input
    mul factors[ebx*4]
    mov cl, shifts[ebx]
    shrd eax, edx, cl
    mov edx, 10 ; overwrite edx => inter &= (unsigned)-1
    mul edx 
    add dl, '0'
    mov [esi], dl ; effectively shift right 32 bits by ignoring 32 LSBs in eax
    inc esi
    dec ebx
    jnz next_digit
    mov [esi], bl ; zero terminate the string

For the moment, I've cheated a tiny bit, and written the code assuming an extra item at the beginning of each table (factors and shifts). This isn't strictly necessary, but simplifies the code at the cost of wasting 8 bytes of data. It's pretty easy to do away with that too, but I haven't bothered for the moment.

In any case, doing away with the division makes this a fair amount faster on quite a few low- to mid-range processors that lack dedicated division hardware.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • It works, although I didn't get a chance to try it out on the actual development board, to see how it performs. Thank you! – alex Feb 11 '12 at 11:42
  • 1
    @alex: Doing a bit of looking, I'm less certain about how well it'll work in this case. The MSP 430 apparently lacks a multiply instruction, which is likely to hurt this somewhat. – Jerry Coffin Feb 11 '12 at 18:00
1

Another way is using double dabble. This is a way to convert binary to BCD with only additions and bit shifts so it's very appropriate for microcontrollers. After splitting to BCDs you can easily print out each number

phuclv
  • 37,963
  • 15
  • 156
  • 475
0

I would use a temporary string, like:

char buffer[8];
itoa(yourValue, buffer, 10);
int pos;

for(pos=0; pos<8; ++pos)
    LCD_Display(pos, buffer[pos]); /* maybe you'll need a cast here */

edit: since you can't use library's itoa, then I think your solution is already the best, providing you compile with max optimization turned on.

You may take a look at this: Most optimized way to calculate modulus in C

Community
  • 1
  • 1
vulkanino
  • 9,074
  • 7
  • 44
  • 71
  • I can't use any library functions, they take up a lot of code space and are usually incredibly resource hungry. – alex Feb 10 '12 at 08:34
0

This is my attempt at a complete solution. Credit should go to Guffa for providing the general idea. This should work for 32bit integers, signed or otherwise and 0.

#include <stdlib.h>
#include <stdio.h>

#define MAX_WIDTH (10)

static unsigned int uiPosition[] = {
  1u,
  10u,
  100u,
  1000u,
  10000u,
  100000u,
  1000000u,
  10000000u,
  100000000u,
  1000000000u,
};

void uitostr(unsigned int uiSource, char* cTarget)
{
  int i, c=0;

  for( i=0; i!=MAX_WIDTH; ++i )
  {
    cTarget[i] = 0;
  }

  if( uiSource == 0 )
  {
    cTarget[0] = '0';
    cTarget[1] = '\0';
    return;
  }

  for( i=MAX_WIDTH -1; i>=0; --i )
  {
    while( uiSource >= uiPosition[i] )
    {
      cTarget[c] += 1;
      uiSource -= uiPosition[i];
    }

    if( c != 0 || cTarget[c] != 0 )
    {
      cTarget[c] += 0x30;
      c++;
    }
  }

  cTarget[c] = '\0';
}

void itostr(int iSource, char* cTarget)
{
  if( iSource < 0 )
  {
    cTarget[0] = '-';
    uitostr((unsigned int)(iSource * -1), cTarget + 1);
  }
  else
  {
    uitostr((unsigned int)iSource, cTarget);
  }
}

int main()
{
  char szStr[MAX_WIDTH +1] = { 0 };

  // signed integer
  printf("Signed integer\n");

  printf("int: %d\n", 100);
  itostr(100, szStr);
  printf("str: %s\n", szStr);

  printf("int: %d\n", -1);
  itostr(-1, szStr);
  printf("str: %s\n", szStr);

  printf("int: %d\n", 1000000000);
  itostr(1000000000, szStr);
  printf("str: %s\n", szStr);

  printf("int: %d\n", 0);
  itostr(0, szStr);
  printf("str: %s\n", szStr);

  return 0;
}
Kenneth
  • 1,167
  • 1
  • 13
  • 26