9

In the mspgcc (GCC for MSP430 microcontrollers) manual, the authors wrote:

Use int instead of char or unsigned char if you want a small integer within a function. The code produced will be more efficient, and in most cases storage isn't actually wasted.

Why is int more efficient?

UPD. And why is (u)int_fast8_t defined in mspgcc as (unsigned) char rather than (unsigned) int? As I understand it, (u)int_fast*_t should be defined as the most efficient type of sufficient size.
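
A quick way to check what a particular toolchain chose is to look at the sizes of the types themselves. This is only a minimal sketch: the type names come from the standard `<stdint.h>`, while the helper function names are hypothetical and the values returned depend entirely on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the size, in bytes, that the
   current toolchain picked for the corresponding type. */
size_t fast8_size(void)  { return sizeof(uint_fast8_t); }
size_t least8_size(void) { return sizeof(uint_least8_t); }
size_t int_size(void)    { return sizeof(unsigned int); }
```

If `fast8_size()` equals `least8_size()`, the toolchain treats the 8-bit type as "fast enough", which is what the question reports for mspgcc.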

old_timer
  • 1
    afaik MSP430 has no difference in instruction latency between 8-bit and 16-bit operands. I'm curious too. – Cory Nelson Nov 25 '13 at 15:50
  • 1
    I would just like to add that the wonderful thing I love about `(u)int_(fast|least)(8|16|32)_t` is that you can specify what sort of optimization you want and generally let the compiler handle it. If you want to use as little space as possible and you need to represent up to 20,000, use `(u)int_least16_t`, but if you are using something for heavy calculation that you want fast you can use `(u)int_fast16_t` largely without having to worry about what type it ends up using under the hood. – rjp Jan 09 '14 at 19:09

4 Answers

5

A general rule of thumb is that CPUs are fastest at operating on integers of their native word size.

This is of course entirely architecture dependent; see the answers to this similar question for more clarification on that point.

Community
Juser1167589
5

TI has published an Application Note on the topic for their Tiva-C (formerly Stellaris) MCUs.

In the "Introduction" section, a table lists factors affecting performance and size. The factor labeled "Variable size" states that "using variables smaller than optimal may mean extra instructions to sign or unsign extend...".

Also, the section "Size of Variables" states:

"When the local variables are smaller than the register size, then extra code is usually needed. On a Stellaris part, this means that local variables of size byte and halfword (char and short int respectively) require extra code. Since code ported from an 8-bit or 16-bit microcontroller may have had locals converted to smaller sizes (to avoid the too large problem), this means that such code will run slower and take more code space than is needed."

Please see: http://www.ti.com/lit/an/spma014/spma014.pdf
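
To make the quoted point concrete, here is a minimal sketch of two loops that differ only in the width of the counter. The function names are hypothetical and are not taken from the application note. On a 32-bit core the compiler may have to add a zero-extend or mask for the narrow counter, because it must behave as if it wraps at 256; whether it actually does depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Narrow counter: 'i' must behave like an 8-bit value after every
   increment, which may cost an extra instruction per iteration on a
   core whose registers are wider than 8 bits. */
unsigned sum_with_u8_counter(const uint8_t *data, uint8_t n)
{
    unsigned total = 0;
    for (uint8_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Register-sized counter: no truncation needs to be modeled. */
unsigned sum_with_native_counter(const uint8_t *data, unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Both functions return the same result for the same input; comparing their disassembly is the experiment.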

The following is handled by the compiler, but is still relevant to the issue at hand:

The MSP430 is a 16-bit microcontroller. A char is only 8 bits wide, so chars need packing or padding to keep word-sized data aligned. For instance, 3 chars do not fill a whole number of 16-bit words. A 16-bit integer, by contrast, is always word-aligned.

When you use variable sizes that are multiples of 16 bits (e.g. 16 and 32), you can also use memory more efficiently, because no padding is needed to align the data.
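
The padding effect can be observed with `sizeof` and `offsetof`. This is a sketch only: the struct and function names are hypothetical, and the exact numbers depend on the ABI. On an ABI that aligns `uint16_t` to 2 bytes, the first layout typically occupies 6 bytes and the second 4.

```c
#include <stddef.h>
#include <stdint.h>

/* A 16-bit member between two bytes usually forces padding. */
struct mixed {
    uint8_t  flag;
    uint16_t value;
    uint8_t  tag;
};

/* Same members, reordered so the two bytes share one word. */
struct grouped {
    uint16_t value;
    uint8_t  flag;
    uint8_t  tag;
};

size_t mixed_size(void)         { return sizeof(struct mixed); }
size_t grouped_size(void)       { return sizeof(struct grouped); }
size_t mixed_value_offset(void) { return offsetof(struct mixed, value); }
```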

bblincoe
  • 1
    The compiler's job is to ensure no alignment issues, and it won't waste any more space than you would by using a 16-bit variable where an 8-bit one would have worked. You've essentially moved the padding from an implicit process to an explicit one. – Cory Nelson Nov 25 '13 at 15:51
  • @CoryNelson I agree - it is compiler-related and not run-time. However, you're incorrect with your statement that it doesn't take more memory. Have you tried compiling with both GCC and IAR? They both pad differently and you end up with radically different code size if you aren't careful (disregarding optimization). This seems to be due to padding. – bblincoe Nov 25 '13 at 18:21
4

In general, and not necessarily specific to this processor, it has to do with sign extension and masking, which require additional instructions to faithfully implement the C source code. A signed 8-bit value on a 16-, 32-, or 64-bit processor MAY involve additional instructions to sign-extend. An 8-bit add on a 32-bit processor might involve extra instructions to AND with 0xFF, etc.
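
The masking and sign extension come straight from the C rules: operands narrower than int are promoted to int before arithmetic, and the result is truncated when it is stored back into the narrow type. A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* The addition is carried out in int; converting the result back to
   uint8_t keeps only the low 8 bits. */
uint8_t add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* The same truncation written out explicitly as a mask. */
int add_then_mask(int a, int b)
{
    return (a + b) & 0xFF;
}

/* Widening a signed 8-bit value must replicate its sign bit. */
int widen_s8(int8_t v)
{
    return v;
}
```

When the hardware has no byte-sized form of an operation, the truncation in `add_u8` and the widening in `widen_s8` are the places where extra instructions can show up.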

You should do some simple experiments. It took a few iterations, but I quickly hit something that showed a difference.

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b)<<3;
}

unsigned char bfun ( unsigned char a, unsigned char b )
{
    return(a+b)<<3;
}


int sfun ( int a, int b )
{
    return(a+b)<<3;
}

char sbfun ( char a, char b )
{
    return(a+b)<<3;
}

produces

00000000 <fun>:
   0:   0f 5e           add r14,    r15 
   2:   0f 5f           rla r15     
   4:   0f 5f           rla r15     
   6:   0f 5f           rla r15     
   8:   30 41           ret         

0000000a <bfun>:
   a:   4f 5e           add.b   r14,    r15 
   c:   4f 5f           rla.b   r15     
   e:   4f 5f           rla.b   r15     
  10:   4f 5f           rla.b   r15     
  12:   30 41           ret         

00000014 <sfun>:
  14:   0f 5e           add r14,    r15 
  16:   0f 5f           rla r15     
  18:   0f 5f           rla r15     
  1a:   0f 5f           rla r15     
  1c:   30 41           ret         

0000001e <sbfun>:
  1e:   8f 11           sxt r15     
  20:   8e 11           sxt r14     
  22:   0f 5e           add r14,    r15 
  24:   0f 5f           rla r15     
  26:   0f 5f           rla r15     
  28:   0f 5f           rla r15     
  2a:   4f 4f           mov.b   r15,    r15 
  2c:   30 41           ret         

The MSP430 has word and byte versions of its instructions, so a simple add or subtract doesn't have to do the clipping or sign extension that you would expect when using variables smaller than a register. As programmers, we might know that we are only going to feed sbfun some very small numbers, but the compiler doesn't, and it has to faithfully implement our code as written, which is why sbfun needs more code than sfun. It is not hard to do these experiments with different compilers and processors to see this in action; the only trick is to write code that the processor cannot handle with a single simple instruction.
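
One plausible reading of why bfun above gets away with plain byte instructions: for a left shift, the low 8 bits of the result depend only on the low bits of the sum, so truncating early or late gives the same byte. The sketch below checks that claim exhaustively; all names in it are hypothetical.

```c
#include <stdint.h>

/* What the C source asks for: add and shift in int, truncate at the end. */
uint8_t shl3_wide(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) << 3);
}

/* What byte-sized instructions compute: truncate after every step. */
uint8_t shl3_byte(uint8_t a, uint8_t b)
{
    uint8_t s = (uint8_t)(a + b);
    return (uint8_t)(s << 3);
}

/* Counts input pairs for which the two versions disagree. */
unsigned count_mismatches(void)
{
    unsigned n = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (shl3_wide((uint8_t)a, (uint8_t)b) != shl3_byte((uint8_t)a, (uint8_t)b))
                n++;
    return n;
}
```

`count_mismatches()` returns 0, so the compiler is free to use the byte forms for the unsigned case without changing the result.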

Another example:

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b)>>1;
}

unsigned char bfun ( unsigned char a, unsigned char b )
{
    return(a+b)>>1;
}

produces

00000000 <fun>:
   0:   0f 5e           add r14,    r15 
   2:   12 c3           clrc            
   4:   0f 10           rrc r15     
   6:   30 41           ret         

00000008 <bfun>:
   8:   4f 4f           mov.b   r15,    r15 
   a:   4e 4e           mov.b   r14,    r14 
   c:   0f 5e           add r14,    r15 
   e:   0f 11           rra r15     
  10:   4f 4f           mov.b   r15,    r15 
  12:   30 41           ret         
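
The right shift is different from the left shift because the carry out of the 8-bit add ends up in the result. A minimal sketch of the two behaviors, with hypothetical function names:

```c
/* Same expression as bfun: the operands are promoted to int, so the
   ninth bit of the sum survives until the shift. */
unsigned char avg_promoted(unsigned char a, unsigned char b)
{
    return (a + b) >> 1;
}

/* What a pure 8-bit add followed by a shift would compute: the carry
   is lost before the shift. */
unsigned char avg_wrapped(unsigned char a, unsigned char b)
{
    unsigned char s = (unsigned char)(a + b);
    return s >> 1;
}
```

For 200 and 100 the first function returns 150 and the second returns 22, so the compiler cannot simply use a byte-sized add here and has to widen the operands first.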
old_timer
  • +1, although it's worth noting that using larger types for parameters may lead to increased stack usage, which I suppose might be an issue when resources are constrained. – vgru Nov 26 '13 at 09:36
  • Yep, it is all a performance and optimization game... Wanted to plant some tiny seeds about why smaller isn't necessarily better (relative term). – old_timer Nov 26 '13 at 14:46
1

int matches the native size of the processor in question (16 bits), so when you ask for a store to an unsigned char variable, the compiler may have to emit extra code to ensure that the value is between 0 and 255.

Martin Thompson
  • 1
    I think you are wrong. The compiler won't generate code to wrap around from 255 to 0. The MSP430 has byte-oriented instructions, and the compiler will just use the `add.b` instruction instead of `add`. –  Nov 25 '13 at 16:17
  • @Corvus That is correct. The MSP430 has 27 instructions with most instructions available in .B (8-bit) and .W (16-bit) suffixed versions. – bblincoe Nov 25 '13 at 16:23
  • OK, good point in this case, not so on other processors. It's been too long since I was MSP430ing! I'm sure there are other cases (than just a simple increment) where an extra AND would need to be issued though... – Martin Thompson Nov 25 '13 at 17:03