I'm developing software for an 8051 processor. A frequent job is splitting a 16-bit address into its high and low bytes. I want to see how many ways there are to achieve this. The ways I have come up with so far are listed below (say ptr is a 16-bit pointer and int is a 16-bit int; note that rn and arn are registers):

bitwise operation

ADDH = (unsigned int) ptr >> 8;
ADDL = (unsigned int) ptr & 0x00FF;

SDCC gives the following assembly code


;   t.c:32: ADDH = (unsigned int) ptr >> 8;
    mov ar6,r3
    mov ar7,r4
    mov _main_ADDH_1_1,r7
;   t.c:33: ADDL = (unsigned int) ptr & 0x00FF;
    mov _main_ADDL_1_1,r6
Keil C51 gives me:

                                           ; SOURCE LINE # 32
0045 AA00        R     MOV     R2,ptr+01H
0047 A900        R     MOV     R1,ptr+02H
0049 AE02              MOV     R6,AR2
004B EE                MOV     A,R6
004C F500        R     MOV     ADDH,A
                                           ; SOURCE LINE # 33
004E AF01              MOV     R7,AR1
0050 EF                MOV     A,R7
0051 F500        R     MOV     ADDL,A
which contains a lot of useless code, IMHO.
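For reference, the bitwise split can be wrapped in two small helpers. This is only a sketch: the names `addr_high` and `addr_low` are hypothetical, and it assumes `<stdint.h>` is available (on a toolchain without it, `unsigned int` and `unsigned char` would stand in for the fixed-width types).

```c
#include <stdint.h>

/* Hypothetical helper names; they do not appear in the original post. */

/* Upper byte of a 16-bit address: shift by one whole byte and truncate. */
static uint8_t addr_high(uint16_t addr)
{
    return (uint8_t)(addr >> 8);
}

/* Lower byte of a 16-bit address: mask off everything above bit 7. */
static uint8_t addr_low(uint16_t addr)
{
    return (uint8_t)(addr & 0x00FFu);
}
```

Because the result does not depend on how the value is laid out in memory, this form should behave the same on any compiler, whatever code it generates.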

pointer trick


ADDH = ((unsigned char *)&ptr)[0];
ADDL = ((unsigned char *)&ptr)[1];
SDCC gives me:

;   t.c:37: ADDH = ((unsigned char *)&ptr)[0];
    mov _main_ADDH_1_1,_main_ptr_1_1
;   t.c:38: ADDL = ((unsigned char *)&ptr)[1];
    mov _main_ADDL_1_1,(_main_ptr_1_1 + 0x0001)
Keil C51 gives me:

                                           ; SOURCE LINE # 37
006A 850000      R     MOV     ADDH,ptr
                                           ; SOURCE LINE # 38
006D 850000      R     MOV     ADDL,ptr+01H
which is the same as the SDCC version.
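One caveat with the pointer trick: it reads bytes in storage order, so which index holds the high byte depends on how the compiler lays out 16-bit values. It may be worth checking the byte order each toolchain documents before relying on index [0] being the high byte on both. The sketch below probes the layout at run time and picks the index accordingly; `msb_first` and `split_by_bytes` are hypothetical names, and `<stdint.h>` is assumed to be available.

```c
#include <stdint.h>

/* Returns 1 if the most significant byte of a 16-bit value is stored
   first in memory (big-endian layout), 0 otherwise. */
static int msb_first(void)
{
    uint16_t probe = 0x0100u;
    return ((const unsigned char *)&probe)[0] == 0x01;
}

/* Pointer-trick split that chooses the byte index from the detected
   layout instead of hard-coding [0] as the high byte. */
static void split_by_bytes(uint16_t addr, uint8_t *high, uint8_t *low)
{
    const unsigned char *bytes = (const unsigned char *)&addr;
    int hi_index = msb_first() ? 0 : 1;

    *high = bytes[hi_index];
    *low  = bytes[1 - hi_index];
}
```

In real 8051 code the run-time probe would defeat the purpose; a compile-time switch per toolchain is the more likely shape, and the probe is here only to make the dependence visible.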

Anders Westrup's mathematical approach


 ADDH = (unsigned int)ptr / 256;
 ADDL = (unsigned int)ptr % 256;

SDCC gives:


;   t.c:42: ADDH = (unsigned int)ptr / 256;
    mov ar5,r3
    mov ar6,r4
    mov ar7,r6
    mov _main_ADDH_1_1,r7
;   t.c:43: ADDL = (unsigned int)ptr % 256;
    mov _main_ADDL_1_1,r5
I have no idea why SDCC uses the r7 register... Keil C51 gives me:

                                           ; SOURCE LINE # 42
0079 AE00        R     MOV     R6,ptr
007B AF00        R     MOV     R7,ptr+01H
007D AA06              MOV     R2,AR6
007F EA                MOV     A,R2
0080 F500        R     MOV     ADDH,A
                                           ; SOURCE LINE # 43
0082 8F00        R     MOV     ADDL,R7
I have no idea why Keil uses the R2 register either...

semaj's union approach


typedef union
   {
   unsigned short u16;
   unsigned char u8[2];
   } U16_U8;

U16_U8 ptr;

// Do something to set the variable ptr
ptr.u16 = ?;

ADDH = ptr.u8[0];
ADDL = ptr.u8[1];

SDCC gives me


;   t.c:26: ADDH = uptr.u8[0];
    mov _main_ADDH_1_1,_main_uptr_1_1
;   t.c:27: ADDL = uptr.u8[1];
    mov _main_ADDL_1_1,(_main_uptr_1_1 + 0x0001)
Keil C51 gives me:

                                           ; SOURCE LINE # 26
0028 850000      R     MOV     ADDH,uptr
                                           ; SOURCE LINE # 27
002B 850000      R     MOV     ADDL,uptr+01H
which is very similar to the pointer trick. However, this approach requires two more bytes of memory to store the union.

Does anyone have any other bright ideas? ;)

And can anyone tell me which way is more efficient?

In case anyone is interested, here is the test case:


typedef union
{
    unsigned short u16;
    unsigned char u8[2];
} U16_U8;

// call a function on the ADDs to avoid optimization
void swap(unsigned char *a, unsigned char *b)
{
    unsigned char tm;
    tm = *a;
    *a = *b;
    *b = tm;
}

main (void)
{
char c[] = "hello world.";
unsigned char xdata *ptr = (unsigned char xdata *)c;
unsigned char ADDH, ADDL;
unsigned char i = 0;

U16_U8 uptr;
uptr.u16 = (unsigned short)ptr;

for ( ; i < 4 ; i++, uptr.u16++){
    ADDH = uptr.u8[0];
    ADDL = uptr.u8[1];
    swap(&ADDH, &ADDL);
}

for ( ; i < 4 ; i++, ptr++){
    ADDH = (unsigned int) ptr >> 8;
    ADDL = (unsigned int) ptr & 0x00FF;
    swap(&ADDH, &ADDL);
}
for ( ; i < 4 ; i++, ptr++){
    ADDH = ((unsigned char *)&ptr)[0];
    ADDL = ((unsigned char *)&ptr)[1];
    swap(&ADDH, &ADDL);
}
for ( ; i < 4 ; i++, ptr++){
    ADDH = (unsigned int)ptr / 256;
    ADDL = (unsigned int)ptr % 256;
    swap(&ADDH, &ADDL);
}

}

Grissiom
  • 1
    "anyone can tell me which way is more efficient". Since you've got the 8051 compiler right there, and I don't, how about you post the disassembly of each option, to give others a sporting chance of commenting on the efficiency ;-) – Steve Jessop Mar 29 '10 at 14:43
  • Thanks for the tip. I hope there is a way to "see asm code from C code". But as you asked, I will upload the asm code tomorrow. – Grissiom Mar 29 '10 at 14:59
  • 1
    For efficiency, you need to know the cycle count of each instruction involved (1 or 2 in each case). http://www.8052.com/51mov has the list, but it's a PITA working it out for your SDCC assembly, since it doesn't list the opcodes, so you have to figure out which variant of MOV each instruction is. Also beware when comparing code variants that use different registers, that the chunk of code you're actually looking at may have knock-on effects on other bits of code. As you say, some versions are using more registers than you'd expect, presumably for an actual reason... – Steve Jessop Mar 30 '10 at 22:17
  • So, in summary, might be easier just to test it. Maybe on a simulator. – Steve Jessop Mar 30 '10 at 22:21
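The cycle counting suggested in the comments above can be sketched for the two Keil C51 listings, since those show the opcodes. The table below assumes the classic 12-clock 8051 timings (1 machine cycle for `MOV A,Rn` and `MOV direct,A`; 2 for `MOV Rn,direct`, `MOV direct,Rn` and `MOV direct,direct`); derivative cores may differ, so treat the numbers as an assumption to check against the part's manual. All identifiers are hypothetical.

```c
#include <stddef.h>

/* MOV forms that appear in the listings. */
enum mov_form {
    MOV_A_RN,          /* MOV A,Rn          */
    MOV_DIRECT_A,      /* MOV direct,A      */
    MOV_RN_DIRECT,     /* MOV Rn,direct     */
    MOV_DIRECT_RN,     /* MOV direct,Rn     */
    MOV_DIRECT_DIRECT  /* MOV direct,direct */
};

/* Assumed machine cycles per form, indexed by enum mov_form. */
static const unsigned char mov_cycles[] = { 1, 1, 2, 2, 2 };

/* Sums the assumed cycle cost of a sequence of MOV instructions. */
static unsigned total_cycles(const enum mov_form *seq, size_t n)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += mov_cycles[seq[i]];
    return total;
}

/* Keil C51, bitwise version (source lines 32 and 33 of the listing). */
static const enum mov_form keil_bitwise[] = {
    MOV_RN_DIRECT, MOV_RN_DIRECT, MOV_RN_DIRECT, MOV_A_RN, MOV_DIRECT_A,
    MOV_RN_DIRECT, MOV_A_RN, MOV_DIRECT_A
};

/* Keil C51, pointer trick (source lines 37 and 38 of the listing). */
static const enum mov_form keil_pointer[] = {
    MOV_DIRECT_DIRECT, MOV_DIRECT_DIRECT
};
```

Under these assumed timings, `total_cycles(keil_bitwise, 8)` comes to 12 machine cycles and `total_cycles(keil_pointer, 2)` to 4. This counts only the instructions shown and ignores any knock-on effect on register allocation elsewhere.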

4 Answers

6

The most efficient way is completely dependent on the compiler. You definitely have to figure out how to get an assembly listing from your compiler for an 8051 project.

One method you might try that is similar to those already mentioned is a union:

typedef union
   {
   unsigned short u16;
   unsigned char u8[2];
   } U16_U8;

U16_U8 ptr;

// Do something to set the variable ptr
ptr.u16 = ?;

ADDH = ptr.u8[0];
ADDL = ptr.u8[1];
semaj
  • I use this method when dealing with endian issues on AVR with GCC where the other methods don't always generate decent code -- especially for constants. – nategoose Mar 29 '10 at 19:21
3

Another not so bright way to split the address:

 ADDH = ptr / 256;
 ADDL = ptr % 256;
Anders Westrup
  • Man, do you know the cost of div instructions on processors like the 8051? – Andrey Mar 29 '10 at 15:10
  • 2
    @Andrey: I don't know what compilers for 8051 are like, but at least for an unsigned type I'd be astonished if a C compiler (on any architecture) which claimed to optimize at all, emitted different code for `ptr / 256` as against `ptr >> 8`. I can't say what insns it uses, but it's not that hard for the compiler to spot the equivalence and pick the best. Negative values aren't necessarily so simple. – Steve Jessop Mar 29 '10 at 15:32
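The equivalence mentioned in the comment above (for unsigned values, dividing by 256 matches shifting right by 8, and the remainder matches masking with 0xFF) is easy to confirm exhaustively for 16-bit values. A minimal sketch, with the hypothetical name `div_matches_shift` and assuming `<stdint.h>`:

```c
#include <stdint.h>

/* Returns 1 if x / 256 == x >> 8 and x % 256 == (x & 0xFF) for every
   16-bit unsigned value x, and 0 as soon as any value disagrees. */
static int div_matches_shift(void)
{
    uint32_t v;

    for (v = 0; v <= 0xFFFFu; v++) {
        uint16_t x = (uint16_t)v;

        if ((uint8_t)(x / 256u) != (uint8_t)(x >> 8))
            return 0;
        if ((uint8_t)(x % 256u) != (uint8_t)(x & 0xFFu))
            return 0;
    }
    return 1;
}
```

This only shows the two forms compute the same result; whether a given compiler emits the same instructions for both still has to be checked in its listing.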
2

The most efficient is the first one, since it is done in a single instruction.

NO! I lied to you, sorry. I forgot that the 8051 instruction set has only 1-bit shift instructions. The second should be faster, but the compiler may generate stupid code, so beware and check the assembly code.

Andrey
  • But is there any other way to divide the address? – Grissiom Mar 29 '10 at 14:28
  • How can the second one generate several instructions? My guess is that they are equally efficient – Anders Westrup Mar 29 '10 at 14:29
  • 1
    Depends on compiler. The easiest way is to compile and check assembly – Andrey Mar 29 '10 at 14:30
  • 1
    @Anders: the main way I can think that a simple compiler might mess up 2, is that it might write `ptr` out of a register on to the stack in order to take the address of it, then read a char back from the stack. So a write and two reads for the two lines of code (hopefully to fast memory, like cache, I don't know the 8051 at all). Conversely, the first version of the code may or may not just perform some integer ops. I note that most of the 8051's registers are 8-bit, so either option might be a no-op: registers could just be reassigned so the register that was the low part of ptr is now ADDH. – Steve Jessop Mar 29 '10 at 14:49
  • The first is slow; see my revised answer. There is no cache on the 8051, but there is fast on-chip memory, which is not transparent to the developer. – Andrey Mar 29 '10 at 15:10
  • @Andrey: I have now given myself the bluffer's introduction to 8051, and I can confidently guess that my mention of "cache" should be replaced with "internal RAM". Hopefully the compiler uses that for stack, the point being that although it might be worse than some other clever way of doing it, a spill to stack isn't as bad as a spill to external RAM would be, in terms of insn or cycle count. I'm still rooting for register reassignment, though, when we see the asm code. All depends how the variables are used afterwards, of course: if `ptr` isn't "zapped" it may not be possible. – Steve Jessop Mar 29 '10 at 15:24
  • @Steve: (regarding your first comment) why does it need to "write ptr out of a register on to the stack in order to take the address of it"? – Lazer Mar 29 '10 at 15:51
  • It doesn't need to (that's why it's a "mess up"), but as I said a primitive compiler *might* reasonably do so. All it takes is for the compiler to be smart enough to keep variables in registers, but not smart enough to do so once the address of the variable has been taken (and turn access through the pointer into register manipulations). I now see that on 8051, registers are directly addressable RAM if you know what bank you're on, so there may be no need at all. I'm curious to see what the questioner's compiler does, since I have no instinct for writing 8051 code myself. – Steve Jessop Mar 29 '10 at 16:04
  • ... so I'm not saying the second one will be bad, but Anders specifically asked "how *could* the second one generate several instructions", so I'm trying to oblige with a hopefully-plausible scenario. Perplexing register spills do happen from time to time, or at least perplexing to me :-) – Steve Jessop Mar 29 '10 at 16:06
  • I've uploaded the assembly code. Have a look at them if you are interested. ;) – Grissiom Mar 30 '10 at 04:50
2

I just create two defines (as follows).

It seems more straightforward and less error-prone.

#define HI(x)  ((x) >> 8)
#define LO(x)  ((x) & 0xFF)
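One thing to watch with the macros as written: `HI(x)` yields the whole shifted value, so an argument wider than 16 bits keeps its upper bits, and right-shifting a negative signed value is implementation-defined in C. A hedged variant that narrows the argument first might look like this; `HI8` and `LO8` are hypothetical names, and `<stdint.h>` is assumed to be available.

```c
#include <stdint.h>

/* Hypothetical hardened variants: convert the argument to 16 bits
   unsigned first, then truncate the result to a single byte. */
#define HI8(x)  ((uint8_t)(((uint16_t)(x)) >> 8))
#define LO8(x)  ((uint8_t)(((uint16_t)(x)) & 0xFFu))
```

Usage stays the same as before, for example `ADDH = HI8(addr); ADDL = LO8(addr);` where `addr` is an integer holding the address.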
Deqing
Jim T
  • Thanks! It is clearer than typing out all the expressions every time. But is the bitwise operation more efficient than the other ways? – Grissiom Apr 23 '10 at 14:04