Compilers and assembly

Question

Why do compilers never start offsetting from index zero of the base pointer? See here:

! 1
! 1 # 1 "test.c"
! 1 struct test{
! 2  int a;
!BCC_EOS
! 3  int b;
!BCC_EOS
! 4 };
!BCC_EOS
! 5
! 6 int main()
! 7 {
export  _main
_main:
! 8   int a;
!BCC_EOS
! 9   char array[5];
!BCC_EOS
! 10   array[a] = 10;
push    bp
mov     bp,sp
push    di
push    si
add     sp,*-8
! Debug: ptradd int a = [S+$E-8] to [5] char array = S+$E-$D (used reg = )
mov     ax,-6[bp]
mov     bx,bp
add     bx,ax
! Debug: eq int = const $A to char = [bx-$B] (used reg = )
mov     al,*$A
mov     -$B[bx],al
!BCC_EOS
! 11   return array[a];
! Debug: ptradd int a = [S+$E-8] to [5] char array = S+$E-$D (used reg = )
mov     ax,-6[bp]
mov     bx,bp
add     bx,ax
! Debug: cast int = const 0 to char = [bx-$B] (used reg = )
mov     al,-$B[bx]
xor     ah,ah
add     sp,*8
pop     si
pop     di
pop     bp
ret
!BCC_EOS
! 12
! 13 }
! 14
! Register BX used in function main

I am writing my own compiler and don't want to make a mistake if there is a reason for this.

Also, how is the generated code above even safe to use, given that addressing through the BX register defaults to the data segment, which shouldn't be pointing at the stack anyway?

After `push ebp` / `mov ebp, esp`, what's at `[ebp+0]`? Your caller's saved EBP value, which you need to restore later. — Peter Cordes, Nov 22 '16 at 04:41
What's up with the pastebin links? If you wanted to give an example, use a code block. Not downvoting only because your question is clear without them (except that you mention BX, which is not "the base pointer"). — Peter Cordes, Nov 22 '16 at 04:43
With scope variables you subtract from the base pointer, not add. I am talking about subtracting from the base pointer to access the scope variables, not the function arguments. — NibbleBits, Nov 22 '16 at 04:43
Peter, they are two separate questions; please view both pastes. — NibbleBits, Nov 22 '16 at 04:44
If your pastes can't fit in your question, it's not a good SO question. Especially not if it's two separate unrelated questions. — Peter Cordes, Nov 22 '16 at 04:44
I only want someone to answer my questions; if you can't, then please don't spam. I know index zero is the same as ebp. I say index 0 to make it clearer to follow. Please answer my question or stop commenting. — NibbleBits, Nov 22 '16 at 04:45
I already answered your first question with my first comment (instead of posting an answer, because I'm looking to see if it's a duplicate of an existing question). The answer is: because the two bytes at `[bp]` are already in use. — Peter Cordes, Nov 22 '16 at 04:48
Peter, I'm talking about accessing the scope variables at [bp-4]. I understand why you would have to use [bp+2] when accessing function arguments, due to the previous base pointer being on the stack. — NibbleBits, Nov 22 '16 at 04:56
You're asking why compilers don't store a local variable at `[bp - 0]`, right? That is exactly what I answered. — Peter Cordes, Nov 22 '16 at 04:58
Your question shows a fundamental lack of understanding of the machine architecture. In 16 bit code like this, the stack and data segments are frequently the same so that all data pointers can be 16 bits. If they are different, pointers are 32 bits, a 16 bit segment and 16 bit offset. — 1201ProgramAlarm, Nov 22 '16 at 05:01
Thank you, you pointed out another issue; I'll have to amend my compiler. I start accessing scope variables at [bp-0], my mistake. Do you know why the locals start at [bp-6] and not just [bp-2]? — NibbleBits, Nov 22 '16 at 05:02
Damn, of course. I did not notice this before. My compiler is coming along quite well, believe it or not: I already have if statements and for loops, pointers work too, and I am working on arrays. I just sometimes have a little trouble with the memory. Can anyone answer my question about BX being unsafe to use, as it points to the data segment, not the stack segment? — NibbleBits, Nov 22 '16 at 05:05
The first answer on the linked duplicate explains exactly what's where, and points out that compilers will for example store a single-byte `char` at `[rbp-1]`, since that function didn't use any extra space below RBP by saving other regs. — Peter Cordes, Nov 22 '16 at 05:05
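
To make the `[bp-6]` starting point concrete, here is a minimal C sketch that replays the prologue from the listing (`push bp`, `mov bp,sp`, `push di`, `push si`, `add sp,*-8`) and reports BP-relative offsets. The `Frame` type and the helper names are hypothetical, made up for illustration. The reading that the saved DI and SI sit at `[bp-2]` and `[bp-4]` is inferred from the listing, not stated by anyone in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of a 16-bit stack frame; offsets are relative to BP. */
typedef struct {
    int sp; /* where SP currently points, as an offset from BP */
} Frame;

/* Models "push reg": SP drops by one 16-bit word. Returns the new slot's offset. */
int frame_push_word(Frame *f) {
    f->sp -= 2;
    return f->sp;
}

/* Models "add sp,*-n": reserves n bytes. Returns the lowest reserved offset. */
int frame_reserve(Frame *f, int nbytes) {
    assert(nbytes >= 0);
    f->sp -= nbytes;
    return f->sp;
}

/* Replays the prologue from the listing and prints where each slot lands. */
void print_listing_frame(void) {
    Frame f = { 0 };                      /* after "push bp / mov bp,sp": [bp+0] holds the caller's BP */
    int saved_di = frame_push_word(&f);   /* push di */
    int saved_si = frame_push_word(&f);   /* push si */
    int locals_lo = frame_reserve(&f, 8); /* add sp,*-8 */

    printf("[bp%+d] saved DI\n", saved_di);
    printf("[bp%+d] saved SI\n", saved_si);
    printf("[bp%+d] .. [bp%+d] locals\n", locals_lo, saved_si - 1);
}
```

Under this model the two pushes take the words at `[bp-2]` and `[bp-4]`, so the 8 reserved bytes can only begin below them. That is consistent with the listing reading `a` from `-6[bp]` and addressing the array from `bp-$B`. A function that saved no registers could start its locals directly below the saved BP.
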
As for using BX, doesn't `bcc` target the "tiny" model, where ES=DS=SS? That vastly simplifies a C implementation, because pointers can be passed between functions, and are only 16 bits. If `char*` is 24 or 32 bits, including a segment, then code-generation for pointers needs to assume that every pointer has a different segment, or check that they don't and generate efficient code for the case where they're the same. That's a completely separate question, though. — Peter Cordes, Nov 22 '16 at 05:06
So what you're saying is they all share the same segment, hence there is no need to worry about using BX. I had arrays working perfectly until I introduced segments and it all went to hell; I spent the past few days refactoring arrays. — NibbleBits, Nov 22 '16 at 05:08
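
The segment point can be sketched in C as well. On the 8086 a memory operand based on BX uses DS by default and one based on BP uses SS, and the physical address is the segment times 16 plus the offset. The function names below are hypothetical, and whether `bcc` really sets DS equal to SS is the assumption raised in the comments above, not something this sketch confirms.

```c
#include <stdint.h>

/* Real-mode 8086 translation: physical = segment * 16 + offset, wrapped to 20 bits. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Hypothetical check: does a DS-relative access through BX hit the same byte
   as an SS-relative access through BP when both use the same 16-bit offset? */
int bx_reaches_stack(uint16_t ds, uint16_t ss, uint16_t offset) {
    return physical_address(ds, offset) == physical_address(ss, offset);
}
```

With `ds == ss` the check holds for every offset, which is why `mov -$B[bx],al` in the listing can address a stack local. With different segment values the same offset lands on a different byte, and the generated code would need a segment override or far pointers.
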
Just updated my previous comment. IDK how real compilers implemented it in practice, but it's probably pretty hairy. 32-bit x86 is a much easier compiler target, since everything just works (no restrictions on which regs you can use in addressing modes), and all major OSes use a flat memory model with all segments equal. I would strongly suggest you either keep it simple like BCC, or target 32-bit. Don't try to implement segmented pointers as your first attempt, especially if you're still learning stuff like `[bp-0]` is occupied. — Peter Cordes, Nov 22 '16 at 05:11
I was aware [bp-0] would have the old base pointer; for some weird reason I just assumed it would only be a problem for accessing function arguments. I start arguments at [bp+2]. It seems a bit silly that I didn't see it before. I have come a long way: the language is 70% implemented, and I even had structures working at one point. It's a good learning experience for me. I thought 8086 would be easier than x86. Thanks for your advice guys, I will use it. — NibbleBits, Nov 22 '16 at 05:14
`[bp+2]` is the return address. See http://stackoverflow.com/a/21192889/224132, and divide everything by 2 because you're using 16-bit code instead of 32-bit, where the size of each stack operation is half. Anyway, glad it helped, have fun with asm. There's lots of good stuff in the [x86 tag wiki](http://stackoverflow.com/tags/x86/info), so make sure you take a look at that. — Peter Cordes, Nov 22 '16 at 06:14
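
The "divide everything by 2" rule can be written down as a small C sketch. It assumes a near call (the return address is one stack word), word-sized arguments, and the first argument at the lowest address; the function names and the `word_size` parameter are hypothetical, chosen for illustration.

```c
#include <assert.h>

/* BP-relative offset of the return address: one stack word above the saved BP. */
int return_address_offset(int word_size) {
    assert(word_size > 0);
    return word_size;
}

/* BP-relative offset of the argument at `index` (0 = first argument).
   The saved BP and the return address each take one stack word below it. */
int arg_offset(int word_size, int index) {
    assert(word_size > 0 && index >= 0);
    return 2 * word_size + index * word_size;
}
```

With `word_size` 2 the first argument lands at `[bp+4]`, not `[bp+2]`; with `word_size` 4 the same formula gives the familiar 32-bit `[ebp+8]`.
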
*"I thought 8086 would be easier than x86."* ... as somebody who did both (I mean ASM coding on the scale of hundreds of kB of source), my brain simply explodes with laughter upon reading this. Thank you for making me recall the joy and relief of exploring 32b x86 mode after a couple of years of doing 16b; I hadn't recalled that one for a decade. I was so happy... :) — Ped7g, Nov 22 '16 at 10:57
