and esp, 0xfffffff0

Question

I don't entirely understand the commented line below. I read a few posts on SO and the gcc manual and learned that it is for stack address alignment, but I fail to understand how it does so. The code is shown below:

(gdb) disas main
Dump of assembler code for function main:
   0x08048414 <+0>: push   ebp
   0x08048415 <+1>: mov    ebp,esp
   0x08048417 <+3>: and    esp,0xfffffff0 ; why??
   0x0804841a <+6>: sub    esp,0x10
   0x0804841d <+9>: mov    DWORD PTR [esp],0x8048510
   0x08048424 <+16>:    call   0x8048320 <puts@plt>
   0x08048429 <+21>:    mov    DWORD PTR [esp],0x8048520
   0x08048430 <+28>:    call   0x8048330 <system@plt>
   0x08048435 <+33>:    leave
   0x08048436 <+34>:    ret
End of assembler dump.

The code was generated using gcc (version 4.6.3) on Linux. Thanks.

It's alignment by brute force, the lowest 4 bits are reset so by definition it is now aligned to at least 16. — harold, Jul 05 '14 at 17:30
It makes the address a multiple of 16, i.e. optimised for 128 bits processors. — Mr Lister, Jul 05 '14 at 17:31
@harold ok. yep. the last 4 bits of ESP are AND'ed with 0000b (hence the reset). But how does that make it 2^4 bytes aligned? perhaps i should understand what alignment means first. — gumchew, Jul 05 '14 at 17:37
N-byte alignment means that the start address is at a multiple of N bytes. If N is a power of 2, it also means that all the byte addresses that store the entire value are exactly the same *except* for the lowest-order log2(N) bits. This allows simple masking techniques like in the code you posted, instead of integer modulo (remainder from division) operations. — Mike DeSimone, Jul 05 '14 at 17:39
possible duplicate of [Why does the Mac ABI require 16-byte stack alignment for x86-32?](http://stackoverflow.com/questions/612443/why-does-the-mac-abi-require-16-byte-stack-alignment-for-x86-32) — nneonneo, Jul 05 '14 at 17:51
@MikeDeSimone your explanation along with others does add up and make sense. thanks guys. — gumchew, Jul 05 '14 at 17:57
excellent. Now, I fully understand. Thanks guys for your patience. it did take a bit of time but got there! as explained by @MikeDeSimone, this is a neat masking technique! :) — gumchew, Jul 05 '14 at 18:21

score 16 · Answer 1 · answered Jul 05 '14 at 17:52

and esp, 0xfffffff0 does a bitwise AND between the stack pointer and a constant, and stores the result back in the stack pointer.

The constant is chosen so that its low four bits are zero. Therefore the AND operation will set these bits to zero in the result, and leave the other bits of esp intact. This has the effect of rounding the stack pointer down to the nearest multiple of 16.

nneonneo


"This has the effect of rounding the stack pointer down to the nearest multiple of 16." <- tried to understand "the effect" - still progressing. thx – gumchew Jul 05 '14 at 18:05
excellent. Now, I fully understand. Thanks for your patience. it did take a bit of time but got there! :) – gumchew Jul 05 '14 at 18:19

score 7 · Answer 2 · edited May 23 '17 at 12:08

It looks like it's part of some code to set up shop at the start of main.

Function start: save the base frame pointer on the stack (needed by the leave instruction later):

   0x08048414 <+0>: push   ebp

Next we copy esp into ebp to establish the frame pointer, then align the stack pointer to a 16-byte boundary, because the compiler (for whatever reason) wants it. It could be that it always wants 16-byte aligned frames, or that the local variables need 16-byte alignment (maybe someone used a uint128_t or a type that uses gcc vector extensions). Since the result of the AND is always less than or equal to the current stack pointer, and the stack grows downward, it's just discarding bytes until it gets to a 16-byte aligned point.

   0x08048415 <+1>: mov    ebp,esp
   0x08048417 <+3>: and    esp,0xfffffff0

Next we subtract 16 from the stack pointer, creating 16 bytes of local variable space:

   0x0804841a <+6>: sub    esp,0x10

Call puts((const char*)0x8048510); the argument is stored at [esp] before the call:

   0x0804841d <+9>: mov    DWORD PTR [esp],0x8048510
   0x08048424 <+16>:    call   0x8048320 <puts@plt>

Call system((const char*)0x8048520); the same 4-byte slot at [esp] is reused for the argument:

   0x08048429 <+21>:    mov    DWORD PTR [esp],0x8048520
   0x08048430 <+28>:    call   0x8048330 <system@plt>

Exit the function (see another answer about what leave does):

   0x08048435 <+33>:    leave
   0x08048436 <+34>:    ret

Example of "discarding bytes": say esp = 0x123C at the start of main. The first lines of code:

   0x08048414 <+0>: push   ebp
   0x08048415 <+1>: mov    ebp,esp

result in this memory map:

0x123C: (start of stack frame of calling function)
0x1238: (old ebp value) <-- esp, ebp

Then:

   0x08048417 <+3>: and    esp,0xfffffff0

forces the last 4 bits of esp to 0, which does this:

0x123C: (start of stack frame of calling function)
0x1238: (old ebp value) <-- ebp
0x1234: (undefined)
0x1230: (undefined) <-- esp

There's no way for the programmer to rely on a certain amount of memory being between esp and ebp at this point; therefore this memory is discarded and not used.

Finally, the program subtracts 16 from the stack pointer, allocating 16 bytes of stack (local) storage:

   0x0804841a <+6>: sub    esp,0x10

giving us this map:

0x123C: (start of stack frame of calling function)
0x1238: (old ebp value) <-- ebp
0x1234: (undefined)
0x1230: (undefined)
0x122C: (undefined local space)
0x1228: (undefined local space)
0x1224: (undefined local space)
0x1220: (undefined local space) <-- esp

At this point, the program can be sure there are 16 bytes of 16-byte aligned memory being pointed to by esp.

wow. that's comprehensive. could you explain a little bit more about what "it's just discarding bytes until it gets to a 16-byte aligned point." means? I understand the rest of them pretty well. thx — gumchew, Jul 05 '14 at 18:06
excellent. Now, I fully understand. Thanks for your patience. it did take a bit of time but got there! :) — gumchew, Jul 05 '14 at 18:19
_Now we align the stack pointer to a 16-byte bound, because the compiler (for whatever reason) wants it. This could be that it always wants 16-byte aligned frames, or that the local variables need 16-byte alignment (maybe someone used a uint128_t or they're using a type that uses gcc vector extensions)._ <- it might be because of `GCC` generates SSE/SSE2 supporting code. Also, I am nowhere in main() using uint128_t type. — gumchew, Jul 05 '14 at 18:54

score 5 · Answer 3 · answered Nov 28 '18 at 12:24

I know this was posted a long time ago, but it might help others down the line.

1) On modern x86 targets, GCC keeps the stack aligned to 16 bytes by default (the default value of -mpreferred-stack-boundary is 4, meaning 2^4 bytes).

2) 16 bytes (128 bits) is because of SSE/SSE2 instructions, which use the XMM registers; XMM registers are 128 bits wide, and aligned loads and stores of them need 16-byte-aligned addresses.

3) So inside main the stack pointer is realigned to 16 bytes before any call is made; main does not rely on whatever alignment esp had on entry.

4) The reason for using 0xfffffff0 is to set the lower 4 bits to 0: in binary, every multiple of 16 has its low 4 bits equal to zero (why four bits? 2^4 = 16).

and esp, 0xfffffff0
