This is probably a noob question, as I have only just started digging into the disassembly of C++ code to see what the compiler is ultimately doing for me. Basically, I have some C++ code (this is a toy example):

const int SIZE = 10000000;
for(auto i = 0; i < SIZE; i++)
{
  giantVector[i] = giantVector[i] * giantVector[i];
}

This ultimately compiles (in an optimized release build, minus the mmx instructions) to:

00021088  mov         esi,dword ptr [giantVector] //move a pointer to giantVector into esi
0002108B  xor         eax,eax                     //clear out eax
000210C4  mov         ecx,dword ptr [esi+eax*4]   //move int from vector into ecx
000210C7  imul        ecx,ecx                     //multiply ecx by itself
/* Move the result back into vector. This instruction uses esi as the base pointer
   to the first element in the vector then adds eax(loop counter) * 4(sizeof(int))
   to determine where to stick it. */
000210CA  mov         dword ptr [esi+eax*4],ecx   //move result back into vector
000210CD  inc         eax                         //increment the loop counter in eax
000210CE  cmp         eax,989680h                 //compare with SIZE constant
000210D3  jl          main+0C4h (0210C4h)         //If less, jump back into loop, otherwise fall through

My comments here reflect only my own understanding, which I'm writing down as I step through the code to get a better handle on things.

My question is: how does the instruction at 000210CA work? Isn't esi + eax * 4 a computation in itself? Why doesn't that instruction require other instructions to compute the address? Or is that what is really happening? The instructions appear to be sequential in the address space.

If it helps at all, this was compiled with Visual Studio 2015, and the code is pulled from the Disassembly debug window.

aqez
  • See [this answer](http://stackoverflow.com/questions/34058101/referencing-the-contents-of-a-memory-location-x86-addressing-modes/34058400#34058400) that explains x86 addressing modes. Yes, `[base + index*scale]` is a valid addressing mode that can be used in any instruction that allows an effective address as one of its operands. See also the links in the [x86 tag wiki](http://stackoverflow.com/tags/x86/info). – Peter Cordes Apr 13 '16 at 01:46
  • Also note that `mov esi,dword ptr [giantVector]` is a load. `std::vector` stores a pointer to the dynamically-allocated memory. You will have a much easier time understanding asm output from C, or C-like C++ that avoids STL and just uses arrays. Usually STL stuff compiles away to nothing, like here, but it can result in really huge amounts of asm. – Peter Cordes Apr 13 '16 at 01:50
  • @aqez This is about your question in the deleted answer: The LEA instruction (Load Effective Address) is often used to perform general integer math using x86 addressing. For a simple example, `eax *= 5` can be done with `LEA eax, [eax + eax*4]`. LEA lets you calculate an "address" without actually accessing memory. – Christopher Oicles Apr 13 '16 at 02:01
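
To make the LEA arithmetic concrete, here is a minimal C++ sketch of what LEA computes when the same register serves as both base and index. The helper name `leaSameRegister` is hypothetical and only models the arithmetic; it is not how a compiler or the CPU is implemented. With base and index equal, the result is `x * (1 + scale)`, so a scale of 2 yields `x * 3` and a scale of 4 yields `x * 5`.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): models the value that
// `lea reg, [reg + reg*scale]` leaves in a 32-bit register.
// On x86 the scale can only be 1, 2, 4, or 8.
std::uint32_t leaSameRegister(std::uint32_t x, std::uint32_t scale)
{
    // base + index*scale with base == index; unsigned arithmetic wraps
    // modulo 2^32, just as a 32-bit register would.
    return x + x * scale;
}
```

No memory is touched here, which mirrors the point above: LEA only produces the computed "address" as a number.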

1 Answer

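The Intel x86 architecture allows memory operands of the form [scale*index+base], optionally plus a constant displacement, where scale is 1, 2, 4, or 8, and index and base are general-purpose registers (eax/ebx/ecx/edx/esp/ebp/esi/edi, except that esp cannot be used as the index). Such an address is encoded within the instruction itself using a byte called the SIB (scale-index-base) byte, and the processor computes it in hardware as part of executing that one instruction, so no separate instructions are needed. You cannot, of course, embed arbitrary computation inside a single assembly instruction; only these fixed addressing forms are available.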
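
As an illustration of the address arithmetic the hardware performs for `[esi+eax*4]`, here is a minimal C++ sketch. The helpers `effectiveAddress` and `addressesMatch` are hypothetical names introduced for this example, and the sketch assumes a flat address space in which converting a pointer to `std::uintptr_t` gives its numeric address (true on mainstream x86 toolchains, but implementation-defined in general).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: the arithmetic behind [base + index*scale].
// x86 only permits a scale of 1, 2, 4, or 8.
std::uintptr_t effectiveAddress(std::uintptr_t base, std::uintptr_t index,
                                std::uintptr_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

// Returns true if the modelled address of every element equals &v[i],
// i.e. base (esi) + i (eax) * sizeof(int) (the scale).
bool addressesMatch(const std::vector<int>& v)
{
    const auto base = reinterpret_cast<std::uintptr_t>(v.data());
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        const auto modelled = effectiveAddress(base, i, sizeof(int));
        if (modelled != reinterpret_cast<std::uintptr_t>(&v[i]))
            return false;
    }
    return true;
}
```

Calling `addressesMatch` on any `std::vector<int>` should return true on such a platform, which is why the compiler can fold the whole `giantVector[i]` index calculation into the operand of a single `mov`.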

Brian Bi