
I am implementing a simple VM, and currently I am using runtime arithmetic to calculate individual program object addresses as offsets from base pointers.

I asked a couple of questions on the subject today, but I seem to be slowly going nowhere.

I learned a couple of things though. From question one - Object and struct member access and address offset calculation - I learned that modern processors have virtual addressing capabilities, allowing memory offsets to be calculated without any additional cycles devoted to arithmetic.

And from question two - Are address offsets resolved during compile time in C/C++? - I learned that there is no guarantee for this happening when doing the offsets manually.

By now it should be clear that what I want to achieve is to take advantage of the virtual memory addressing features of the hardware and offload those calculations from the runtime.

I am using GCC. As for the platform, I am developing on x86 in Windows, but since it is a VM I'd like to have it running efficiently on all platforms supported by GCC.

So ANY information on the subject is welcome and will be much appreciated.

Thanks in advance!

EDIT: Some overview on my program code generation - during the design stage the program is built as a tree hierarchy, which is then recursively serialized into one contiguous memory block; along the way the objects are indexed and their offsets from the beginning of the program memory block are calculated.
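
Roughly, that pass looks like the following in C; the Node layout and field names here are only an illustration, not my actual structures:

    #include <stddef.h>
    #include <string.h>

    typedef struct Node {
        const void   *payload;       /* serialized form of this object     */
        size_t        payload_size;
        size_t        offset;        /* filled in: offset from block start */
        struct Node **children;
        size_t        child_count;
    } Node;

    /* Recursively copy the tree into one contiguous block, recording each
       object's offset from the beginning of the program memory block.    */
    static size_t serialize(Node *n, unsigned char *block, size_t cursor)
    {
        n->offset = cursor;
        memcpy(block + cursor, n->payload, n->payload_size);
        cursor += n->payload_size;
        for (size_t i = 0; i < n->child_count; ++i)
            cursor = serialize(n->children[i], block, cursor);
        return cursor;   /* total number of bytes written so far */
    }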

EDIT 2: Here is some pseudo code of the VM:

switch (*instruction)
   case 1: call_fn1(*(instruction+1)); instruction += (1+sizeof(parameter1)); break;
   case 2: call_fn2(*(instruction+1), *(instruction+1+sizeof(parameter1)));
           instruction += (1+sizeof(parameter1)+sizeof(parameter2)); break;
   case 3: instruction += *(instruction+1); break;

Case 1 is a function that takes one parameter, which is found immediately after the instruction byte, so it is read at an offset of 1 byte from the instruction. The instruction pointer is then incremented by 1 plus the size of the first parameter to find the next instruction.

Case 2 is a function that takes two parameters: as before, the first parameter is read at a 1 byte offset, and the second parameter is read at an offset of 1 byte plus the size of the first parameter. The instruction pointer is then incremented by the size of the instruction plus the sizes of both parameters.

Case 3 is a goto statement: the instruction pointer is incremented by an offset which immediately follows the goto instruction.
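
Spelled out as real C rather than pseudo code, the dispatch loop might look roughly like this; the opcode values and the 32-bit parameter type are just assumptions for illustration, and memcpy is used to read the operands so unaligned access is avoided on stricter platforms:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical handlers corresponding to the pseudo code above. */
    void call_fn1(int32_t a);
    void call_fn2(int32_t a, int32_t b);

    void run(const uint8_t *instruction)
    {
        for (;;) {
            switch (*instruction) {
            case 1: {                       /* one parameter                 */
                int32_t p1;
                memcpy(&p1, instruction + 1, sizeof p1);
                call_fn1(p1);
                instruction += 1 + sizeof p1;
                break;
            }
            case 2: {                       /* two parameters                */
                int32_t p1, p2;
                memcpy(&p1, instruction + 1, sizeof p1);
                memcpy(&p2, instruction + 1 + sizeof p1, sizeof p2);
                call_fn2(p1, p2);
                instruction += 1 + sizeof p1 + sizeof p2;
                break;
            }
            case 3: {                       /* goto: relative jump           */
                int32_t rel;
                memcpy(&rel, instruction + 1, sizeof rel);
                instruction += rel;
                break;
            }
            default:
                return;                     /* halt on unknown opcode        */
            }
        }
    }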

EDIT 3: To my understanding, the OS will provide each process with its own dedicated virtual memory address space. If so, does this mean the first address is always ... well, zero, so the offset from the first byte of the memory block is actually the very address of that element? If an address space is dedicated to every process, and I know the offset of my program memory block AND the offset of every program object from the first byte of that block, then are the object addresses resolved during compile time?

The problem is that those offsets are not available during the compilation of the C code; they only become known during the "compilation" phase that translates the program to bytecode. Does this mean there is no way to do object memory address calculation for "free"?

How is this done in Java, for example, where only the virtual machine is compiled to machine code? Does this mean the calculation of object addresses takes a performance penalty because of runtime arithmetic?

  • How is this not a real question? Or maybe "how is this not a real question?" is not a real question either? – dtech Jul 15 '12 at 18:27
  • For one thing, you're mixing a bunch of different questions. The CPU is capable of computing address offsets very efficiently. But that has nothing to do with virtual addressing, which deals with a completely different issue (mainly providing each process with its own address space, and combating memory fragmentation). And your question about whether address offsets are resolved at compile-time is an entirely different thing *again*. Again, the processor can compute addresses very efficiently, but this happens at runtime, not at compile-time. That's probably why someone is voting to close – jalf Jul 15 '12 at 18:41
  • It's just not really clear what you're trying to do, what you're expecting from the CPU and the compiler, and what you know and what you assume. However, what I *think* you're looking for code generation-wise is the `lea` instruction, which can compute an address such as `x[y].z` (that is, offset by some multiple of one operand, plus a constant offset) in a single cycle (IIRC). If you're generating the code yourself, use that for computing address offsets. If you're relying on some existing compiler, you just have to trust that it'll do its job properly. :) – jalf Jul 15 '12 at 18:41
  • This isn't very clear, yet. Virtual address mapping is something done by hardware (typically), and has nothing to do with calculating the offset into structs, etc. Can you add a concrete example of the kind of thing that you're hoping to optimise? – Oliver Charlesworth Jul 15 '12 at 18:42
  • But to answer the question from the title: "you can't, unless you have a time machine". How do you add a constant to an unknown value? You know the offset at compile-time, but you don't know the pointer value that it should be added to. So how can you add the two together? You have to wait until you actually know the two values you wish to add together. – jalf Jul 15 '12 at 18:46
  • Having seen your updates, then what @jalf says is correct. You can't evaluate adding an offset to a runtime value (e.g. the instruction pointer) at compile-time! – Oliver Charlesworth Jul 15 '12 at 18:57
  • @jalf - all offsets are known during "compile time" - the only unknown is the address of the first byte of the program block. In case it is a real address, it should be some arbitrary value, in case it is a virtual address, it should start from square one, so in a way the offset is the actual memory address, is this correct? Also, I am looking for a way to do it in hardware as the first linked question suggests, not necessary during compile time, just aiming to do it in the most efficient way. – dtech Jul 15 '12 at 18:59
  • @ddriver: no. Your process only sees virtual addresses, not physical ones. And they don't "start from square one". Each block of allocated memory starts at some pseudo-arbitrary location, which the compiler certainly does not know. – jalf Jul 15 '12 at 19:18
  • @jalf - care to take a look at the third edit of my question and shed some light on it? Thanks! – dtech Jul 15 '12 at 19:20
  • In java, there's another compiler in the JVM that translates the bytecode to machine code the first time its run. That compiler knows these "runtime constants" at compile time so can do the arithmetic then. – Chris Dodd Jul 15 '12 at 19:29
  • In C/C++ pointers are available, and the virtual address value is (depending on the platform) a page number followed by the offset within that page. A page often resides on disk - it is up to the OS to respond to a page fault and load it into "real" RAM as required at runtime - it's likely that the mapping onto real addresses is different every time the page is loaded. By the way, Windows and Unix/Linux have address ranges that are not used (Guard regions on Windows) to protect against bad pointers. Low values from zero are an example, so address zero is always invalid. – cdarke Jul 15 '12 at 19:31
  • @ChrisDodd - so what you are saying is the JIT compiler still translates to platform specific machine code and there is no way to get away with free offset calculation other than machine specific native code? – dtech Jul 15 '12 at 19:35
  • Note that before the entire world became x86 and risc/arm, there were CPU ISAs that included weird multiple indirect multiple offset hardware addressing modes. Extinct because that "optimization" ended up in overall slower total systems. – hotpaw2 Jul 15 '12 at 19:48
  • To answer one question in your 3rd edit, the "first" address is not always zero due to [Address space layout randomization](http://en.wikipedia.org/wiki/Address_space_layout_randomization) (ASLR). – Blastfurnace Jul 15 '12 at 19:51
  • In fact, I remember something like this technique being used on the 6502/Apple II and older non-VM machines. The interpreter figured out the variable base + offset, and then poked the result back into itself using different OPcodes as self-modifying code so that future iterations did not need to repeat the calculation. – hotpaw2 Jul 15 '12 at 20:05

4 Answers


Here's an attempt to shed some light on how the linked questions and answers apply to this situation.

The answer to the first question mixes two different things: the first is the addressing modes of x86 instructions and the second is virtual-to-physical address mapping. The first is something that is done by compilers and the second is something that is (typically) set up by the operating system. In your case you should only be worrying about the first.

Instructions in X86 assembly have great flexibility in how they access a memory address. Instructions that read or write memory have the address calculated according to the following formula:

segment + base + index * size + offset

The segment portion of the address is almost always the default DS segment and can usually be ignored. The base portion is given by one of the general purpose registers or the stack pointer. The index part is given by one of the general purpose registers and the size is either 1, 2, 4, or 8. Finally the offset is a constant value embedded in the instruction. Each of these components is optional, but obviously at least one must be given.

This addressing capability is what is generally meant when talking about computing addresses without explicit arithmetic instructions. There is a special instruction that one of the commenters mentioned: LEA, which performs the address calculation but, instead of reading or writing memory, stores the computed address in a register.
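
For example (an illustrative snippet, not code from the question), GCC at -O1 or higher will normally fold an indexed struct access into a single memory operand of that form, and an address-of expression into a single LEA:

    struct Item { int tag; int value; };

    /* Typically compiled to a single move such as
       mov eax, [rdi + rsi*8 + 4] on x86-64: base, scaled index and
       constant offset are all folded into one addressing mode.      */
    int get_value(const struct Item *items, long i)
    {
        return items[i].value;
    }

    /* Typically compiled to a single lea rax, [rdi + rsi*8 + 4]:
       the address is computed without a separate ADD or MUL.        */
    const int *value_address(const struct Item *items, long i)
    {
        return &items[i].value;
    }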

For the code you included in the question, it is quite plausible that the compiler would use these addressing modes to avoid explicit arithmetic instructions.

As an example, the current value of the instruction variable could be held in the ESI register. Additionally, each of sizeof(parameter1) and sizeof(parameter2) is a compile-time constant. In the standard x86 calling conventions function arguments are pushed in reverse order (so the first argument is at the top of the stack), so the assembly code might look something like

case1: 
  PUSH [ESI+1]
  CALL fn1
  ADD ESP,4 ; drop arguments from stack
  ADD ESI,5
  JMP end_switch
case2: 
  PUSH [ESI+5]
  PUSH [ESI+1]
  CALL fn2
  ADD ESP,8 ; drop arguments from stack
  ADD ESI,9
  JMP end_switch
case3:
  ADD ESI,[ESI+1]
  JMP end_switch
end_switch:

This is assuming that the size of both parameters is 4 bytes. Of course, the actual code is up to the compiler, and it is reasonable to expect that the compiler will output fairly efficient code as long as you ask for some level of optimization.

Geoff Reedy

You have a data item X in the VM, at relative address A, and an instruction that says (for instance) push X, is that right? And you want to be able to execute this instruction without having to add A to the base address of the VM's data area.

I have written a VM that solves this problem by mapping the VM's data area to a fixed Virtual Address. The compiler knows this Virtual Address, and so can adjust A at compile time. Would this solution work for you? Can you change the compiler yourself?

My VM runs on a smart card, and I have complete control over the OS, so it's a very different environment from yours. But Windows does have some facilities for allocating memory at a fixed address -- the VirtualAlloc function, for instance. You might like to try this out. If you do try it out, you might find that Windows is allocating regions that clash with your fixed-address data area, so you will probably have to load by hand any DLLs that you use, after you have allocated the VM's data area.
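
A minimal sketch of such an experiment on Windows (the base address and size below are arbitrary examples, not values from my VM, and the call can simply fail if that region is already in use):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        void  *wanted = (void *)0x40000000;   /* requested fixed base address */
        SIZE_T size   = 1 << 20;              /* 1 MiB data area              */

        /* Reserve and commit the VM's data area at the requested address;
           VirtualAlloc returns NULL if that region is unavailable.          */
        void *base = VirtualAlloc(wanted, size,
                                  MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (base == NULL) {
            fprintf(stderr, "VirtualAlloc failed: %lu\n", GetLastError());
            return 1;
        }

        printf("VM data area mapped at %p\n", base);
        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }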

But there will probably be unforeseen problems to overcome, and it might not be worth the trouble.

TonyK

Playing with virtual address translation, page tables or TLBs is something that can only be done at the OS kernel level, and it is not portable between platforms and processor families. Furthermore, hardware address translation on most CPU ISAs is usually supported only at the level of certain page sizes.

hotpaw2
  • It's hard to be sure, but I don't think this has anything to do with what the OP wants to know. Despite the mention of virtual addresses, it seems unrelated to virtual memory, TLBs and all that stuff. But... hard to be sure... :) – jalf Jul 15 '12 at 19:19

To answer my own question, based on the many responses I got.

It turns out that what I want to achieve is not really possible in my situation: getting memory address calculations for free is attainable only when specific requirements are met, and it requires compilation to machine-specific instructions.

I am developing a visual, element-based, Lego-style drag-and-drop programming environment for educational purposes, which relies on a simple VM to execute the program code. I was hoping to maximize performance, but it is just not possible in my scenario. It is not that big of a deal though, because program elements can also generate their C code equivalent, which can then be compiled conventionally to maximize performance.
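
For illustration only, the C equivalent generated for a small block program might look something like this (hypothetical names, not actual generator output; the point is that the member offsets become compile-time constants, which the compiler folds straight into addressing modes):

    /* Hypothetical generator output for a simple counting loop. */
    struct ProgramState {
        int counter;
        int limit;
    };

    void generated_program(struct ProgramState *s)
    {
        /* 'counter' and 'limit' live at fixed offsets from 's', so the
           compiler encodes those offsets directly in each instruction's
           addressing mode; no runtime offset arithmetic is emitted.    */
        while (s->counter < s->limit)
            s->counter += 1;
    }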

Thanks to everyone who responded and clarified a matter that wasn't really clear to me!

dtech