C memory management for Cross-platform VM

Question

I asked a question about C-type sizes which I get a pretty good answer but I realized that I may not formulate the question very well to be useful for my purpose.

My background was from Computer Engineer before moves to Software Engineer so I like computer architectures and always thinking about making VM. I've just finished an interesting project making VM on Java which I am quite proud with. But there is some legal problems I can't open-source it now and I am currently have some free time. So I want to see if I can make another VM on C (with better speed) just for fun and educational.

The thing is I am not a C program the last time I wrote a non-trivia C problem was more than 10 years ago. I was Pascal, Delphi, and now Java and PHP programmer.

There are number of obstacles I can foresee and I am trying to tackle one and that is accessing existing library (in Java, reflection solves this problem).

I plan to solve this by having a buffer of data (similar to stack). The client of my VM can program to put data into these stack before giving me to pointer to native function.

int main(void) {
    // Prepare stack
    int   aStackSize = 1024*4;
    char *aStackData = malloc(aStackSize);

    // Initialise stack
    VMStack aStack;
    VMStack_Initialize(&aStack, (char *)aStackData, aStackSize);

    // Push in the parameters
    char *Params = VMStack_CurrentPointer(&aStack);
    VMStack_Push_int   (&aStack, 10  ); // Push an int
    VMStack_Push_double(&aStack, 15.3); // Push a double

    // Prepare space for the expected return
    char *Result = VMStack_CurrentPointer(&aStack);
    VMStack_Push_double(&aStack, 0.0); // Push an empty double for result

    // Execute
    void (*NativeFunction)(char*, char*) = &Plus;
    NativeFunction(Params, Result); // Call the function

    // Show the result
    double ResultValue = VMStack_Pull_double(&aStack); // Get the result
    printf("Result:  %5.2f\n", ResultValue);               // Print the result

    // Remove the previous parameters
    VMStack_Pull_double(&aStack); // Pull to clear space of the parameter
    VMStack_Pull_int   (&aStack); // Pull to clear space of the parameter

    // Just to be sure, print out the pointer and see if it is `0`
    printf("Pointer: %d\n", aStack.Pointer);

    free(aStackData);
    return EXIT_SUCCESS;
}

The push, pull and invocation of native function can be triggered by a byte code (that is how VM will later be made).

For the sake of completeness (so that you can try it on you machine), here is the code for Stack:

typedef struct {
    int  Pointer;
    int  Size;
    char *Data;
} VMStack;

inline void   VMStack_Initialize(VMStack *pStack, char *pData, int pSize) __attribute__((always_inline));
inline char   *VMStack_CurrentPointer(VMStack *pStack)                    __attribute__((always_inline));
inline void   VMStack_Push_int(VMStack *pStack, int pData)                __attribute__((always_inline));
inline void   VMStack_Push_double(VMStack *pStack, double pData)          __attribute__((always_inline));
inline int    VMStack_Pull_int(VMStack *pStack)                           __attribute__((always_inline));
inline double VMStack_Pull_double(VMStack *pStack)                        __attribute__((always_inline));

inline void VMStack_Initialize(VMStack *pStack, char *pData, int pSize) {
    pStack->Pointer = 0;
    pStack->Data    = pData;
    pStack->Size    = pSize;
}

inline char *VMStack_CurrentPointer(VMStack *pStack) {
    return (char *)(pStack->Pointer + pStack->Data);
}

inline void VMStack_Push_int(VMStack *pStack, int pData) {
    *(int *)(pStack->Data + pStack->Pointer) = pData;
    pStack->Pointer += sizeof pData; // Should check the overflow
}
inline void VMStack_Push_double(VMStack *pStack, double pData) {
    *(double *)(pStack->Data + pStack->Pointer) = pData;
    pStack->Pointer += sizeof pData; // Should check the overflow
}

inline int VMStack_Pull_int(VMStack *pStack) {
    pStack->Pointer -= sizeof(int);// Should check the underflow
    return *((int *)(pStack->Data + pStack->Pointer));
}
inline double VMStack_Pull_double(VMStack *pStack) {
    pStack->Pointer -= sizeof(double);// Should check the underflow
    return *((double *)(pStack->Data + pStack->Pointer));
}

On the native function side, I created the following for testing purpose:

// These two structures are there so that Plus will not need to access its parameter using
//    arithmetic-pointer operation (to reduce mistake and hopefully for better speed).
typedef struct {
    int    A;
    double B;
} Data;
typedef struct {
    double D;
} DDouble;

// Here is a helper function for displaying
void PrintData(Data *pData, DDouble *pResult) {
    printf("%5.2f + %5.2f = %5.2f\n", pData->A*1.0, pData->B, pResult->D);
}

// Some native function
void Plus(char* pParams, char* pResult) {
    Data    *D  = (Data    *)pParams; // Access data without arithmetic-pointer operation
    DDouble *DD = (DDouble *)pResult; // Same for return
    DD->D = D->A + D->B;
    PrintData(D, DD);
}

When executed, the above code returns:

10.00 + 15.30 = 25.30
Result:  25.30
Pointer: 0

This work well on my machine (Linux x86 32bits GCC-C99). It will be very nice if this works on other OS/Architecture too. But there are AT LEAST three memory-related issures we have to be aware of.

1). Data size - It seems like if I compile both VM and native functions using the same compiler on the same architecture, the size types should be the same.

2). Endianness - Same with Data size.

3). Memory alignment - Which is the problem as padding-bytes may be added in struct but it is hard to synchronize it when prepare parameter stack as (there is no way to know how padding is added except for hard coding).

My questions are:

1). If I know the size of the types, is there a way to modify push and pull function to exactly synchronize with struct padding? (modify to let compiler takes care of it like Datasize and Endians problems).

2). If I pack structure by one (using #pragma pack(1)); (2.1) Will the performance penalty be acceptable? and (2.2) Will the program stability be at risk?

3). How about padding by 2,4, or 8? Which should be good for general 32 or 64 bits system?

4). Can you guide me to a documentation for an exact padding algorithm let say for GCC on x86?

5). Is there is a better way?

NOTE: Cross-platform it is not my ultimate goal but I can't resist. Also, performance is not my target as soon as it is not so ugly. All these are for fun and learning.

Sorry for my English and the very long post.

Thanks everyone in advance.

+1 , I wish there were more questions like this on SO. – Tim Post Oct 24 '09 at 05:28 — Tim Post, Oct 24 '09 at 05:28

score 2 · Accepted Answer · edited May 23 '17 at 11:48

Tangential Comments

These first items are tangential to the questions you asked, but...

// Execute
void (*NativeFunction)(char*, char*) = &Plus;
NativeFunction(Params, Result); // Call the function

I think you should probably be using 'void *' instead of 'char *' here. I would also have a typedef for the function pointer type:

typedef void (*Operator)(void *params, void *result);

Then you can write:

Operator NativeFunction = Plus;

The actual function would be modified too - but only very slightly:

void Plus(void *pParams, void *pResult)

Also, you have a minor naming problem - this function is 'IntPlusDoubleGivesDouble()', rather than a general purpose 'add any two types' function.

Direct answers to the questions

1). If I know the size of the types, is there a way to modify push and pull function to exactly synchronize with struct padding? (modify to let compiler takes care of it like Datasize and Endians problems).

There isn't an easy way to do that. For example, consider:

struct Type1
{
     unsigned char byte;
     int           number;
};
struct Type2
{
     unsigned char byte;
     double        number;
};

On some architectures (32-bit or 64-bit SPARC, for example), the Type1 structure will have 'number' aligned at a 4-byte boundary, but the Type2 structure will have 'number' aligned on an 8-byte boundary (and might have a 'long double' on a 16-byte boundary). Your 'push individual elements' strategy would bump the stack pointer by 1 after pushing the 'byte' value - so you would want to move the stack pointer by 3 or 7 before pushing the 'number', if the stack pointer is not already appropriately aligned. Part of your VM description will be the required alignments for any given type; the corresponding push code will need to ensure the correct alignment before pushing.

2). If I pack structure by one (using #pragma pack(1)); (2.1) Will the performance penalty be acceptable? and (2.2) Will the program stability be at risk?

On x86 and x86_64 machines, if you pack the data, you will incur a performance penalty for the misaligned data access. On machines such as SPARC ~~or PowerPC~~(per mecki), you will get a bus error or something similar instead - you must access the data at its proper alignment. You might save some memory space - at a cost in performance. You'd do better to ensure performance (which here includes 'performing correctly instead of crashing') at the marginal cost in space.

3). How about padding by 2,4, or 8? Which should be good for general 32 or 64 bits system?

On SPARC, you need to pad an N-byte basic type to an N-byte boundary. On x86, you will get best performance if you do the same.

4). Can you guide me to a documentation for an exact padding algorithm let's say for GCC on x86?

You would have to read the manual.

5). Is there is a better way?

Note that the 'Type1' trick with a single character followed by a type gives you the alignment requirement - possibly using the 'offsetof()' macro from <stddef.h>:

offsetof(struct Type1, number)

Well, I would not pack the data on the stack - I would work with the native alignment because that is set to give the best performance. The compiler writer does not idly add padding to a structure; they put it there because it works 'best' for the architecture. If you decide you know better, you can expect the usual consequences - slower programs that sometimes fail and are not as portable.

I am also not convinced that I would write the code in the operator functions to assume that the stack contained a structure. I would pull the values off the stack via the Params argument, knowing what the correct offsets and types were. If I pushed an integer and a double, then I'd pull an integer and a double (or, maybe, in reverse order - I'd pull a double and an int). Unless you are planning an unusual VM, few functions will have many arguments.

Doesn't look "Tangential" to me, it's exactly what poster asked for. ( Or leads into exactly what was asked for ) — Nicholas Jordan, Oct 24 '09 at 06:20
Thank you for answering :-D. So it seemds like it is not easily doable. I will look at offsetof Macro to see what I can do about. Also, I do not think I know I can do it better. The thing is the VM code need to pass on data to a function unknown to it (by it I mean to the VM but the function is know to the compile of the code running on VM executing the native function). This passing of the parameter need to be generalized and easily accessed to the native code (this to reduce mistake that comes from complicated procedure of accesssing it and that's why I use structure). — NawaMan, Oct 24 '09 at 07:09
One more thing to add is that I plan to use this as a way to access to existing native library like for example fopen, fputs and so on. Operators can be handled by the VM directly and that is best as the compiler will ensure correct operation over it. Thanks again for your answer. I will try to see what I can do with offsetof and perhaps other way to delegate this native access job to the compiler. Cheer!! :D — NawaMan, Oct 24 '09 at 07:12
You don't get a bus error on PPC; PPC can handle misalignment (PPC is not Motorola 68000), it just costs some performance. On x86 the performance penalty is rather small, BTW, much smaller than on most other platforms. And if you use the GCC packed attribute and access the struct only via struct fields, GCC generates code that makes sure no misaligned read ever takes place (e.g. reading a field in multiple steps instead of just one) — Mecki, Jan 25 '11 at 13:07

dirkgently · Answer 2 · 2009-10-23T22:36:26.823

Interesting post and shows that you've put in a lot of work. Almost the ideal SO post.

I do not have ready answers, so please bear with me. I will have to ask a few more questions :P

1). If I know the size of the types, is there a way to modify push and pull function to exactly synchronize with struct padding? (modify to let compiler takes care of it like Datasize and Endians problems).

Is this from a performance point of view only? Are you planning to introduce pointers along with native arithmetic types?

2). If I pack structure by one (using #pragma pack(1)); (2.1) Will the performance penalty be acceptable? and (2.2) Will the program stability be at risk?

This is an implementation-defined thing. Not something you can count on across platforms.

3). How about padding by 2,4, or 8? Which should be good for general 32 or 64 bits system?

The value that matches with the native word size ought to give you optimal performance.

4). Can you guide me to a documentation for an exact padding algorithm let say for GCC on x86?

I don't know of any of the top of my head. But I have seen code similar to this being used.

Note, that you can specify attributes of variables using GCC (which also has something called default_struct __attribute__((packed)) that turns off padding).

Thanks for your post. :D 1). Well, as I said, it is good to have a good performance but what I most concern is that I will corrupt the memory. :p As for pointer, I will add in there for sure (if things go well). 2). That is a bad news. 3). I think so too. 4). I will look at the links. Thanks again :D — NawaMan, Oct 24 '09 at 04:24

Nicholas Jordan · Answer 3 · 2009-10-24T03:19:48.877

There are some very good questions here, many of them will get tangled in some important design issues but for most of us - we can see what you are working towards ( dirkgently just posted as I write so you can see you are generating interest ) we can understand your English well enough that what you are working towards is some compiler issues and some language design issues - it becomes difficult to work the question but in that you are already working in JNI there is hope ...

For one thing, I would try to get away from pragmas; Many folks, very many will disagree with that. For canonical discussion of why see the justification for the D language position on the issue. For another, there is a 16-bit pointer buried in your code.

The issues are near endless, well studied, and likely to get us buried in opposition and intramural intransigence. if I may suggest reading Kenneth Louden's Home Page as well as The intel architecture manual. I have it, I have tried to read it. Data structure alignment, along with many of the other issues you put up for discussion are deeply buried in historical compiler science and are likely to get you awash in who knows what. ( slang or idiomatic for unforeseeable matters of consequence )

With that said, here goes:

C-type sizes What type sizes?
Computer Engineer before moves to Software Engineer Ever studied microcontrollers? Hava a look at some of Don Lancaster's work.
Pascal, Delphi, and now Java and PHP programmer. Those are comparatively removed from the base fundamental architecture of processors, though plenty of persons will show or try to show how they can be used to write powerful and fundamental routines. I suggest looking at David Eck's recursive descent parser to see exactly how to begin study of the matter. As well, Kenneth Louden has an implementation of "Tiny" which is an actual compiler. I found something not too long ago that I think was called asm dot org ... very advanced, very powerful work was available for study there but it is a long haul to start writing in assembler intending to get into compiler science. Additionally, most architectures have differences that are not consistent from one processor to another.
accessing existing library

There are many libs around, Java has some good ones. I don't know about the others. One approach is to try to write a lib. Java has a good base and leaves room for people like to to try to come up with something better. Start with improving Knuth-Morris-Pratt or something: There is just no shortage of places to start. Try Computer Programming Algorithms Directory and for sure, look at Dictionary of Algorithms and Data Structures at NIST

always_inline

Not necessarily, see Dov Bulka - the worker holds a Doctorate in CS and as well is a proficient author in areas where time-efficiency / reliability-robustness and so on are not subject to some of the "business model" paradigm wherefrom we get some of the "Oh! that doesn't matter" on issues that actually do matter.

As a closing note, instrumentation and control comprise over 60% of the actual market for accomplished programming skills as you describe. For some reason, we hear mostly about the business model. Let me share with you and inside tidbit I have from a reliable source. From 10% to 60% or more actual safety and property risk comes from vehicular issues than comes from burglar, theft and that sort of thing. You will never hear appeals for "90 days bustin minerals at the county mineral extraction faciltiy!" for traffic tickets, in fact most people do not even realize traffic citations are ( N.A. - U.S.A. ) class 4 misdemeanor and are actually classifiable as such.

Sounds to me like you have taken a good step towards some good work, ...

Thank for posting. :D About the 16-but pointer, if you are talking about the Pointer field in Stack, it is an offset pointer. I hope that should work for stack so. About the inline, It seems that it is the only way for me to get the inline function working when C99 is on. And Hey thanks for the links too. — NawaMan, Oct 24 '09 at 04:23
It's gonna get too complicated, I just got hammered for something that is about like getting cats to follow a squeaking can-opener - too many people will not have done the depth of work, see Dov Bulka book and read up on ipp ... in-lining and a bunch of other issues really get into the light of day only by studying compiler science. Read Dov Bulka's book and just about anything in the Addison-Wesley Professional imprint that has the word "Effective" on the cover. We're not gonna squeak enough information through a comment scripting tool. — Nicholas Jordan, Oct 24 '09 at 06:17
NawaMan - I have been thinking about this and what I want you to do is get a copy of "Modern Cryptography" by Wenbo Mao (hp imprint) ISBN:0-13-066943-1 and read what the man has to say in the preface about things as they are. Most libraries will have some way of finding the book for you - this title is an in-house publication by a major hardware vendor. Also, get a copy of "The D Programming Language" (Andrei Alexandrescu) Addison-Wesley Professional ... I can see where you are going with this, you need to do some step-back and re-examine the overview. — Nicholas Jordan, Oct 26 '09 at 22:17

C memory management for Cross-platform VM

3 Answers3

Tangential Comments

Direct answers to the questions