
I have a chunk of memory populated by external code which I'm trying to reverse engineer. I don't know the complete structure of this memory, but I do know a few fields (e.g. the chunk starts with an int32 named 'foo', and there's a double at offset 0xC called 'bar'). I want to define a structure, reinterpret-cast a pointer to this memory chunk to that structure, and have the fields line up. I'm not sure if there's a more conventional name for this technique, but I'll refer to it as creating an 'overlay type'.

Here's a sketch of what I'd like to be able to do:

START_OVERLAY_TYPE(my_type, 0xFF) // struct named my_type, size 0xFF
    FIELD(0x00, int32_t foo);     // field int32_t foo at 0x00
    FIELD(0x0C, double bar);      // field double bar at 0x0C
END_OVERLAY_TYPE

Not having to use macros would be a plus, but I don't see a good way around them.

With my current implementation, I expand this to (something like):

#include <cstddef>
#include <cstdint>

__pragma(pack(push, 1))
template<size_t p> struct padding_t { unsigned char pad[p]; };
template<> struct padding_t<0> {};
struct my_type
{
    union
    {
        struct : padding_t<0xFF> {}; // ensure total size is 0xFF
        struct : padding_t<0x00> { int32_t foo; }; // field at 0x00
        struct : padding_t<0x0C> { double bar; }; // field at 0x0C
    };
};
__pragma(pack(pop))

This compiles and works great, at least in the versions of clang, gcc, and VC++ that I tried (with appropriate changes to the pragma). Unfortunately, warnings abound due to the non-standard use of anonymous structs.

Is there any way to achieve the same effect while staying within the standard? The requirements are that it be reasonably simple to declare (like the current macro is), and that to the consumer, the usage is indistinguishable from

struct my_type { int32_t foo; double bar; };

at least to the casual observer.

The current code will work for my purposes, I'm just curious if there is a better approach I am overlooking.
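For comparison with the anonymous-struct version, here is a minimal sketch of the plain approach: named padding members between the known fields, with `static_assert`s that catch offset mistakes at compile time. It assumes the same 0xFF-byte chunk; the padding names `unknown_04` and `unknown_14` and the helper `read_bar` are hypothetical, and `#pragma pack` is a compiler extension (accepted by MSVC, GCC and Clang), not standard C++.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // std::int32_t

#pragma pack(push, 1)  // compiler extension: no implicit padding between members
struct my_type
{
    std::int32_t  foo;                      // 0x00
    unsigned char unknown_04[0x0C - 0x04];  // 0x04..0x0B, not yet reverse engineered
    double        bar;                      // 0x0C
    unsigned char unknown_14[0xFF - 0x14];  // 0x14..0xFE, rest of the chunk
};
#pragma pack(pop)

// Fail the build if the hand-computed gaps are wrong.
static_assert(offsetof(my_type, foo) == 0x00, "foo must be at 0x00");
static_assert(offsetof(my_type, bar) == 0x0C, "bar must be at 0x0C");
static_assert(sizeof(my_type) == 0xFF, "overlay must span the whole chunk");

// Viewing externally populated memory through the overlay (hypothetical helper).
inline double read_bar(const void* chunk)
{
    return static_cast<const my_type*>(chunk)->bar;
}
```

The trade-off is that the padding members are visible to the consumer and each gap has to be computed by hand, which is exactly the bookkeeping the overlay macro is meant to hide.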

Jacob
  • I'm not sure if this is helpful for your use case, but I've once asked how I can use a variadic template to calculate offsets for structures of freely combinable types (at compile time): [Creating an array initializer from a tuple or variadic template parameters](http://stackoverflow.com/questions/18251815/creating-an-array-initializer-from-a-tuple-or-variadic-template-parameters). Finally I was able to construct a working solution from the answer. – πάντα ῥεῖ Apr 21 '14 at 15:35
  • This [question](http://stackoverflow.com/questions/18242003/creating-an-api-metaprogramming-dsl-for-static-initialization-of-a-layout-desc) was also related (to give you more background on the idea). – πάντα ῥεῖ Apr 21 '14 at 15:37
  • Maybe you should use [Protobuf](https://code.google.com/p/protobuf/), [XDR](http://en.wikipedia.org/wiki/External_Data_Representation) or even something like [ASN-1](http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_1x) to express your data structures. Improper alignment may become fatal to your program on non-x86 architectures (e.g. MIPS). – user3159253 Apr 21 '14 at 15:41
  • @πάνταῥεῖ I had a feeling there would be a way to express this with variadic templates. Unfortunately the usage side seems a bit more cumbersome than I would like. It is an interesting read though. Thanks! – Jacob Apr 21 '14 at 15:43
  • @user3159253 The OP has some legacy format to use, thus your advice isn't really helpful IMHO. – πάντα ῥεῖ Apr 21 '14 at 15:44
  • @Jacob _'Unfortunately the usage side seems a bit more cumbersome than I would like.'_ Hmm, no not really. What I came up finally for usage on the client side didn't look much more complicated than your proposed macro solution (more `<>` than `()` of course ;) ). – πάντα ῥεῖ Apr 21 '14 at 15:47
  • Well, maybe. When I was faced with a similar problem in one of our projects, it eventually turned out that it's better to split data-receiving and data-processing code, and hide all the gory details of the data structures and binary protocols under the hood of a standardized library. – user3159253 Apr 21 '14 at 15:50
  • @user3159253 Right, I realize that certain aspects of memory layout are inevitably left to the platform and there's no way this would be totally portable. It's just curiosity on my part, to learn more about the language constructs. I know for a fact that I'm only going to use this on Windows/VC++/x86 (maybe x64 later), so portability is not actually important. If I were doing this 'for real' and not just for a personal project, I might consider one of those alternatives. – Jacob Apr 21 '14 at 15:51
  • Ah, Ok. Then you've chosen a "non-interesting" platform to improve your skills :) – user3159253 Apr 21 '14 at 15:54
  • @user3159253 To be fair, a decent ASN-1 compiler and some mapping logic could serve for serious improvements. – πάντα ῥεῖ Apr 21 '14 at 15:54
  • Does it have to be a field `double bar;`, or would an accessor function `double& bar();` work just as well? – Ben Voigt Apr 21 '14 at 15:56
  • @πάνταῥεῖ When I say 'client side' I mean the code actually interacting with instances of the type. In my example, if you have a void* memory, you can just reinterpret_cast<my_type*>(memory) and use it essentially as if it were a POD with two named fields. I may be misreading the variadic array solution, but it seems more complicated. – Jacob Apr 21 '14 at 15:56
  • @Jacob It boils down to declaring an enum type to name and access the known fields, and binding those to a certain type. The necessary offsets, total size, and whatever else is needed for the in-memory representation of a structure are evaluated at compile time. – πάντα ῥεῖ Apr 21 '14 at 16:01
  • @BenVoigt Either would be fine, but I think the field would be preferable, since 't.bar() = 1.234' looks really weird. Could be an interesting approach though. – Jacob Apr 21 '14 at 16:04
  • @Jacob: Well, `return *reinterpret_cast<double*>(pad + offset);` is portable. It's a technique I've used for interfacing memory-mapped I/O in embedded systems. If you are doing memory-mapped I/O, you should additionally qualify the return type with `volatile`. – Ben Voigt Apr 21 '14 at 16:13
  • @BenVoigt That definitely seems simpler and less magical than my solution. It's probably the better way to do it, even if the syntax isn't quite as perfect as I want. I'll keep it in mind. – Jacob Apr 21 '14 at 16:19
  • Honestly, I think simple padding members are just fine. However, if you want to do something like this, I'd need a minute to draft the code. But you could put the padding inside the struct that contains the value itself, rather than using inheritance. The templated struct would then have a non-type template parameter for the padding and a type parameter for the value it contains. You include instances of the templates in an unnamed union as you have above. The trick is that, to get around the naming syntax in the templated type, you provide an implicit conversion operator to the internal type. – Apriori Apr 21 '14 at 16:55
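Ben Voigt's accessor suggestion can be sketched like this. It is a variation rather than his exact code: the `reinterpret_cast` from his comment is replaced by `std::memcpy`, which avoids dereferencing a misaligned `double*` (if the chunk itself is 8-byte aligned, a `double` at 0x0C is not, the concern user3159253 raises for non-x86 targets) and strict-aliasing problems, at the cost of returning copies instead of references. Writes therefore go through setter overloads. The names `raw`, `load` and `store` are hypothetical.

```cpp
#include <cstddef>      // std::size_t
#include <cstdint>      // std::int32_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_standard_layout

// The whole chunk is kept as opaque bytes; known fields are reached through accessors.
struct my_type
{
    unsigned char raw[0xFF];

    std::int32_t foo() const { return load<std::int32_t>(0x00); }
    void foo(std::int32_t v) { store(0x00, v); }

    double bar() const { return load<double>(0x0C); }
    void bar(double v) { store(0x0C, v); }

private:
    // Copy sizeof(T) bytes out of the chunk; works for any alignment.
    template <typename T>
    T load(std::size_t offset) const
    {
        T value;
        std::memcpy(&value, raw + offset, sizeof(T));
        return value;
    }

    // Copy sizeof(T) bytes into the chunk.
    template <typename T>
    void store(std::size_t offset, const T& value)
    {
        std::memcpy(raw + offset, &value, sizeof(T));
    }
};

static_assert(sizeof(my_type) == 0xFF, "overlay must span the whole chunk");
static_assert(std::is_standard_layout<my_type>::value, "overlay must be standard-layout");
```

Consumer code would then read `double x = t->bar();` and write `t->bar(1.234);`, which avoids the `t.bar() = 1.234` form Jacob found awkward, though it is still not plain member syntax.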

1 Answer


You could try something like this, using implicit type conversions and assignment operators on the internal struct that contains the value. This way, instead of using unnamed structs, each member struct has a name, and operator overloading makes its internals effectively invisible.

I tried this out with some client code (passing to functions, getting/setting values) and everything seemed fine. It's of course possible that I missed a scenario somewhere.

#include <cstddef>  // std::size_t
#include <cstdint>  // std::int32_t

#pragma pack(push, 1)
// A value of type t preceded by p bytes of padding
template<std::size_t p, typename t>
struct padding_t
{
    unsigned char pad[p];
    t             val;
    operator t  () const {return val;} // read the value
    operator t& ()       {return val;} // modify it in place
    padding_t<p, t>& operator= (const t& rhs) {val = rhs; return *this;}
};
// A zero-length array is ill-formed, so offset 0 needs its own specialization
template<typename t> struct padding_t<0, t>
{
    t             val;
    operator t  () const {return val;}
    operator t& ()       {return val;}
    padding_t<0, t>& operator= (const t& rhs) {val = rhs; return *this;}
};
template<std::size_t p>
struct sizing_t
{
    unsigned char pad[p];
};
struct my_type
{
    union
    {
        sizing_t<0xFF>                size; // ensure total size is 0xFF
        padding_t<0x00, std::int32_t> foo;  // field at 0x00
        padding_t<0x0C, double>       bar;  // field at 0x0C
    };
};
#pragma pack(pop)
Apriori
  • For bonus points: if you move the size member to the end of my_type and provide "copy constructors" (in quotes because they would take the internal type as a parameter, not the type itself), then you could probably even use C-style struct initialization with ={}. – Apriori Apr 21 '14 at 17:40
  • I'll mark this as the answer, since it's the only full solution posted. I thought about doing something involving implicit conversions but wasn't sure about the ability to make it completely transparent. This seems like it would be pretty simple to use, albeit not perfect (since intellisense and debugger displays would leak the details). But probably close enough. – Jacob Apr 25 '14 at 15:39
  • @Jacob: Thanks. I won't be offended if someone thinks of something more transparent and you choose to remove the chosen-answer crown. There's also nothing wrong with using an unnamed union/unnamed struct combination, or for that matter just naming the inner struct and having a little extra on the client side. I'll think about it a little more too; it's an interesting question. – Apriori Apr 25 '14 at 15:54
  • @Jacob: I've thought about this a little bit more. I believe a general case solution will require both variadic templates and variadic macros. You need macros because you won't be able to give the members names using just variadic templates. With the variadic templates you can select/remove members by smallest offset, carry type information, and figure out the size of padding array between elements. Then a variadic macro could be used to define struct elements one by one. Time permitting, I may tinker with this more; but I just wanted to let you know that I think it's possible in this manner. – Apriori Apr 28 '14 at 21:37