
I ask this question out of curiosity rather than difficulty, as I always learn from you, even on unrelated topics.

So, consider the following method, written in C++ and linked with g++. This method works fine, as everything is initialized to the correct size.

extern "C" 
  {
    void retrieveObject( int id, char * buffer )
      {
        Object::Object obj;

        extractObject( id, obj );
        memcpy( buffer, &obj, sizeof(obj) );
      }
  }

// Prototype of extractObject
const bool extractObject( const int& id, Object::Object& obj ) const;

Now, I would like to avoid declaring a local Object and calling memcpy.

I tried to replace retrieveObject with something like:

void retrieveObject( int id, char * buffer )
  {
    // Also tried dynamic_cast and C-Style cast
    extractObject( id, *(reinterpret_cast<Object::Object *>(buffer)) );
  }

It compiles and links successfully, but crashes right away. Given that my buffer is large enough to hold an Object, does C++ need to call the constructor to "shape" the memory? Is there another way to replace the local variable and the memcpy?

I hope I was clear enough for you to answer, thank you in advance.

Isaac Clarke
  • The question is, why are you doing this? Such things are almost never necessary in C++. (If you're serializing to a file or network communication, this is not the way to go about it.) – Thanatos Jan 06 '11 at 10:45
  • Object creation should involve constructors. Use the buffer in a constructor. – DumbCoder Jan 06 '11 at 10:46
  • I'm wondering why extractObject()'s return type is `const bool` and why it takes `const int&` as one of its parameters. What is the advantage? – Nawaz Jan 06 '11 at 10:50
  • @Thanatos & Nawaz : This excerpt comes from a very (very) large software application, this code's not mine and I can't modify any prototype. I just need to fill the buffer with the extracted data. – Isaac Clarke Jan 06 '11 at 12:00

5 Answers


In your first effort...

void retrieveObject( int id, char * buffer )
{
     Object::Object obj;
     extractObject( id, obj );
     memcpy( buffer, &obj, sizeof(obj) );
} 

...you still had the compiler create the local variable obj, which guarantees correct alignment. In the second effort...

void retrieveObject( int id, char * buffer )
{
     extractObject( id, *(reinterpret_cast<Object::Object *>(buffer)) );
} 

...you're promising the compiler that buffer points to a byte that's aligned appropriately for an Object::Object. But will it be? Probably not, given your run-time crash. Generally, char*s can start on any given byte, whereas more complex objects are often aligned to the word size or to the largest alignment needed by their data members. Reading/writing ints, doubles, pointers etc. inside Object::Object may only work when the memory is properly aligned. It depends a bit on your CPU etc., but on UNIX/Linux, misalignment could generate e.g. a SIGBUS or SIGSEGV signal.
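To check whether a given char* is suitably aligned before treating it as an Object::Object, you can test the address against alignof. A minimal sketch, assuming C++11; `Sample` is a hypothetical stand-in for Object::Object and `isAlignedFor` is a hypothetical helper, neither comes from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Object::Object: a plain struct whose
// members require more than 1-byte alignment on common platforms.
struct Sample {
    int id;
    double value;
};

// True if the address p satisfies T's alignment requirement.
template <typename T>
bool isAlignedFor(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// A caller-side buffer declared with alignas is guaranteed to be aligned
// for Sample; an arbitrary char* (e.g. alignedBuffer + 1) is not.
alignas(Sample) char alignedBuffer[sizeof(Sample)];
```

If `isAlignedFor<Object::Object>(buffer)` is false, the reinterpret_cast version cannot work reliably, while the memcpy version from the question still does. Alignment is necessary but not sufficient; the other answers' caveats about non-POD types still apply.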

To explain this, let's consider a simple CPU/memory architecture. Say the memory allows, in any given operation, 4 bytes (a 32-bit architecture) to be read from addresses 0-3, 4-7, or 8-11 etc., but you can't read 4-byte chunks at addresses 1-4, 2-5, 3-6, 5-8.... Sounds strange, but that's actually quite a common limitation for memory, so just accept it and consider the consequences.

If we want to read a 4-byte number in memory and it's at one of those multiple-of-4 addresses, we can get it in one memory read. Otherwise we have to read twice: once from the 4-byte area containing part of the data, then from the other 4-byte area containing the rest, then throw away the bits we don't want and reassemble the rest in the proper places to get the 32-bit value into the CPU register/memory. That's too slow, so languages typically take care to put values where the memory can access them in one operation.

Even the CPUs are designed with this expectation, as they often have instructions that operate on values in memory directly, without explicitly loading them into registers (the memory access itself being an implementation detail beneath even the level of assembly/machine code). Code that asks the CPU to operate on data that's not aligned like this typically results in the CPU generating an interrupt, which the OS might manifest as a signal.
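The arithmetic of the simplified model above can be made concrete: counting how many aligned 4-byte chunks a value overlaps tells you how many memory reads it needs. A sketch of that toy model only (real CPUs and caches are more complicated); `wordsTouched` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Toy model: memory is read in aligned chunks of `word` bytes
// (addresses 0-3, 4-7, 8-11, ...). A value of `size` bytes starting
// at `addr` needs one read per chunk it overlaps.
std::size_t wordsTouched(std::size_t addr, std::size_t size, std::size_t word = 4) {
    std::size_t first = addr / word;              // chunk holding the first byte
    std::size_t last = (addr + size - 1) / word;  // chunk holding the last byte
    return last - first + 1;
}
```

For example, a 4-byte int at address 4 needs one read, but the same int at address 5 spans chunks 4-7 and 8-11 and needs two.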

That said, the other caveats about the safety of using this on non-POD data are also valid.

Tony Delroy
  • Thank you very much, that's exactly the explanation I was looking for. I know I'm coding nastily, and now I know why it fails. – Isaac Clarke Jan 06 '11 at 12:10
  • @Isaac: you're welcome. After your query re my comment on doron's answer, I've added some more explanation above. Hope it helps. Cheers, Tony – Tony Delroy Jan 06 '11 at 12:26

What you are doing is effectively serializing Object, and it will work if and only if all the data in Object is stored contiguously. For simple objects this works fine, but as soon as there are objects that contain pointers to other objects, it stops working.

In C++ it is extremely common for objects to contain other objects, and std::string is a case in point: it typically holds a pointer to character data stored elsewhere (some older implementations, such as GCC's, share a reference-counted buffer). So unless you are sure the object is a simple contiguous object, don't do this.
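One way to turn "unless you are sure the object is a simple contiguous object" into a compile-time check is std::is_trivially_copyable (C++11 and later). A sketch; `copyToBuffer`, `Plain` and `Polymorphic` are hypothetical names, not part of the question's code:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Copies the bytes of obj into buffer, but only compiles for types
// whose byte-wise copy is a valid copy of the object.
template <typename T>
void copyToBuffer(const T& obj, char* buffer) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T owns resources or has virtual members; memcpy is unsafe");
    std::memcpy(buffer, &obj, sizeof(T));
}

struct Plain {          // only plain data members: trivially copyable
    int id;
    double value;
};

struct Polymorphic {    // virtual member: not trivially copyable
    virtual ~Polymorphic() {}
};

static_assert(std::is_trivially_copyable<Plain>::value, "plain data");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string manages storage elsewhere");
static_assert(!std::is_trivially_copyable<Polymorphic>::value,
              "virtual destructor makes it non-trivial");
```

With this guard, calling copyToBuffer on a std::string fails to compile instead of silently copying an internal pointer into the buffer.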

doron
  • That's all valid, but irrelevant to the actual observed behaviour (the crash), which is probably caused by buffer being misaligned for an Object::Object.... – Tony Delroy Jan 06 '11 at 12:00
  • My class is only made of simple public attributes (which would satisfy said requirements). I was able to **fill** the buffer with a simple `reinterpret_cast`, so I guess my class is stored contiguously, right? – Isaac Clarke Jan 06 '11 at 12:03
  • **Tony**, what do you mean by "misaligned"? – Isaac Clarke Jan 06 '11 at 12:04

Well, this may have many problems. First of all, if you use a local object, you cannot just construct it and then write the memory of some other instance over it; that works only for POD types, which don't need their destructor to be called. Otherwise you may very well get a nasty memory leak.

But that is not the main issue. The solution you have provided may or may not work, depending on the type of the object used. It will work for simple POD types, and it may even work for more complex classes (provided you correctly handle the constructor/destructor calls), but it will break as soon as some other part of the program expects the object to be at its original location. Say you have a class with two member variables:

struct A {
   int i;
   int * pi;   // always points to i
};

where pi always points to the i member. If you memcpy that object to some other location, the copy's pi still points to the original's i, so it will easily break.
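The struct A case can be demonstrated directly. A minimal sketch, with A as in the answer; `initA` and `rawCopy` are hypothetical helpers that set up the invariant and perform the byte-wise copy:

```cpp
#include <cassert>
#include <cstring>

struct A {
    int i;
    int* pi;  // invariant: always points at this object's own i
};

// Hypothetical helper that establishes the invariant.
void initA(A& a, int value) {
    a.i = value;
    a.pi = &a.i;
}

// Byte-wise copy, as memcpy-ing the object into a buffer would do.
void rawCopy(A& dst, const A& src) {
    std::memcpy(&dst, &src, sizeof(A));
}
```

After `rawCopy(copy, original)`, `copy.pi` still points at `original.i`, so changing `copy.i` is invisible through `copy.pi`. Note that A is trivially copyable, so a std::is_trivially_copyable check would not catch this; it is a logic error the type system cannot see.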

Jan Holecek
  • Thanks for the heads up on memory leaks. I know it's nasty and that it will not work in 90% of cases, but in my case it works, because my structure is simple enough. – Isaac Clarke Jan 06 '11 at 12:07

You should take a look at Boost.Serialization or Boost.Interprocess's message_queue. C++ objects can contain more than just data (e.g. virtual table pointers), and that extra state is specific to the running process.

You should also consider adding version information to your objects when transferring them between modules.

Raphael Bossek
  • Thanks for your answer; unfortunately I cannot use external libraries. What do you mean by version information? – Isaac Clarke Jan 06 '11 at 12:06
  • If you dump objects, you do not really know their offset mapping if, e.g., inheritance is changed, or new attributes are added or old ones removed. To prevent wrongly re-mapping those objects, you should check some kind of version tag on the received object. Maybe put the C++ object in a C struct where you know the offset of the version: `struct myobj { char version[4]; Object data; };` – Raphael Bossek Jan 06 '11 at 12:49
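The version-tag suggestion could look roughly like this. A minimal sketch under assumptions: `Payload` stands in for Object::Object, and `kVersion`, `VersionedObject`, `pack` and `unpack` are hypothetical names. The payload itself must still be safe to copy byte-wise:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Object::Object.
struct Payload {
    int id;
    double value;
};

// Hypothetical tag: change it whenever Payload's layout changes
// (members added or removed, base classes changed).
const char kVersion[4] = {'O', 'B', 'J', '1'};

// Version first, at a known offset, then the data.
struct VersionedObject {
    char version[4];
    Payload data;
};

void pack(const Payload& p, VersionedObject& out) {
    std::memcpy(out.version, kVersion, sizeof kVersion);
    out.data = p;
}

// Returns false, leaving `out` untouched, if the tag does not match.
bool unpack(const VersionedObject& in, Payload& out) {
    if (std::memcmp(in.version, kVersion, sizeof kVersion) != 0) {
        return false;
    }
    out = in.data;
    return true;
}
```

The receiver rejects data written with a different tag instead of silently misreading fields at shifted offsets.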

Find out why and where it crashes: use a debugger. The code looks OK enough.

If you want to avoid the intermediate Object instance then simply avoid it. Make extractObject() return a pointer to Object and use this pointer to memcpy() its contents to the buffer.
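That suggestion, sketched with hypothetical names: `findObject` stands in for a variant of extractObject that returns a pointer to an existing object (the question says the real prototype cannot change), `Record` stands in for Object::Object, and this retrieveObject returns bool so the not-found case is visible:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Object::Object.
struct Record {
    int id;
    double value;
};

// Hypothetical storage and lookup: returns a pointer to an existing
// object, or nullptr if the id is unknown.
Record g_records[] = { {1, 10.0}, {2, 20.0} };

const Record* findObject(int id) {
    for (const Record& r : g_records) {
        if (r.id == id) return &r;
    }
    return nullptr;
}

// No local Record: copy straight from the stored object into the buffer.
// memcpy places no alignment requirement on buffer.
bool retrieveObject(int id, char* buffer) {
    const Record* obj = findObject(id);
    if (obj == nullptr) return false;
    std::memcpy(buffer, obj, sizeof(Record));
    return true;
}
```

Reading the data back out of the buffer with memcpy into a properly declared Record, rather than casting the buffer, also sidesteps the alignment problem.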

However, beware: as the others have said, if you then just reinterpret_cast<> the buffer back to Object, things might break if Object is not simple enough.

wilx
  • Unfortunately I cannot use a debugger (unless command-line gdb...) nor change function prototypes. Thanks for the info, though ! – Isaac Clarke Jan 06 '11 at 12:12
  • Use GDB then if that is the only thing that you can use. You should really first understand what is going wrong and only then fix it. – wilx Jan 06 '11 at 12:52