4

I have a pretty nasty bug that has been bothering for a while. Here's the situation, I'm creating an in memory filesystem. I have pre-allocated data blocks for each file in which to do reads and writes from. To implement directories I have a simple std::vector object containing all the files in the directory. This vector object is at the top of the file in each directory. Therefore, in order to read and write from the directory I read the first 16 bytes into a character buffer and type cast it as a vector (16 bytes because sizeof(vector<T>) is 16 on my system). Specifically, the first 16 bytes are not elements of the vector, but the vector itself. However, the vector is somehow being clobbered after I exit out of a key function.

The following code does not throw an exception and can correctly save a vector to a character buffer to be later retrieved.

#include <vector>
char dblock[16];

typedef std::vector<int> Entries;
void foo() {
    char buf[sizeof(Entries)];
    Entries* test = new (buf)Entries();
    test->push_back(0);
    for (int i = 0; i < sizeof(std::vector<int>); ++i) {
        dblock[i] = buf[i];
    }
}

void bar() {
    char buf[sizeof(Entries)];
    for (int i = 0; i < sizeof(std::vector<int>); ++i) {
        buf[i] = dblock[i];
    }
    Entries* test = (Entries*)buf;
    test->back();
}

int main()
{
    foo();
    bar();
    return 0;
}

However, once the function foo is changed to include an argument like so, any time I try to use iterators an exception is thrown.

void foo(int this_arg_breaks_everything) {
    char buf[sizeof(Entries)];
    Entries* test = new (buf)Entries();
    test->push_back(0);
    for (int i = 0; i < sizeof(std::vector<int>); ++i) {
        dblock[i] = buf[i];
    }
}

Looking at the disassembly, I found the problem assmbler when the function is tearing down its frame stack:

add         esp,128h  <----- After stack is reduced, vector goes to an unusable state.
cmp         ebp,esp  
call        __RTC_CheckEsp (0D912BCh)  
mov         esp,ebp  
pop         ebp  
ret

I found this code by using the debugger to test if ((Entries*)dblock)->back() returns an exception or not. Specifically the exception is "Access violation reading location 0XCCCCCCCC". The location in question is std::vector's std::_Container_base12::_Myproxy->_Mycont->_Myproxy == 0XCCCCCCCC; you can find it at line 165 of xutility.

And yes, the bug only happens when the inplace new operator is used. Using a normal new operator then writing test to dblock doesn't throw an exception.

Thus, I came to the conclusion that the ultimate reason for this is somehow the compiler is doing a bad stack unwind clobbering some part of memory that it shouldn't be.

Edit: Changed wording for clarity sake. Edit2: Explained magic numbers.

Cubez
  • 878
  • 5
  • 11
  • 3
    Check your sizes. On a 64bit Mac OSX `sizeof(Entries)` is 24 bytes, not 16.Consequently, your "transfer" isn't pulling the vector image if your platform is similar and you program is ill-formed. Any particular reason `dblock` was chosen to be a magic-number size (16) and *not* based on the *actual* size of an `Entries` entity (whatever that is on your platform) ? – WhozCraig Oct 29 '14 at 06:42
  • You need to make sure that `buf` has the proper alignment for an `Entries`. (And you should stick to either the typedef name or the name of the underlying type, otherwise things get confusing and error-prone.) – molbdnilo Oct 29 '14 at 06:45
  • 3
    std::vector contains pointers to heap memory space (where the actual data is). If you just copy the object to dblock, you are doing a shallow copy, and those pointers in the heap are now lost. It sounds like you are trying to serialize a vector object. This is not the way to do it. – Mark Lakata Oct 29 '14 at 06:45
  • As other people have mentioned, sizeof(Entries) can be anything. Why did you pick 16 bytes for `dblock`? On my system, gcc sets the size to 12 bytes. – Mark Lakata Oct 29 '14 at 06:46
  • 2
    And I'll add to that if your intent is to somehow (someday) persist this to an out-of-proc image (a disk file, across a socket or pipe, etc), think again. That vector contains an internal heap pointer (which you never allow it to release via manual destructor fire; a different issue) making your "image" utterly useless anywhere but this process during this run (and not much use there either, as far as that goes). – WhozCraig Oct 29 '14 at 06:46
  • @WhozCraig Sorry for the magic number. 16 on my system is sizeof(Entries). I just happened to know a priori what the correct size was. Thanks for the tip, right now it'll only be in memory and I won't have it persist for now. – Cubez Oct 29 '14 at 07:02
  • @MarkLakata Even though I'm not doing a shallow copy, the class elements itself should still have references to the heap memory. So they may not necessarily be lost. Just like how the default copy constructor copies over things as an exact copy, like what I'm doing. – Cubez Oct 29 '14 at 07:03
  • @molbdnilo Sorry, but could you explain what you mean by alignment? Hey, thanks for all your comments! But I do have to add one thing: this setup works but only if `foo` does not have an argument (`foo()`). If foo is `foo(int something)` then it won't work. – Cubez Oct 29 '14 at 07:04

4 Answers4

2

In visual studio 2013 this generates errors and after taking a look at the internal data of the vector it was pretty easy to figure out why. Vector allocates an internal object that does a lot of it's heavy lifting, this internal object in turn stores a pointer back to the vector and thus knows the location that the vector is supposed to be. When the vector's memory is moved from one location to another, the memory it occupies changes and thus that internal object now instead points back to memory that later gets wiped by debug code.

Looking at your code, this looks like the exact same thing. std::_Container_base12 is a base class used by the vector which has a member called myProxy. myProxy is an internal object that does heavy lifting, and has a pointer back to the vector that contains it. When you move the vector, you can't update this pointer and so when you go to use the vector data that was moved it tries to use myProxy which is still trying to refer back to the original vector location which was wiped. Because that data area was wiped, it looks for a pointer in it and instead finds 'CCCCCCCC' which is what the debug code does to memory data that was wiped. It tries to access that memory location and everything explodes.

Darinth
  • 511
  • 3
  • 14
  • Hey @Darinth, what you said is spot on. I changed the code to do a copy construct on `dblock` instead of a shallow copy and the code worked. Thanks again for your help! :D – Cubez Oct 29 '14 at 17:10
  • Additionally, I was thinking about this a little last night and realized why the parameter was most likely making a difference, and it does indeed involve the stack. Because foo and bar have the same signature, dblock was the first variable in both functions, and the functions are called from the same stack depth dblock was probably occupying the same memory in both functions. Once you changed the signature, the order of the stack of one of the functions changed and dblock occupied different memory. – Darinth Oct 30 '14 at 22:03
  • Additionally, if performance is an issue and you're running a c++11 compliant compiler (I suspect you're running visual studio, though I don't know which version) you should be able to use static_cast to force the use of the move constructor instead of the copy constructor, which means that you can avoid making a copy of whatever objects your vector is containing. Don't forget to call the destructor on dblock when you're done, or it'll produce memory leaks! – Darinth Oct 30 '14 at 22:06
  • 1
    Thanks for the tip! Instead of trying to find a work-around, I think the easiest way is to just roll my own STL objects :P – Cubez Oct 31 '14 at 02:42
2

What you are doing (serializing an opaque C++ object with the equivalent of memcpy to a local buffer) is not going to work sustainably because vector objects are deep objects with lots of pointers to heap memory. Nevertheless to answer your question, why you are getting a crash.

The problem is alignment. When you try doing

char buf[sizeof(Entries)];
Entries* test = new (buf)Entries();

this assumes that buf has the proper alignment for the object Entries. I am not going to claim to know the internal structure of vector, but I bet it looks something like

class vector
{
  T* start;
  T* end;
  other stuff
}

i.e. it is a bunch of pointers to heap. Pointers require register alignment, which is 8 bytes on a 64-bit machine. Alignment by N bytes means you can divide the address evenly by N. However, you are allocating buf on the stack, which is not guaranteed to have any alignment, but probably accidentally has 8 byte alignment because it is the only thing on your local stack frame. However, if you declare an argument to foo, and the argument is say an int which is 4 bytes, then buf is no longer 8-byte aligned since you've just added 4 bytes. Then when you try to access the pointers that are unaligned, you get your crash.

As an experiment, try changing foo to

void foo(int unused1, int unused2) {

This might accidentally realign buf, and it might not crash. However, stop what you are doing and don't do it this way.

See http://en.wikipedia.org/wiki/Data_structure_alignment for more info and this for guidance on serializing: http://www.parashift.com/c++-faq/serialization.html . You might consider a Boost vector class that is serializable.

Mark Lakata
  • 19,989
  • 5
  • 106
  • 123
0

Why not just write the following?

std::vector<int> dblock;

void foo() {
    dblock.push_back(0);
}

void bar() {
    dblock.back();
}
Chris Culter
  • 4,470
  • 2
  • 15
  • 30
  • Actually, this is what I want the code to do. In my filesystem I want to be able to read the first 16bytes as a vector object, not as elements of a vector. You can think of the pre-allocated memory similar to heap memory. Sorry that I wasn't clear on that :/ – Cubez Oct 29 '14 at 06:26
  • Oh, so... do you want there to be a global vector named `dblock`, and `foo()` and `bar()` should read/write `dblock`? – Chris Culter Oct 29 '14 at 06:30
  • Yes, but not exactly. In the filesystem, for generality, the data contained for each file is a simple char array. So the char array can be cast as vector, which can then be used normally for pushes and pops and what have you. But you are correct in saying that 'foo' and 'bar' should read/write 'dblock'. – Cubez Oct 29 '14 at 06:34
  • @user3705363 in this case, use `memcpy`. – The Paramagnetic Croissant Oct 29 '14 at 06:39
  • If you want to mutate the global object without leaking its internal pointers, you can't just copy over it, whether you write `char`s in a loop or call `memcpy()`. – Chris Culter Oct 29 '14 at 06:42
  • @TheParamagneticCroissant Granted, however this doesn't fix the problem of why a global variable that is not on `foo`'s stack frame is being clobbered after a stack unwind. – Cubez Oct 29 '14 at 06:44
  • @user3705363 and of course neither does it solve the problem of invalidated pointers... – The Paramagnetic Croissant Oct 29 '14 at 06:46
  • @ChrisCulter You might be onto something there. Why can't I just copy over the bytes and mutate the new object? – Cubez Oct 29 '14 at 06:50
  • I guess the answer depends... Which function are we focusing on, and what do we want it to achieve? – Chris Culter Oct 29 '14 at 06:52
  • 1
    @user3705363 because a vector contains a pointer to its internal buffer. If you grow the buffer by appending to the vector, the buffer is reallocated (i. e. it's deleted and a new one is assigned to the pointer). So, next time you use the copy of the older struct, you will end up using an earlier, now invalid pointer. – The Paramagnetic Croissant Oct 29 '14 at 07:21
0

Simplest answer for this: UNDEFINED BEHAVIOR.

std::vector isn't trivially copyable, you can't memcpy it from one place to another.

Another problem with your code is that dblock could have different alignment than std::vector. This could cause crash on some processors.

Third problem is compiler sometime could return garbage when you copy buff to dblock. Its because you break strict aliasing rule.

Community
  • 1
  • 1
Yankes
  • 1,958
  • 19
  • 20