
I have some data type which, if I were to use plain old C, would be implemented as

typedef struct {
    ...many other members here...
    unsigned short _size;
    char           _buf[0];
} my_data; 

What I'd like to do is make that a class and add the usual operations: less-than, equality, copy constructor, assignment operator, and so on. As you can imagine, I would then use that class as the key in associative containers like std::map.

Ideally I need the buffer to live in the same allocation as the object itself; otherwise, to compare two buffers, the CPU would have to follow a pointer and load the data from a separate memory location. I don't want to use std::vector because the memory it allocates wouldn't be contiguous with the rest of the data members.

The main issue for me is that in C I would have a function which, given the size of the buffer, allocates the right amount of memory for the whole structure. In C++ such a thing can't be done.

Am I right? Cheers

M.M
Emanuele
  • Zero-length array members are not valid in C++, so the answer to the question in the title is "no". – R. Martinho Fernandes Aug 10 '12 at 07:29
  • Theoretically - no. Practically - yes, there's absolutely no problem with this – valdo Aug 10 '12 at 07:30
  • What is the purpose of `_buf`? Maybe you can use some of the standard containers instead? – Some programmer dude Aug 10 '12 at 07:32
  • Ok. I guess even if I specified the array variable equal to one, no way I would be able to decide how much memory to allocate every time a new instance of my_data gets created, right? – Emanuele Aug 10 '12 at 07:32
  • Oh by the way, identifiers with single or double leading underscores are reserved by the C and C++ standard. – Some programmer dude Aug 10 '12 at 07:34
  • @JoachimPileborg Does that apply for identifiers not in the global scope? – Paul Manta Aug 10 '12 at 07:36
  • @Joachim Pileborg: identifiers with a single leading underscore are only reserved at global scope (unless the character after the underscore is an upper-case letter). So `_buf` is fine. – JoeG Aug 10 '12 at 07:37
  • @valdo Are you sure I can write a _complete_ _C++_ class with all the required operators? How could I write proper variable allocation during the constructor? – Emanuele Aug 10 '12 at 07:37
  • @Emanuele A one-element array has one element. Since you're bound to use dynamic allocation anyway, what's wrong with `std::vector`? – R. Martinho Fernandes Aug 10 '12 at 07:38
  • @JoeGauterin Identifiers with a single leading underscore followed by a capital letter are reserved for all uses. Identifiers with a single leading underscore followed by a lower-case letter are only reserved in the global scope. http://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier – Martin B Aug 10 '12 at 07:39
  • To all, I need the buffer to be on the _same_ level of the object itself, otherwise when I have to compare two of them I would have the CPU to take the pointer and load it in memory; I don't **want** to use _std::vector_ because memory allocated wouldn't be contiguous. – Emanuele Aug 10 '12 at 07:41
  • [This](http://stackoverflow.com/q/5520591/500104) could be helpful for you. – Xeo Aug 10 '12 at 07:50
  • If your instances are of known size before-compile-time, you might use an integer template parameter as size of the array. If not, you will need a custom container... interesting though. – dsign Aug 10 '12 at 08:14
  • @dsign Unfortunately that is **not** the case, otherwise the solutions would have been many and much more trivial. Cheers :) – Emanuele Aug 10 '12 at 08:19
  • Objects in std::vector are stored contiguously (this was not guaranteed in pre-standard C++, but now it is). But I still don't understand why you'd want a zero-length array. You cannot store anything in it, so why would you need it? Store a pointer instead and allocate memory in the constructor, or make your class a template with the array dimension as a parameter (`template struct ...`). – Axel Aug 10 '12 at 08:24
  • What is your concern with the CPU anyway? Is it all about performance? If you are working on such low-level code that performance is really _this_ critical, you might want to consider doing the CPU operations yourself in assembler. Besides, as already mentioned by Axel, vectors are contiguous as guaranteed by the C++ standard. – Excelcius Aug 10 '12 at 09:42
  • As per question, a _std::vector_ **wouldn't** be contiguous to the other structure data members. @Axel I guess you've never seen the Zero Length Arrays in _C_, right? I'm using a Zero Length Array **because** data size is _dynamic_. – Emanuele Aug 10 '12 at 14:23
  • Yes, I guess most of us have not seen this - simply because it does not work. I still don't understand your problem. Why is it so important that no memory outside of your structure is used? If data size is dynamic, that means the size of the structure cannot be fixed unless you define a maximum size. But if you do so, why not just do `char _buf[MAX_SIZE]`? It seems like you are trying to make your solution work without letting us know what you are actually trying to achieve, and so we all can only guess. – Axel Aug 11 '12 at 09:40
  • PS: If you have a working solution in Plain old C, why don't you just use it? You could still write a C++ wrapper for it... – Axel Aug 11 '12 at 09:49
  • Indeed, I think I might be implementing a _C++_ class on top of the Linux kernel's _rbtree.c_, which would use as elements structures containing _Zero Length Array_. I'll give it a shot! Cheers – Emanuele Aug 12 '12 at 08:03

3 Answers


This is quite impossible. Your object is effectively of variable size but the std::map will always treat it as a fixed size, and there is no way to implement copying or moving. You would need an old C-style container to use such a hack.

Edit: Custom allocator. Interesting solution, I hadn't thought of that. I don't know if you could make it work but it would be worth looking into.

Puppy
  • Thanks for confirming this. I'll try to tackle part of the cause of this issue using custom allocators, _and/or_ I might have to write _C_ containers to do so (as you suggested). – Emanuele Aug 10 '12 at 07:55
  • A custom allocator would be an interesting solution. I hadn't thought of that, *but* I'm not actually sure how defined it would be. – Puppy Aug 10 '12 at 07:57
  • It would be used to tackle _part of the cause_ of this issue, and/or I could **try** to make the buffer start just _right after_ the pointer for each instance, so to be on the same CPU _CACHELINE_. – Emanuele Aug 10 '12 at 08:04
  • The problem with a custom allocator that automatically overallocates would be that it's actually *only* used for the complete nodes of the map, not for the single keys/values. – Xeo Aug 10 '12 at 08:04
  • You could override the `construct` function. – Puppy Aug 10 '12 at 08:06
  • @DeadMG Thanks again for your answer; indeed, it is _not_ possible. What I've done instead is implement a _C_ _map_ with custom _Zero Length Array_ _key/value_. And guess what? Compared to it, the plain _C++_ _std::map_ with a variable-size _std::vector_ (or even a raw _new []_ allocated pointer) is overall **35% slower**. Alas, I was expecting this. Btw the _C map_ is based on the _Linux_ kernel's http://lxr.free-electrons.com/source/lib/rbtree.c code. – Emanuele Aug 13 '12 at 12:56

You can approach it this way in C++:

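struct MyData {
    unsigned short size_;
    // The buffer starts right after the whole object, so any members
    // added after size_ do not overlap it.
    char * buf () { return reinterpret_cast<char *>(this + 1); }
    //...
};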

You will likely want to overload operator new to use malloc, and operator delete to use free, so that you can grow your data structure on demand with realloc.

As DeadMG points out, this cannot be used with STL containers very easily. One approach would be to use pointers to MyData in the containers. Another would be a custom smart pointer wrapper class for MyData.
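
The "pointers in the container" approach could be sketched roughly as follows. This is only an illustration under assumptions: the names `Key`, `KeyFree`, `KeyPtr`, `make_key`, `KeyLess` and `KeyMap` are hypothetical, `std::unique_ptr` requires C++11, and reading bytes past the end of the header object is the same not-strictly-standard trick discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <memory>
#include <new>

// Hypothetical key type: a fixed header followed by `size` payload bytes
// that live in the same malloc block, directly after the header.
struct Key {
    unsigned short size;
    char* buf() { return reinterpret_cast<char*>(this + 1); }
    const char* buf() const { return reinterpret_cast<const char*>(this + 1); }
};

// Deleter for a Key allocated with std::malloc (Key has a trivial destructor).
struct KeyFree {
    void operator()(Key* k) const { std::free(k); }
};
typedef std::unique_ptr<Key, KeyFree> KeyPtr;

// Allocates header and payload as one block and copies the payload in.
KeyPtr make_key(const char* bytes, std::size_t len) {
    assert(len <= 0xFFFF);
    void* raw = std::malloc(sizeof(Key) + len);
    if (!raw) throw std::bad_alloc();
    Key* k = new (raw) Key;
    k->size = static_cast<unsigned short>(len);
    if (len) std::memcpy(k->buf(), bytes, len);
    return KeyPtr(k);
}

// Orders keys by payload bytes (a shorter key sorts before a longer one
// that shares its prefix), not by pointer value.
struct KeyLess {
    bool operator()(const KeyPtr& a, const KeyPtr& b) const {
        std::size_t n = a->size < b->size ? a->size : b->size;
        int c = std::memcmp(a->buf(), b->buf(), n);
        return c != 0 ? c < 0 : a->size < b->size;
    }
};

typedef std::map<KeyPtr, int, KeyLess> KeyMap;
```

With this, `m[make_key("beta", 4)] = 2;` inserts and `m.find(make_key("beta", 4))` looks up by content. Each comparison still follows one pointer from the map node to the key block, but the header and payload themselves stay contiguous.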

Edit: This is a hack, where MyData acts as a kind of smart pointer, but the intelligence is managed by vector.

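#include <vector>

struct MyData {
    struct State {
        unsigned short size_;
        //...
    };
    std::vector<State> data_;
    // Element 0 always exists, so size() is valid on an empty MyData.
    MyData () : data_(1) {}
    MyData (unsigned short size)
        // one header element plus enough State-sized slots to hold size bytes
        : data_(1 + size/sizeof(State) + !!(size%sizeof(State))) {
        data_[0].size_ = size;
    }
    unsigned short size () const { return data_[0].size_; }
    //...
    // &data_[0] + 1 avoids indexing past the end when size is 0
    char * buf () { return reinterpret_cast<char *>(&data_[0] + 1); }
};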
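
As a rough illustration of how the vector-backed variant could serve as a `std::map` key, here is a self-contained sketch. The free functions `operator<` and `make_data` are assumptions added for the example, and the `State` struct is reduced to the single `size_` member. Copying and assignment come from `std::vector`, so no hand-written copy constructor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

struct MyData {
    struct State {
        unsigned short size_;
    };
    std::vector<State> data_;

    // Element 0 is the header; the remaining elements hold the payload bytes.
    explicit MyData(unsigned short size = 0)
        : data_(1 + (size + sizeof(State) - 1) / sizeof(State)) {
        data_[0].size_ = size;
    }
    unsigned short size() const { return data_[0].size_; }
    char* buf() { return reinterpret_cast<char*>(&data_[0] + 1); }
    const char* buf() const { return reinterpret_cast<const char*>(&data_[0] + 1); }
};

// Hypothetical ordering: compare payload bytes, then length.
bool operator<(const MyData& a, const MyData& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    int c = std::memcmp(a.buf(), b.buf(), n);
    return c != 0 ? c < 0 : a.size() < b.size();
}

// Hypothetical helper: build a MyData holding a copy of len bytes.
MyData make_data(const char* bytes, unsigned short len) {
    MyData d(len);
    if (len) std::memcpy(d.buf(), bytes, len);
    return d;
}
```

A `std::map<MyData, int>` then works directly, for example `m[make_data("pear", 4)] = 1;`. The header and payload are contiguous inside the vector's allocation, though the map node still reaches them through the vector's internal pointer.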
jxh
  • Thanks. But how could I dynamically decide the _allocated_ size of _Data_ at run time? – Emanuele Aug 10 '12 at 07:48
  • You would have to create the zero size version first, and then grow it. – jxh Aug 10 '12 at 07:50
  • Allocation and initialization are two different beasts in C++. At allocation time `operator new` is given `sizeof(MyData)` in order to pass it to `malloc`, but at this point you cannot tell anything about the data. – Stefan Majewsky Aug 10 '12 at 07:50
  • @Emanuele: `void* p = ::operator new(sizeof(MyData) + your_buf_len); MyData* d = ::new (p) MyData();`, something like that anyways. – Xeo Aug 10 '12 at 07:51
  • The problem with this approach is that one then _can't_ use the nice features like _STL_ and/or _smart pointers_ and so on. – Emanuele Aug 10 '12 at 07:58
  • That is the problem with the C style construct in C++. You should use `vector` instead as originally suggested. – jxh Aug 10 '12 at 07:59
  • @Emanuele: I provided an example using `vector` that keeps everything together. – jxh Aug 10 '12 at 09:05
  • @user315052 It is within what you called State that we want to have a variable-length array (contiguous with the other State data members). Cheers – Emanuele Aug 10 '12 at 14:45
  • @Emanuele: The variable length array is within `vector`. It starts with element `1`, as element `0` has the state variables about the data. – jxh Aug 10 '12 at 15:11
  • @user315052 the problem is that we wouldn't be putting the _std::vector_ as a key of a map but _MyData_, right? So it wouldn't be exactly the same but one more indirection. – Emanuele Aug 10 '12 at 15:32
  • @Emanuele: That isn't any different than what a C container would do (you would have pointers to `struct my_data` in the container). – jxh Aug 10 '12 at 15:40
  • @user315052 That is the purpose of a _Zero Length Array_ :-) The _C_ structure would be all in the same _CACHELINE_. – Emanuele Aug 10 '12 at 16:11
  • @Emanuele: As is the structure encapsulated in the C++ `vector` in `MyData`. – jxh Aug 10 '12 at 20:19
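
Xeo's one-line suggestion in the comments above (over-allocate with `::operator new`, then construct the header with placement new) might be fleshed out along these lines. It is a minimal sketch, not a definitive implementation: `make_data` and `free_data` are hypothetical helper names, and copying is disabled because a plain copy would drop the trailing bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Header object; the payload bytes live directly after it in the same block.
struct MyData {
    unsigned short size_;

    char* buf() { return reinterpret_cast<char*>(this + 1); }
    const char* buf() const { return reinterpret_cast<const char*>(this + 1); }

private:
    explicit MyData(unsigned short size) : size_(size) {}
    ~MyData() {}
    MyData(const MyData&);             // not copyable: a copy would lose the payload
    MyData& operator=(const MyData&);

    friend MyData* make_data(const char*, std::size_t);
    friend void free_data(MyData*);
};

// Allocate header + payload in one block, then construct the header in place.
MyData* make_data(const char* bytes, std::size_t len) {
    assert(len <= 0xFFFF);
    void* p = ::operator new(sizeof(MyData) + len);
    MyData* d = ::new (p) MyData(static_cast<unsigned short>(len));
    if (len) std::memcpy(d->buf(), bytes, len);
    return d;
}

// Destroy the header explicitly and release the raw block.
void free_data(MyData* d) {
    if (!d) return;
    d->~MyData();
    ::operator delete(d);
}
```

Because the size is only known to `make_data`, ordinary `new MyData` and value semantics are unavailable, which is the restriction the answers in this thread point out.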

No, it's not possible. Zero length arrays aren't legal C++.

You can[1] do something very similar with an array of length 1, but you would still have to manage creation of instances yourself, so no copy constructor and no storing the objects in std::map.

Perhaps something like this:

#include <cstdlib>  // malloc, free
#include <new>      // placement new, std::bad_alloc

class my_data {
public:
  static my_data* create(int size) {
    // sizeof(my_data) covers the header and _buf[0]; add size bytes for the buffer
    void* memory = malloc(sizeof(my_data) + size);
    if (!memory) throw std::bad_alloc();
    return new(memory) my_data(size);
  }

  static void destroy(my_data* ptr) {
    ptr->~my_data();
    free(ptr);
  }

private:
  //disable construction and destruction except by static methods above
  explicit my_data(int size) : _size(size) {}
  ~my_data(){}

  //disable copying
  my_data(const my_data&);
  my_data& operator=(const my_data&);

  unsigned short _size;
  char           _buf[1];
};

Note that the constructor, destructor, copy constructor and assignment operator are all private (and there is no default constructor), which greatly restricts how the class can be used.


[1] In practical terms - it's not standards compliant, but it will work almost everywhere.

JoeG