2

I'm trying to create container that looks close to how my file spec works. It's like a vector but the type of the elements is defined by a hashtable.

If I knew the type at compile-time I could just write something like this:

struct foo {
              float a,b,c;
              int d;
              byte e,f;
           };

std::vector<foo> foovector;
foovector.push_back(foo f);

I don't have the struct at compile time. I only have a schema I get from the file header. All elements are the same size and have the same offsets for each item inside an element. The container has the hash table defined before any elements can be added.

typedef Toffset uint; //byte offset;
typedef Ttype   uint; //enum of types

std::unordered_map<std::string, std::pair<Toffset,Ttype>> itemKey; 
itemKey["a"] = 0;
itemKey["b"] = 4;
itemKey["c"] = 8;
itemKey["d"] = 12;
itemKey["e"] = 16;
itemKey["f"] = 17;

nstd::interleaved_vector superfoo(itemKey, 10); //hashtable, pre-allocation size

nstd::interleaved_vector::iterator myIterator;

myIteratorGlobal = superfoo.begin;
myIteratorA = superfoo["a"].begin;
myIteratorB = superfoo["b"].begin;

*myIteratorB = 2.0f;
*myIteratorGlobal["d"] = 512;

The idea is I can memcpy quickly the raw data in and out of files. Iterator offsets are easy. My questions are:

  1. Does anything do this already?

  2. Is this a bad idea? Should I just create a vector and new up each element? I expect to have millions of elements. The range of sizes of foo will be 20 to 200 bytes.

  3. This is a bad idea? I should instead create coupled vectors, one for each item?

  4. Or is this "interleaved_vector" a good solution to my problem?

Ben L
  • 1,449
  • 1
  • 16
  • 32
  • itemKey["a"] results in a pair, you're assigning an integer? The rest of your code doesn't really clarify anything. I have no idea what you intended. – Mooing Duck Aug 31 '11 at 00:10
  • Agreed, that part isn't clear and probably isn't correct. I'm creating a standin for a struct on the fly. There is a a table of types that an integer keys into for casting the data for the getter/setter. So each type has an id and each member needs a byte offset. This might have the same cost as a pointer dereference but the data can remain contiguous. – Ben L Aug 31 '11 at 17:00
  • I would also describe the data container like a 2D array, but the second dimension the array of members. e.g. void* FooContainer::GetData(int element, int member); – Ben L Aug 31 '11 at 17:05
  • I'm looking into custom allocators on the vector class. That might let me reduce the number of malloc's get called. Also, variadic templates and tuples might let me describe a dynamic struct. Any thoughts? – Ben L Aug 31 '11 at 19:28

3 Answers3

4

Is there an STL container that stores an array of elements in contiguous memory where the element size is specified at runtime?

No.

What you're asking for looks like a specific implementation of a memory pool. Maybe the Boost.Pool library or other implementations would be of use to you? Writing your own one shouldn't be hard if you're used to work with raw memory and C++-specific creation/destruction of objects.

To answer your questions:

Does anything do this already?

To me the idea looks like a memory pool. There are tons of kind of memory pools so the implementation you'll want totally depends on your specific needs.

Is this a bad idea? Should I just create a vector and new up each element? I expect to have millions of elements. The range of sizes of foo will be 20 to 200 bytes.

It's not a bad idea if you're wanting to limit memory fragmentation. Pools are often used to fix this and other memory-organisation related prolems.

For example, it's very frequent in video games to do this, mostly on console but also on PC if you need high performance or a lot of memory.

I just wouldn't recommand bothering if you're making a prototype or if you don't have tons of data to allocate. If you do, then maybe first implementing the simplest allocation (with vectors and new) hidden behind a factory would be nice and would allow you to replace the factory implementation by using a pool. That way you first check that everything work, then optimize.

Klaim
  • 67,274
  • 36
  • 133
  • 188
  • I intend to have several many gigabytes allocated so I can sort the elements and do other operations. My concern is to minimize the access overhead (iterator indirection, cache misses), and to have the interface be handled by the class without excessive lookups (recalculating offsets and iterators). – Ben L Aug 31 '11 at 14:49
1

Given this code:

typedef Toffset uint; //byte offset;
typedef Ttype   uint; //enum of types
typedef std::pair<Toffset,Ttype> member;
typedef std::unordered_map<std::string, member> memberdefs;

memberdefs itemKey; 
itemKey["a"] = member(0, 0);
itemKey["b"] = member(4, 1);
itemKey["c"] = member(8, 2);
itemKey["d"] = member(12,1);
itemKey["e"] = member(16,3);
itemKey["f"] = member(17,2);

You could read into a char* buffer, and use a simple wrapper class. Still bug-prone and highly confusing. This demo has no iterator (though that would be simple), and requires an external buffer to stay in scope at least as long as the class does.

class interleaved_vector {
    const char* buffer;
    size_t count;
    size_t size;
    std::shared_ptr<memberdefs> members;
public: 
    class dynamic_object {
        const char* buffer;
        std::shared_ptr<memberdefs> members;
        friend interleaved_vector;
        dynamic_object(const char* buffer_, std::shared_ptr<memberdefs> members_)
        :buffer(buffer_), members(members_)
        {}
        dynamic_object& operator=(const dynamic_object& b) = delete;
    public:
        dynamic_object(const dynamic_object& b) 
        :buffer(b.buffer), members(b.members)
        {}
        template <class T>
        T get(const std::string& member) const {
            assert((*members)[member].second > 0); //no new members, requires pos sizes
            assert((*members)[member].second == sizeof(T));
            return *reinterpret_cast<T*>(buffer+(*members)[member].first); //technically undefined I think
        };
        template <>
        T* get<T*>(const std::string& member) const {
            assert((*members)[member].second > 0); //no new members, requires pos sizes
            assert((*members)[member].second == sizeof(T));
            return reinterpret_cast<T*>(buffer+(*members)[member].first); //technically undefined I think
        };
        void* operator[](const std::string& member) const {
            assert((*members)[member].second > 0); //no new members, requires pos sizes
            assert((*members)[member].second == sizeof(T));
            return reinterpret_cast<void*>(buffer+(*members)[member].first); //technically undefined I think
        };
    };
    interleaved_vector(const char* buffer_, size_t count_, size_t size_, const memberdefs& members_)
    :buffer(buffer_), count(count_), size(size_), members(members_) 
    {}
    dynamic_object get(size_t index) const { 
        assert(index<count);
        return dynamic_object(buffer + size*index, members);
    }
    dynamic_object operator[](size_t index) const { 
        assert(index<count);
        return dynamic_object(buffer + size*index, members);
    }
    size_t size() {
        return count;
    }
};

This would allow code such as:

size_t element_size = 32;
size_t num_elements = 1000000
char * buffer = new char[num_elements*element_size];
/*read into buffer*/
interleaved_vector iv(buffer, num_elements, element_size , members);
/*interleaved_vector DOES NOT COPY BUFFER. BUFFER MUST REMAIN IN SCOPE*/
for(int i=0; i<iv.size(); ++i) {
    for(memberdefs::const_iterator j=members.begin(); j!=members.end(); ++j) {
        dynamic_object ob = iv[i];
        std::cout << "data[" << i << "]." << j.first << " = ";
        std::cout << ob.get<int>(j.first) << '\n';
    }
}

This demo code assumes all members are ints (get) but hopefully you can see what's intended.

Mooing Duck
  • 64,318
  • 19
  • 100
  • 158
  • 1
    I should also note, again, this is a bad idea. Do the vector of pointers. – Mooing Duck Aug 31 '11 at 17:39
  • If I do a vector of pointers, how do I avoid the overhead of retrieving one of the items inside the object for every instance in the array? I can't just increment an iterator. – Ben L Aug 31 '11 at 19:26
  • Actually, in hindsight, you'd need a function like my getMember() function even if you did a vector of pointers. I think this answer might actually be the best idea, if it is wrapped in a class. I'll edit it to wrap it up. – Mooing Duck Aug 31 '11 at 19:49
0

You could write your own class, but it'd be a serious pain. Better just go with vector (or boost::ptr_vector), which takes no effort on your part, and is easily readable to every programmer who comes along.

Mooing Duck
  • 64,318
  • 19
  • 100
  • 158