6

I'm working on a program that stores a vital data structure as an unstructured string with program-defined delimiters (so we need to walk the string and extract the information we need as we go) and we'd like to convert it to a more structured data type.

In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself. The length of the string will always be known at allocation time. We've determined through testing that doubling the number of allocations required for each of these data types is an unnacceptable cost. Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation? If we were using cstrings I'd just have a char * in the struct and point it to the end of the struct after allocating a block big enough for the struct and string, but we'd prefer std::string if possible.

Most of my experience is with C, so please forgive any C++ ignorance displayed here.

Shea Levy
  • 5,237
  • 3
  • 31
  • 42
  • If I understand it correctly, the string cannot grow after construction, and memory would be managed by whoever manages the whole object... since you are going the C way, why not just use a `char*`? The worst parts of C strings is the need to manage the memory, but it seems that in your case that is not a problem, is it? – David Rodríguez - dribeas Jun 08 '12 at 12:26
  • 3
    +1 for having **proved** the need for optimization before going off all half-cocked! – John Dibling Jun 08 '12 at 12:37
  • If you're *really* keen to use `std::string` rather than c-string, you could look at using a custom allocator to acheive some of what you want to do. That could allow you to pool your strings in a single allocation, but it still won't get everything (i.e. your struct) allocated in one go and it would probably end up being a maintenance nightmare. Probably best to stick to c-strings. – Component 10 Jun 08 '12 at 13:11

8 Answers8

2

If you have such rigorous memory needs, then you're going to have to abandon std::string.

The best alternative is to find or write an implementation of basic_string_ref (a proposal for the next C++ standard library), which is really just a char* coupled with a size. But it has all of the (non-mutating) functions of std::basic_string. Then you use a factory function to allocate the memory you need (your struct size + string data), and then use placement new to initialize the basic_string_ref.

Of course, you'll also need a custom deletion function, since you can't just pass the pointer to "delete".


Given the previously linked to implementation of basic_string_ref (and its associated typedefs, string_ref), here's a factory constructor/destructor, for some type T that needs to have a string on it:

template<typename T> T *Create(..., const char *theString, size_t lenstr)
{
  char *memory = new char[sizeof(T) + lenstr + 1];
  memcpy(memory + sizeof(T), theString, lenstr);

  try
  {
    return new(memory) T(..., string_ref(theString, lenstr);
  }
  catch(...)
  {
    delete[] memory;
    throw;
  }
}

template<typename T> T *Create(..., const std::string & theString)
{
  return Create(..., theString.c_str(), theString.length());
}

template<typename T> T *Create(..., const string_ref &theString)
{
  return Create(..., theString.data(), theString.length());
}

template<typename T> void Destroy(T *pValue)
{
  pValue->~T();

  char *memory = reinterpret_cast<char*>(pValue);
  delete[] memory;
}

Obviously, you'll need to fill in the other constructor parameters yourself. And your type's constructor will need to take a string_ref that refers to the string.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
1

If you are using std::string, you can't really do one allocation for both structure and string, and you also can't make the allocation of both to be one large block. If you are using old C-style strings it's possible though.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
1

If I understand you correctly, you are saying that through profiling you have determined that the fact that you have to allocate a string and another data member in your data structure imposes an unacceptable cost to you application.

If that's indeed the case I can think of a couple solutions.

  1. You could pre-allocate all of these structures up front, before your program starts. Keep them in some kind of fixed collection so they aren't copy-constructed, and reserve enough buffer in your strings to hold your data.
  2. Controversial as it may seem, you could use old C-style char arrays. It seems like you are fogoing much of the reason to use strings in the first place, which is the memory management. However in your case, since you know the needed buffer sizes at start up, you could handle this yourself. If you like the other facilities that string provides, bear in mind that much of that is still available in the <algorithm>s.
John Dibling
  • 99,718
  • 31
  • 186
  • 324
1

Take a look at Variable Sized Struct C++ - the short answer is that there's no way to do it in vanilla C++.

Do you really need to allocate the container structs on the heap? It might be more efficient to have those on the stack, so they don't need to be allocated at all.

Community
  • 1
  • 1
ecatmur
  • 152,476
  • 27
  • 293
  • 366
1

I'm not sure if this exactly addressing your problem. One way you can optimize the memory allocation in C++ by using a pre-allocated buffer and then using a 'placement new' operator. I tried to solve your problem as I understood it.

 unsigned char *myPool = new unsigned char[10000];
 struct myStruct
 {
    myStruct(char* aSource1, char* aSource2)
    {
        original = new (myPool) string(aSource1); //placement new
        data = new (myPool) string(aSource2); //placement new
    }
    ~myStruct()
    {
        original = NULL; //no deallocation needed
        data = NULL; //no deallocation needed
    }
    string* original;
    string* data;
};

int main()
{
    myStruct* aStruct = new (myPool) myStruct("h1", "h2");

    //  Use the struct

    aStruct = NULL; //  No need to deallocate
    delete [] myPool;

    return 0;
}

[Edit] After, the comment from NicolBolas, the problem is bit more clear. I decided to write one more answer, eventhough in reality it is not that much advantageous than using a raw character array. But, I still believe that this is well within the stated constraints. Idea would be to provide a custom allocater for the string class as specified in this SO question. In the implementation of the allocate method, use the placement new as

pointer allocate(size_type n, void * = 0) 
{
    // fail if we try to allocate too much
    if((n * sizeof(T))> max_size()) { throw std::bad_alloc(); }

    //T* t = static_cast<T *>(::operator new(n * sizeof(T)));
    T* t = new (/* provide the address of the original character buffer*/) T[n];
    return t;
}

The constraint is that for the placement new to work, the original string address should be known to the allocater at run time. This can be achieved by external explicit setting before the new string member creation. However, this is not so elegant.

Community
  • 1
  • 1
PermanentGuest
  • 5,213
  • 2
  • 27
  • 36
  • That's not going to help. That only decides where the `string` is allocated, not the `string`'s *contents*. – Nicol Bolas Jun 08 '12 at 13:01
  • Question is "Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation?". And reason for the question is " doubling the number of allocations required for each of these data types is an unnacceptable cost.". I assume that the performance is bad due to the memory allocation. Here, the allocation is avoided (for both the string and the structure). (I agree that if the string manipulation is reducing the performance, then this is not good enough. But, that is not clear from the question) – PermanentGuest Jun 08 '12 at 13:07
  • 2
    No it isn't. `std::string` allocates memory *internally*. **That** is the second memory allocation he wants to eliminate. – Nicol Bolas Jun 08 '12 at 13:11
  • @NicolBolas : I get what you are saying, I think I should modify my answer – PermanentGuest Jun 08 '12 at 15:49
1

C-style strings can always be converted to std::string as needed. In fact, there's a good chance that your observations from profiling are due to fragmentation of your data rather than simply the number of allocations, and creating an std::string on demand will be efficient. Of course, not knowing your actual application this is just a guess, and really one can't know this until it's tested anyways. I imagine a class

class my_class {
    std::string data() const { return self._data; }
    const char* data_as_c_str() const // In case you really need it!
    { return self._data; }
private:
    int _type;
    char _data[1];
};

Note I used a standard clever C trick for data layout: _data is as long as you want it to be, so long as your factory function allocates the extra space for it. IIRC, C99 even gave a special syntax for it:

struct my_struct {
    int type;
    char data[];
};

which has good odds of working with your C++ compiler. (Is this in the C++11 standard?)

Of course, if you do do this, you really need to make all of the constructors private and friend your factory function, to ensure that the factory function is the only way to actually instantiate my_class -- it would be broken without the extra memory for the array. You'll definitely need to make operator= private too, or otherwise implement it carefully.


Rethinking your data types is probably a good idea.

For example, one thing you can do is, rather than trying to put your char arrays into a structured data type, use a smart reference instead. A class that looks like

class structured_data_reference {
public:
    structured_data_reference(const char *data):_data(data) {}
    std::string get_first_field() const {
        // Do something interesting with _data to get the first field
    }
private:
    const char *_data;
};

You'll want to do the right thing with the other constructors and assignment operator too (probably disable assignment, and implement something reasonable for move and copy). And you may want reference counted pointers (e.g. std::shared_ptr) throughout your code rather than bare pointers.


Another hack that's possible is to just use std::string, but store the type information in the first entry (or first several). This requires accounting for that whenever you access the data, of course.

  • "creating an std::string on demand will be efficient." How? Each time you do it, you need another block of memory. – Nicol Bolas Jun 08 '12 at 18:01
  • It really depends on the application. As I mentioned, the loss in performance might not be because `std::string` does allocations, but because it fragments your data. Creating short-lived `std::string` temporaries, on the other hand, wouldn't fragment your data, and with any luck would be quick to allocate, and be allocated within cache. You could even avoid allocation by using a method like `structured_data_reference::copy_into_string(std::string &x)` that would replace the contents of `x` with a copy of the data in `_data`. –  Jun 09 '12 at 02:30
  • To explain further, the `std::string` in your structure solution has two problems. The first is that it does 2 allocations (maybe 3) as you mentioned. The second is that your structure and the contents of the string could be located in very different parts of memory. When the latter happens, every time you use the structure, you have to touch two different parts of memory. Since the structure itself is tiny, this is both cache and TLB inefficient if there isn't any other immediately useful data nearby the structure in memory, and in some applications, *this* is the dominant performance penalty –  Jun 09 '12 at 02:35
1

Indeed two allocations can seem too high. There are two ways to cut them down though:

  • Do a single allocation
  • Do a single dynamic allocation

It might not seem so different, so let me explain.

1. You can use the struct hack in C++

  • Yes this is not typical C++
  • Yes this requires special care

Technically it requires:

  • disabling the copy constructor and assignment operator
  • making the constructor and destructor private and provide factory methods for allocating and deallocating the object

Honestly, this is the hard-way.

2. You can avoid allocating the outer struct dynamically

Simple enough:

struct M {
    Kind _kind;
    std::string _data;
};

and then pass instances of M on the stack. Move operations should guarantee that the std::string is not copied (you can always disable copy to make sure of it).

This solution is much simpler. The only (slight) drawback is in memory locality... but on the other hand the top of the stack is already in the CPU cache anyway.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
-1

In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself.

I have a feeling that may you are not exploiting C++'s type-system to its maximum potential here. It looks and feels very C-ish (that is not a proper word, I know). I don't have concrete examples to post here since I don't have any idea about the problem you are trying to solve.

Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation?

I believe that you are worrying about the structure allocation followed by a copy of the string to the structure member? This ideally shouldn't happen (but of course, this depends on how and when you are initializng the members). C++11 supports move construction. This should take care of any extra string copies that you are worried about.

You should really, really post some code to make this discussion worthwhile :)

a vital data structure as an unstructured string with program-defined delimiters

One question: Is this string mutable? If not, you can use a slightly different data-structure. Don't store copies of parts of this vital data structure but rather indices/iterators to this string which point to the delimiters.

 // assume that !, [, ], $, % etc. are your program defined delims
 const std::string vital = "!id[thisisdata]$[moredata]%[controlblock]%";

 // define a special struct
 enum Type { ... }; 
 struct Info {
     size_t start, end;
     Type type;
     // define appropriate ctors
 };

 // parse the string and return Info obejcts
 std::vector<Info> parse(const std::string& str) {
      std::vector<Info> v;
      // loop through the string looking for delims
      for (size_t b = 0, e = str.size(); b < e; ++b) {
            // on hitting one such delim create an Info
            switch( str[ b ] ) {
                case '%':
                  ... 
                case '$;:    
                // initializing the start and then move until
                // you get the appropriate end delim
            }
            // use push_back/emplace_back to insert this newly
            // created Info object back in the vector
            v.push_back( Info( start, end, kind ) );
      }
      return v;
 }
dirkgently
  • 108,024
  • 16
  • 131
  • 187
  • -1: "I believe that you are worrying about the structure allocation followed by a copy of the string to the structure member? " He's talking about the allocation *within* the `std::string`. – Nicol Bolas Jun 08 '12 at 13:29
  • @NicolBolas: I still don't get where he says within the string. I'm fairly certain that is not the case since the strings are fixed-width, unless you and I have a different understanding of *within the string*. – dirkgently Jun 08 '12 at 15:01
  • He's talking about two memory allocations. Obviously this is referring to the construction of the type that contains the `std::string`, and the string contents of the `std::string`. – Nicol Bolas Jun 08 '12 at 16:32
  • Isn't that exactly what I was referring to i.e. structure allocation (the first) and copy of the string as in original to structure member (the second and also about the *within the string*)? – dirkgently Jun 08 '12 at 16:38
  • You can't move from a `char*`; you can only move from another `std::string`, which he almost certainly does not have, because that would be paying for the allocation that he doesn't want to pay for. The fact that you can move into a `std::string` is *irrelevant* to the problem he's trying to solve. He wants one allocation, not two. – Nicol Bolas Jun 08 '12 at 18:00
  • I mentioned move ctors because the created structure/string members will probably need to be passed around (and temporaries may thus be introduced). Which is why I said *extra string copies*. – dirkgently Jun 08 '12 at 20:40