As others mentioned, any byte that never appears in your data will work just fine as a separator in an std::string. However, if your strings do not otherwise contain '\0', it may be cleaner to use that rather than an illegal UTF-8 byte.
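To illustrate the '\0' separator idea, here is a minimal sketch. The name join_nul is made up for this example, and it assumes that none of the input strings contains an embedded '\0'.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Concatenate strings into one std::string, using '\0' as the separator.
// Assumption: no input string contains an embedded '\0' (checked in debug builds).
std::string join_nul(const std::vector<std::string>& parts) {
    std::string packed;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        assert(parts[i].find('\0') == std::string::npos);
        if (i != 0) {
            packed.push_back('\0');  // std::string stores embedded NULs like any other byte
        }
        packed += parts[i];
    }
    return packed;
}
```

One thing to watch for: packed.size() counts the separators, but anything that treats the buffer as a C string (strlen() on packed.c_str(), or building a new std::string from a bare const char*) stops at the first '\0'. Always pass the length along with the pointer.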
If your implementation is satisfactory in terms of speed, then that's probably that. Otherwise, you could look into how databases manage their data: they use buffers of a fixed size. The big advantage is that you do not break memory into many small chunks and run into fragmentation problems later. Speed-wise, you also allocate those blocks once and reuse them many times. The malloc() and free() functions are expensive, especially if you have tons of objects (the new and delete operators typically call those functions).
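As a rough sketch of the fixed-size buffer idea (not how any particular database does it), the hypothetical BlockArena below hands out space from 64 KiB blocks and keeps the blocks around for reuse. The class name, the block size, and the use of C++17 std::string_view are my own choices for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical arena: strings are copied into large fixed-size blocks
// instead of each getting its own heap allocation.
class BlockArena {
public:
    static constexpr std::size_t kBlockSize = 64 * 1024;

    // Copy `s` into the arena and return a view of the stored bytes.
    // Assumption: a single string never exceeds one block.
    std::string_view store(std::string_view s) {
        assert(s.size() <= kBlockSize);
        if (blocks_.empty() || used_ + s.size() > kBlockSize) {
            next_block();
        }
        char* dst = blocks_[current_].get() + used_;
        if (!s.empty()) {
            std::memcpy(dst, s.data(), s.size());
        }
        used_ += s.size();
        return std::string_view(dst, s.size());
    }

    // Forget all stored strings but keep the blocks, so later calls to
    // store() reuse the same memory. Views returned earlier become stale.
    void reset() {
        current_ = 0;
        used_ = 0;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    // Move to the next already-allocated block, or allocate a new one.
    void next_block() {
        if (!blocks_.empty() && current_ + 1 < blocks_.size()) {
            ++current_;
        } else {
            blocks_.push_back(std::make_unique<char[]>(kBlockSize));
            current_ = blocks_.size() - 1;
        }
        used_ = 0;
    }

    std::vector<std::unique_ptr<char[]>> blocks_;
    std::size_t current_ = 0;
    std::size_t used_ = 0;
};
```

With something like this, storing thousands of short strings costs one allocation per 64 KiB rather than one per string, and reset() lets you refill the same blocks without going back to the allocator.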
Now, to save even more memory, since it sounds like that is the main goal, and if it is possible in your situation, you could consider compressing your strings with zlib. I would use the fastest compression mode and check whether the resulting buffer is smaller. If it is, use it; otherwise keep the uncompressed string. This requires you to save a size (4 bytes) per string. You can set the size to 0 when the buffer is not compressed.
One other thing I'd like to mention is that using an illegal byte may well confuse a future programmer maintaining that code base. No matter how many comments you put there, they will probably not read them anyway; programmers tend to read the code, not so much the comments. If that is something you are worried about, you could save your concatenated strings in a vector of char instead. Your split function would take a vector of char as input and return a vector of strings as its result.
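A split function along those lines could look like the sketch below. The signature and the default separator are assumptions on my part; pass whatever separator byte you settled on.

```cpp
#include <string>
#include <vector>

// Split a raw byte buffer on `sep` and return the pieces as strings.
// An empty buffer yields a single empty string, and two separators in a
// row yield an empty string between them.
std::vector<std::string> split(const std::vector<char>& packed, char sep = '\0') {
    std::vector<std::string> parts;
    std::string current;
    for (char c : packed) {
        if (c == sep) {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current);  // the last piece has no trailing separator
    return parts;
}
```

Taking a std::vector<char> makes it obvious to the reader that this is a raw byte buffer and not text that can be printed or passed around as a regular string.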
Another possibility is to make use of swap memory through mmap(). This can be tedious, though, when handling dynamic data. This is where a database-like scheme helps very much. You would allocate blocks (e.g. 64 KiB at a time) and manage your data on a per-block basis. When a string grows too big for the current block, move it to a new block. The advantage of this technique is that the data remains in memory unless the OS decides that it needs some of the RAM your software is using, in which case it can swap it out at any time. To you, that swapping will be totally transparent. It can also be much faster than hitting the default swap, which has to manage your memory in a much less efficient manner.
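For what it's worth, here is a minimal POSIX-only sketch of one such block. The class name MappedBlock, the /tmp location, and the append-only layout are assumptions for the example; a real implementation would also need to track which strings live where.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <stdlib.h>    // mkstemp (POSIX)
#include <sys/mman.h>  // mmap, munmap
#include <sys/types.h> // off_t
#include <unistd.h>    // ftruncate, close, unlink

// Hypothetical 64 KiB block backed by an unlinked temporary file.
// The OS may write its pages out to that file instead of the default swap.
class MappedBlock {
public:
    static constexpr std::size_t kSize = 64 * 1024;

    MappedBlock() {
        char path[] = "/tmp/strblockXXXXXX";  // assumed location, adjust as needed
        int fd = mkstemp(path);
        if (fd == -1) throw std::runtime_error("mkstemp failed");
        unlink(path);  // the file goes away once it is unmapped
        if (ftruncate(fd, static_cast<off_t>(kSize)) == -1) {
            close(fd);
            throw std::runtime_error("ftruncate failed");
        }
        void* p = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<char*>(p);
    }

    ~MappedBlock() { munmap(base_, kSize); }

    MappedBlock(const MappedBlock&) = delete;
    MappedBlock& operator=(const MappedBlock&) = delete;

    // Append `len` bytes and return a pointer to the stored copy, or
    // nullptr when the data does not fit (the caller moves to a new block).
    char* append(const char* data, std::size_t len) {
        if (len > kSize - used_) return nullptr;
        char* dst = base_ + used_;
        if (len > 0) std::memcpy(dst, data, len);
        used_ += len;
        return dst;
    }

private:
    char* base_ = nullptr;
    std::size_t used_ = 0;
};
```

A nullptr from append() is the signal to allocate another MappedBlock and store the string there, which is the "move it to a new block" step described above.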