6

Are there any STL containers that are well suited for use as BLOBs in database software? I would think a `vector<char>`, but is there something better? Maybe a `std::string`? Or some non-STL container?

Baruch
  • Well, is it a text string? If not (as a BLOB is just a bunch of binary rubbish), then a `std::string` is a bad idea. In the same sense, a vector of `char`s (which are text characters) is a bad idea compared to a vector of `unsigned char`s. – Christian Rau May 20 '12 at 10:29
  • @ChristianRau: `char` is not a "text character". `string` is not a bad idea. – Kerrek SB May 20 '12 at 10:29
  • @KerrekSB Well, conceptually it is. And conceptually a string is a bad idea for non-text. Conceptually, of course. – Christian Rau May 20 '12 at 10:30
  • @ChristianRau: No. Conceptually, a `char` is the smallest addressable unit of data, and the basic unit of I/O. Thus it is in fact the perfect type to represent arbitrary data. The only thing wrong with `char` is its own name. – Kerrek SB May 20 '12 at 10:31
  • @KerrekSB Then is there no difference between `char` and `unsigned char` when not used for numbers? – Baruch May 20 '12 at 10:33
  • @KerrekSB Well, I would prefer a type with a definite signedness that is the same across platforms, considering that binary data is often best manipulated as a sequence of integers. Though you're right that if you don't manipulate it (as may be the case for a BLOB), an unspecified signedness may suffice. I just object to unspecified signedness, and for me a `char` is a character; that may be purely subjective. – Christian Rau May 20 '12 at 10:33
  • That's how BLOBS are handled in OTL - http://otl.sourceforge.net/otl3_lob_stream.htm – bobah May 20 '12 at 10:35
  • @baruch: `signed char` and `unsigned char` are arithmetic, integral types just like `int` and `unsigned int`. On the other hand, `char` is expressly intended to be the "I/O" type that represents some opaque, system-specific fundamental unit of data on your platform. I would use them in this spirit. – Kerrek SB May 20 '12 at 10:35
  • @KerrekSB But OK, even if `char` is best used for binary data (which you seem to be right about), `std::string` definitely isn't, as it is conceptually a text string. You don't want any locale-based comparisons and transformations applied to binary data, let alone NUL-termination. – Christian Rau May 20 '12 at 10:39
  • @bobah What is? chars, unsigned chars, vectors, strings? – Baruch May 20 '12 at 10:40
  • @baruch: that particular example used a stream. The full list of examples is at http://otl.sourceforge.net/otl3_examples.htm. I suggested looking at it because OTL is a single header, so if you find an example that matches your use case you can check the source code underneath and do the same. – bobah May 20 '12 at 12:16

3 Answers

11

The BLOB type in databases stores arbitrary binary data, so you need an ordered collection of bytes. The easiest choice is a `vector<>`, and on most platforms you can choose `unsigned char` as the element type to represent a byte.
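A minimal sketch of that idea, assuming a hypothetical `Blob` alias and two hypothetical helpers, `blob_from_bytes` and `byte_sum` (none of these names come from a library). It also illustrates the signedness point raised in the comments: with `unsigned char`, every element reads back as a value in 0..255, whereas the signedness of plain `char` is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias: a BLOB as an ordered, contiguous sequence of bytes.
using Blob = std::vector<unsigned char>;

// Hypothetical helper: copy `size` raw bytes starting at `data` into a Blob.
Blob blob_from_bytes(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const unsigned char* first = static_cast<const unsigned char*>(data);
    return Blob(first, first + size);
}

// Hypothetical helper: sum the bytes as unsigned values. With unsigned char,
// the byte 0xFF always counts as 255, never as a negative number.
unsigned long byte_sum(const Blob& blob) {
    unsigned long sum = 0;
    for (unsigned char b : blob) {
        sum += b;
    }
    return sum;
}
```

Because a vector is contiguous, `blob.data()` and `blob.size()` can be handed to any C-style API that expects a pointer and a length.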

Attila
2

We used streams in one of our projects to represent BLOB/CLOB values stored in the database. I think this is usually the best approach, since BLOBs and CLOBs can be too large to fit in memory.

Write a stream implementation of your own and use it just like any other stream.
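One way this could look, as a rough sketch rather than the implementation used in that project: a read-only `std::streambuf` subclass that pulls the BLOB one chunk at a time. The names `ChunkReader` and `BlobStreamBuf` are hypothetical, and the callback stands in for whatever incremental BLOB read the database driver offers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <streambuf>
#include <utility>
#include <vector>

// Hypothetical callback: write up to `max` bytes of the BLOB into `dst` and
// return how many were written; 0 means end of data. A real implementation
// would call the database driver's incremental read here.
using ChunkReader = std::function<std::size_t(char* dst, std::size_t max)>;

// Read-only stream buffer that keeps only one chunk in memory at a time.
class BlobStreamBuf : public std::streambuf {
public:
    BlobStreamBuf(ChunkReader reader, std::size_t chunk_size)
        : reader_(std::move(reader)), buffer_(chunk_size) {
        assert(chunk_size > 0);
        // Start with an empty get area so the first read calls underflow().
        char* base = buffer_.data();
        setg(base, base, base);
    }

protected:
    // Called by the stream when the get area is exhausted: fetch next chunk.
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        const std::size_t n = reader_(buffer_.data(), buffer_.size());
        if (n == 0) {
            return traits_type::eof();
        }
        char* base = buffer_.data();
        setg(base, base, base + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    ChunkReader reader_;
    std::vector<char> buffer_;
};
```

Wrapping it as `std::istream in(&buf);` then lets the rest of the code read the BLOB like any other input stream, with memory use bounded by the chunk size.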

Hakan Serce
2

I'm currently using std::string to store blobs, since I'm using Google's Protocol Buffers library for object serialization, and that's what they use (e.g., MessageLite::SerializeToString). It works well for my purposes since inserting the resulting string as a blob into an SQLite database is very straightforward:

sqlite3_bind_blob(_insert_statement, 3, data.c_str(), data.size(), SQLITE_STATIC);

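(data is a std::string being bound to the third parameter of _insert_statement. SQLITE_STATIC tells SQLite not to copy the buffer, so data must stay alive and unmodified until the statement has been executed or the parameter is rebound; SQLITE_TRANSIENT makes SQLite take its own copy instead.)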
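The call above is safe for binary data because the length is passed explicitly rather than inferred from a NUL terminator. A small sketch of why that matters, using two hypothetical helpers (`make_blob` and `c_string_length` are illustrative names, not part of any library): a `std::string` keeps embedded zero bytes, but anything that treats it as a C string stops at the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build a std::string from raw bytes. The standard
// pointer-plus-length constructor keeps embedded NUL bytes.
std::string make_blob(const char* bytes, std::size_t size) {
    assert(bytes != nullptr);
    return std::string(bytes, size);
}

// Hypothetical helper: the length a NUL-terminated C API would see, which
// stops at the first zero byte and so may be shorter than blob.size().
std::size_t c_string_length(const std::string& blob) {
    return std::strlen(blob.c_str());
}
```

For a blob holding the bytes 01 00 02 03, `size()` reports 4 while `c_string_length` reports 1, which is why the pointer should always travel together with `size()`.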

  • How do you populate the string? – Peter Wood May 21 '12 at 06:55
  • Take a look at the [Protocol Buffers example](https://developers.google.com/protocol-buffers/docs/overview); instead of `person.SerializeToOstream(&output)` I use `person.SerializeAsString()` and then use the result as an SQLite blob. There's also `SerializeToString(std::string*)` if you really want to avoid copying. – Ian Mackenzie Jun 15 '12 at 12:42