6

I have the following questions regarding BLOBs in sqlite:

  • Does sqlite keep track of sizes of BLOBs?
  • I'm guessing that it does, but then, does the length function use it, or does it read the BLOB's content?
  • If sqlite keeps track of the size of the BLOB and length doesn't use it, is the size accessible via some other functionality?

I'm asking this because I'm wondering if I should implement triggers that set BLOBs' sizes in additional columns, of if I can obtain the sizes dynamically without the performance hit of sqlite reading the BLOBs.

  • I had a similar problem (needed to test whether a blob was not null), and directly testing with `column_name is not null` did take a lot longer than length `length(column_name) is not null`. So clearly, `length()` does not read the blob's content. (And IMHO `is not null` should do the same, or at least test the length before.) – waldyrious Apr 20 '17 at 11:21

3 Answers3

5

From the source:

** In an SQLite index record, the serial type is stored directly before
** the blob of data that it corresponds to. In a table record, all serial
** types are stored at the start of the record, and the blobs of data at
** the end. Hence these functions allow the caller to handle the
** serial-type and data blob seperately.
**
** The following table describes the various storage classes for data:
**
**   serial type        bytes of data      type
**   --------------     ---------------    ---------------
**      0                     0            NULL
**      1                     1            signed integer
**      2                     2            signed integer
**      3                     3            signed integer
**      4                     4            signed integer
**      5                     6            signed integer
**      6                     8            signed integer
**      7                     8            IEEE float
**      8                     0            Integer constant 0
**      9                     0            Integer constant 1
**     10,11                               reserved for expansion
**    N>=12 and even       (N-12)/2        BLOB
**    N>=13 and odd        (N-13)/2        text

In other words, the blob size is in the serial, and it's length is simply "(serial_type-12)/2".
This serial is stored before the actual blob, so you don't need to read the blob to get its size.
Call sqlite3_blob_open and then sqlite3_blob_bytes to get this value.

OliJG
  • 2,650
  • 1
  • 16
  • 15
  • Can I call these 2 functions using SQL? –  Aug 10 '11 at 08:18
  • Depends on the version of SQLite3 you're using. See: http://stackoverflow.com/questions/250940/how-do-i-find-the-length-size-of-a-binary-blob-in-sqlite – OliJG Aug 11 '11 at 04:07
  • AFAICT that link points to how to use the length function... unless I'm missing something... –  Aug 13 '11 at 07:45
  • That link says that 3.7.6 or later correctly calculates blob length using "length()" but that at least 3.4.2 does not. If you're using SQLite 3.7.6 or later, it shouldn't be a problem to use the "length()" function. – OliJG Aug 14 '11 at 08:18
  • 1
    arrite, then, following-up on your answer, using the length function doesn't cause the whole BLOB to be read? –  Aug 14 '11 at 08:49
  • 1
    That's correct. In sources I've studied, the 'length()' function will calculate as in my answer. – OliJG Aug 15 '11 at 06:17
0

Write a 1byte and a 10GB blob in a test database. If length() takes the same time for both blobs, the blob's length is probably accessed. Otherwise the blob is probably read.

OR: download the source code and debug through it: http://www.sqlite.org/download.html. These are some relevant bits:

/*
** Implementation of the length() function
*/
static void lengthFunc(
  sqlite3_context *context,
  int argc,
  sqlite3_value **argv
){
  int len;

  assert( argc==1 );
  UNUSED_PARAMETER(argc);
  switch( sqlite3_value_type(argv[0]) ){
    case SQLITE_BLOB:
    case SQLITE_INTEGER:
    case SQLITE_FLOAT: {
      sqlite3_result_int(context, sqlite3_value_bytes(argv[0]));
      break;
    }
    case SQLITE_TEXT: {
      const unsigned char *z = sqlite3_value_text(argv[0]);
      if( z==0 ) return;
      len = 0;
      while( *z ){
        len++;
        SQLITE_SKIP_UTF8(z);
      }
      sqlite3_result_int(context, len);
      break;
    }
    default: {
      sqlite3_result_null(context);
      break;
    }
  }
}

and then

/*
** Return the number of bytes in the sqlite3_value object assuming
** that it uses the encoding "enc"
*/
SQLITE_PRIVATE int sqlite3ValueBytes(sqlite3_value *pVal, u8 enc){
  Mem *p = (Mem*)pVal;
  if( (p->flags & MEM_Blob)!=0 || sqlite3ValueText(pVal, enc) ){
    if( p->flags & MEM_Zero ){
      return p->n + p->u.nZero;
    }else{
      return p->n;
    }
  }
  return 0;
}

You can see that the length of text data is calculated on the fly. That of blobs... well, I'm not fluent enough in C... :-)

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • Hehe yes I can always read the code, but then I'm not sure why I'd bother with asking questions on SO... OTOH performance testing could provide useful info if done right, but that is not very deterministic: in other words I'm hoping that someone with specific knowledge about sqlite BLOBs can enlight me (and the community) about this. –  Aug 08 '11 at 15:08
  • I guess I"m not fluent enough in C myself, but can you please explain how you are able to conclude that a BLOB's length is calculated on the fly from the code you posted? –  Aug 08 '11 at 15:16
  • I concluded that text data (not blob data) length is calculated on the fly. Because of the loop over all characters in `case SQLITE_TEXT:`. In `C` a string is zero-terminated, so the `while` will abort when the zero is reached. For blobs, I suspect that `p->n` is accessed only once, without loading the whole blob, but that might not be correct. – Lukas Eder Aug 08 '11 at 15:19
  • `SQLITE_BLOB` calls upon the second method (I omitted the indirection via `sqlite3_value_bytes`, which in turn calls `sqlite3ValueBytes`) – Lukas Eder Aug 08 '11 at 15:21
  • Yes, for SQLITE_TEXT, there is a loop over the string, now with SQLITE_BLOB (and what seems to be other numeric types) there is a call to sqlite3_value_bytes, which I'm assuming eventually calls sqlite3ValueBytes, but then again, I'm not sure how that translate into reading the whole BLOB... –  Aug 08 '11 at 15:25
  • I searched through the code and found that sqlite3_value is a typedef of `struct Mem` which contains `int n;` which holds the length of the BLOB (me thinks). Now I guess I need to find how that member `int n` is populated. –  Aug 08 '11 at 15:46
  • Yes. That and, whether `**argv` contains an already-populated `*z`, which holds the character data in `Mem` – Lukas Eder Aug 08 '11 at 15:52
0

If you have access to the raw c api sqlite3_blob_bytes will do the job for you. If not please provide additional information.

Mythli
  • 5,995
  • 2
  • 24
  • 31