1

I'm using the C interface to SQLite and need to list the size of the a string stored in TEXT column, i.e. what some databases would support using

SELECT DATALENGTH(body) FROM table;

However, SQLite does not have the DATALENGTH function. Since the text in question is quite large, I want to avoid loading the actual body column from disk. Since the TEXT is utf8, LENGTH(body) does not return the desired result (it does return the character count, whereas I need to know the storage size). Converting the column to BLOB is not an option because the database is created by a third-party program.

Q: Is there any way to obtain this information directly from the database, short of writing a custom SQL function? The SQLite file format does store the length of TEXT fields, so I'd be surprised if this information really isn't exposed.

Related:

Wisperwind
  • 963
  • 7
  • 16

2 Answers2

3

Cast to a blob to get the size in bytes from length():

SELECT length(cast(body AS BLOB)) FROM table;
Shawn
  • 47,241
  • 3
  • 26
  • 60
  • Thanks for the pointer to `cast`, I've been browsing the docs a bit and it should do the trick! Looking at the sqlite3 source superficially suggests that no new memory will be allocated in this way. – Wisperwind Apr 07 '20 at 15:25
0

Alternatively to @Shawn's answer, it turns out that implementing a custom SQL function is actually fairly simple:

/* Implement the DATALENGTH() SQL function. Code adapted from the SQLite source
 * for the LENGTH() function.
 */
static void db_sqlite_datalength(sqlite3_context *context, int argc,
        sqlite3_value **argv)
{
    (void) argc;

    switch(sqlite3_value_type(argv[0])) {
    case SQLITE_BLOB:
    case SQLITE_INTEGER:
    case SQLITE_FLOAT:
    case SQLITE_TEXT:
        sqlite3_result_int(context, sqlite3_value_bytes(argv[0]));
        break;
    default:
        sqlite3_result_null(context);
        break;
    }
}

static int db_register_custom_functions(sqlite3 *db)
{
    return sqlite3_create_function(db, "datalength", 1,
            SQLITE_UTF8 | SQLITE_DETERMINISTIC, NULL,
            &db_sqlite_datalength, NULL, NULL);
}

Since the builtin length function is implemented in exactly the same way, performance should be alright. I browsed the sqlite3 source a bit, but it is not so easy to tell when exactly disk accesses happen. Presumably, sqlite3 mmaps everything such that the important thing for performance is that possible overflow pages for long TEXT fields are not accessed unnecessarily.

Wisperwind
  • 963
  • 7
  • 16