
I was trying to parse the header from an SQLite database file, using this (fragment of the actual) code:

struct Header_info {
    char *filename;
    char *sql_string;
    uint16_t page_size;
};

int read_header(FILE *db, struct Header_info *header)
{
    assert(db);
    uint8_t sql_buf[100] = {0};

    /* load the header */
    if(fread(sql_buf, 100, 1, db) != 1) {
        return ERR_SIZE;
    }

    /* copy the string */
    header->sql_string = strdup((char *)sql_buf);

    /* verify that we have a proper header */
    if(strcmp(header->sql_string, "SQLite format 3") != 0) {
        return ERR_NOT_HEADER;
    }

    memcpy(&header->page_size, (sql_buf + 16), 2);

    return 0;
}

Here are the relevant bytes of the file I'm testing it on:

0000000: 5351 4c69 7465 2066 6f72 6d61 7420 3300  SQLite format 3.
0000010: 1000 0101 0040 2020 0000 c698 0000 1a8e  .....@  ........

Following this spec, the code looks correct to me.

Later I print header->page_size with this line:

printf("\tPage size: %"PRIu16"\n", header->page_size);

But that line prints out 16, instead of the expected 4096. Why? I'm almost certain it's some basic thing that I've just overlooked.

charmlessCoin

2 Answers


It's an endianness problem. x86 is little-endian: in memory, the least significant byte is stored first. When you copy the bytes 10 00 straight into a uint16_t on a little-endian architecture, the first byte (0x10) becomes the low-order byte, so the value is 0x0010, which is 16 instead of 4096.

Your problem is therefore that memcpy is not an appropriate tool to read the value.
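To make the two readings concrete, here is a minimal sketch that assembles the same two bytes in each byte order by hand. The helper names as_little_endian16 and as_big_endian16 are made up for this illustration and are not part of your code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the value a little-endian CPU ends up with after
 * memcpy into a uint16_t (the first byte is the low-order byte). */
static uint16_t as_little_endian16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: the value the file format intends
 * (the first byte is the high-order byte). */
static uint16_t as_big_endian16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes 10 00 from your dump, as_little_endian16 gives 16 and as_big_endian16 gives 4096.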

See the following section of the SQLite file format spec:

1.2.2 Page Size

The two-byte value beginning at offset 16 determines the page size of the database. For SQLite versions 3.7.0.1 and earlier, this value is interpreted as a big-endian integer and must be a power of two between 512 and 32768, inclusive. Beginning with SQLite version 3.7.1, a page size of 65536 bytes is supported. The value 65536 will not fit in a two-byte integer, so to specify a 65536-byte page size, the value at offset 16 is 0x00 0x01. This value can be interpreted as a big-endian 1 and thought of as a magic number to represent the 65536 page size. Or one can view the two-byte field as a little-endian number and say that it represents the page size divided by 256. These two interpretations of the page-size field are equivalent.
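The rules in that paragraph could be sketched roughly as follows. The function name decode_page_size and the convention of returning 0 for an invalid field are assumptions made for this example. It takes a pointer to the start of the 100-byte header, like sql_buf in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the page size in bytes, or 0 if the field
 * does not satisfy the rules quoted above. The result is 32 bits wide
 * because 65536 does not fit in a uint16_t. */
static uint32_t decode_page_size(const uint8_t *header)
{
    assert(header != NULL);

    /* Big-endian two-byte field at offset 16: high byte first. */
    uint32_t raw = ((uint32_t)header[16] << 8) | header[17];

    /* Bytes 00 01 are the magic value standing for 65536. */
    if (raw == 1) {
        return 65536;
    }

    /* Otherwise: a power of two between 512 and 32768, inclusive. */
    if (raw < 512 || raw > 32768 || (raw & (raw - 1)) != 0) {
        return 0;
    }
    return raw;
}
```

Note that the page_size member in the question is a uint16_t, so it would need to be widened to hold the 65536 case.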

us2012
  • Then what should I use to convert the values to big-endian format? – charmlessCoin Sep 07 '13 at 23:56
  • [This question](http://stackoverflow.com/questions/105252/how-do-i-convert-between-big-endian-and-little-endian-values-in-c) is tagged C++ but has answers suitable to C as well. Also remember to treat the special case mentioned in the spec (`00 01` for 65536) and be extra careful if your code might run on big-endian machines at some point (in that case you wouldn't want to swap bytes). – us2012 Sep 07 '13 at 23:59

It seems to be an endianness issue. If you are on a little-endian machine, this line:

memcpy(&header->page_size, (sql_buf + 16), 2);

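copies the two bytes 10 00 into a uint16_t, which stores its low-order byte at the lower address. The first byte, 0x10, therefore ends up as the low-order byte, giving 0x0010 (16) instead of 0x1000 (4096).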

You can do this instead:

header->page_size = sql_buf[17] | (sql_buf[16] << 8);

Update

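For the record, note that the solution I propose works regardless of the endianness of the machine, because it builds the value arithmetically from individual bytes instead of relying on how the CPU lays out a uint16_t in memory (see this article by Rob Pike).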
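The same byte-by-byte pattern extends to the four-byte fields in the header, which the SQLite format also stores big-endian. Here is a minimal sketch. The name read_be32 is my own and does not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit big-endian value from four
 * bytes. Each byte is widened before shifting, so the result is the
 * same on little-endian and big-endian hosts. */
static uint32_t read_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

Applied to the bytes 00 00 c6 98 at offset 24 of the dump in the question (the file change counter, if I am reading the header layout correctly), read_be32(sql_buf + 24) gives 50840.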