Whether to use NULL Terminated strings (SZ) or Length Prefixed strings (LPS) seems to be a hot button topic. Indeed, we even have a question on that topic here.
So something occurred to me. Could we use a bit of both? I mean, obviously, you can't make an LPS & SZ (LPSZ?) string without removing some of the great things about one or the other.
However, the biggest complaint about SZ seems to be the time required for operations, due to string length measurements taking so long. I had an idea about that:
Make all strings SZ. However, we could also store the length of the string mod 256 (i.e. len % 256) in a single byte preceding the string. While it would not reduce the asymptotic complexity of the operation (finding the length is still linear in the length), it could potentially increase the speed substantially, at the cost of a single extra byte.
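As a minimal sketch of the layout described above, here is one way such a string could be built from an ordinary C string. The names ppsz_new and ppsz_free are hypothetical, invented for illustration, and the sketch assumes the string lives in its own heap allocation with the prefix byte at index -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a PPSZ copy of an ordinary C string.
   Layout: [len % 256][characters ...][NUL]. The returned pointer
   addresses the first character, so the prefix sits at index -1. */
char *ppsz_new(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    uint8_t *block = malloc(len + 2);   /* prefix + characters + NUL */
    if (block == NULL)
        return NULL;
    block[0] = (uint8_t)(len % 256);    /* partial length prefix */
    memcpy(block + 1, src, len + 1);    /* copies the terminator too */
    return (char *)(block + 1);
}

/* Hypothetical helper: releases a string made by ppsz_new. */
void ppsz_free(char *s) {
    if (s != NULL)
        free((uint8_t *)s - 1);         /* step back to the real allocation */
}
```

Because the returned pointer still points at a NUL-terminated character sequence, it can be handed to existing functions that expect a plain SZ string. Any operation that changes the length would have to update the prefix byte as well.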
Here's what the main advantages of this scheme would be:
1. Not restricted to any particular size
2. Faster than normal SZ (up to 256x)
3. Compatible across all different memory sizes
4. Small strings are still quite space efficient (no wasted bytes in len)
5. No endianness issues
6. Portable across machines (see 3 & 5)
This is what strlen() would look like under this scheme (obviously, you would name it something else, because you would be making a different library):
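#include <stddef.h>
#include <stdint.h>

size_t ppsz_strlen (const char *s) {
    // get the LPS part: len % 256, stored in the byte before the string
    size_t x = ((const uint8_t *) s)[-1];
    // check for the SZ terminator at every 256 byte interval
    while (s[x])
        x += 256;
    return x;
}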
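To put a rough number on the "up to 256x" figure, the sketch below counts how many bytes each approach has to examine for a string of a given length. The function names sz_probes and ppsz_probes are made up for this illustration, and the counts are only a model of the loop above, not a benchmark.

```c
#include <stddef.h>

/* Bytes a plain NUL scan examines for a string of length len:
   every character plus the terminator. */
size_t sz_probes(size_t len) {
    return len + 1;
}

/* Bytes the PPSZ scan examines: the prefix byte, then one probe
   per 256-byte stride up to and including the terminator. */
size_t ppsz_probes(size_t len) {
    return 1 + (len / 256 + 1);
}
```

For a 65536-character string this gives 65537 probes against 258, which is close to the 256x limit. For strings shorter than 256 characters the PPSZ version always examines exactly two bytes, and for an empty string that is one more than the plain scan. Actual wall-clock gains would likely be smaller than the probe counts suggest, since each strided probe may land in a different cache line.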
A suitable name might be PPSZ (Partially Prefixed Null Terminated Strings).
To me, this seems to be a reasonable tradeoff: one byte for a fairly large acceleration. Of course, someone might ask: why not two, or four, or eight bytes? My answer would be that most strings in programs don't get big enough for skipping ahead by 65536, 16777216, 2 ** 32, or 2 ** 64 bytes to become valuable. In the few cases where they do, it might actually be a good time to consider splitting up the strings, especially if one string is overflowing beyond the size of a 64-bit address space.
Anyway, I was wondering if anyone else had some ideas regarding the concept. I'm sure there's something I'm missing as to why I haven't seen it used in practice before.
Thanks for any suggestions or problems found!