21

(Disclaimer: I've seen this question, and I am not re-asking it -- I am interested in why the code works, and not in how it works.)

So here's this implementation of Apple's (well, FreeBSD's) strlen(). It uses a well-known optimization trick: instead of comparing byte-by-byte against 0, it examines a full word (4 or 8 bytes) at a time. (The `LONGPTR_MASK`, `mask01`, `mask80` and `testbyte` helpers are defined earlier in the same file.)

```c
size_t strlen(const char *str)
{
    const char *p;
    const unsigned long *lp;

    /* Skip the first few bytes until we have an aligned p */
    for (p = str; (uintptr_t)p & LONGPTR_MASK; p++)
        if (*p == '\0')
            return (p - str);

    /* Scan the rest of the string using word sized operation */
    for (lp = (const unsigned long *)p; ; lp++)
        if ((*lp - mask01) & mask80) {
            p = (const char *)(lp);
            testbyte(0);
            testbyte(1);
            testbyte(2);
            testbyte(3);
#if (LONG_BIT >= 64)
            testbyte(4);
            testbyte(5);
            testbyte(6);
            testbyte(7);
#endif
        }

    /* NOTREACHED */
    return (0);
}
```

Now my question is: maybe I'm missing the obvious, but can't this read past the end of a string? What if we have a string whose length is not divisible by the word size? Imagine the following scenario:

|<---------------- all your memories are belong to us --------------->|<-- not our memory -->
+-------------+-------------+-------------+-------------+-------------+ - -
|     'A'     |     'B'     |     'C'     |     'D'     |      0      |
+-------------+-------------+-------------+-------------+-------------+ - -
^                                                      ^^
|                                                      ||
+------------------------------------------------------++-------------- - -
                       long word #1                      long word #2

When the second long word is read, the program accesses bytes that it shouldn't in fact be accessing... isn't this wrong? I'm pretty confident that Apple and the BSD folks know what they are doing, so could someone please explain why this is correct?

One thing I've noticed is that beerboy asserted this to be undefined behavior, and I also believe it indeed is, but he was told that it isn't, because "we align to word size with the initial for loop" (not shown here). However, I don't see why alignment would be relevant at all if the array is not long enough and we are reading past its end.

  • 2
    Note: Aside from the potential UB well answered, the method's ["5.2 times as fast"](http://www.opensource.apple.com/source/Libc/Libc-997.1.1/string/FreeBSD/strlen.c) by taking advantage that `char` is typically ASCII (codes 0 - 127) is itself noteworthy. – chux - Reinstate Monica Dec 01 '13 at 22:13
  • 1
    As a general principle the code that appears in a standard library implementation *doesn't exist in a standard C environment*. Sometimes this is to the "benefit" of the code -- it can rely on implementation-specific guarantees that the standard doesn't offer, as in this case. Sometimes it's to the "detriment" of the code -- for example there might be an obvious implementation of X in terms of Y that you can't do because you already chose the obvious implementation of Y in terms of X. – Steve Jessop Aug 19 '14 at 22:42
  • For x86, duplicate of [Is it safe to read past the end of a buffer within the same page on x86 and x64?](https://stackoverflow.com/q/37800739) – Peter Cordes Sep 30 '20 at 11:51

2 Answers

23

Although this is technically undefined behavior, in practice no native architecture checks for out-of-bounds memory access at a finer granularity than the size of a word. So while garbage past the terminator may end up being read, the result will not be a crash.

Sneftel
  • 2
I guess it might crash if it happens to be at the boundary of a guard page? – Devolus Dec 01 '13 at 13:07
  • 4
    No, because every word which is read is guaranteed to be at least partially in the array, and hence not trapped. – Sneftel Dec 01 '13 at 13:09
There is a school of thinking that asserts that *any* operation merely touching random bytes leads into an undefined state. However, the mere act of reading data is *not* UB -- and the random data that gets accessed "after" the end of the string never gets processed. – Jongware Dec 01 '13 at 13:17
  • 5
    @Jongware In fact, reading uninitialized variables, for example, is in itself undefined behavior, so is accessing an array out of bounds. –  Dec 01 '13 at 14:54
  • @H2CO3: I don't think it is. Consider `if (x != 0 && y/x)` -- dividing by zero is bad but it is never *done*. Reading a memory-wise valid *but* undefined byte and never *doing* anything with it sounds okay to me. I agree this philosophically touches upon the sound of a falling tree. Maybe we should call it "quantum behavior". – Jongware Dec 01 '13 at 15:00
  • 5
    @Jongware What "sounds OK" to you needn't agree with what's in the Standard. And the Standard says it's UB, so it's UB, period. –  Dec 01 '13 at 15:10
  • 1
    @H2CO3: that's rather worrisome, I use `strlen` all the time! What should I use instead? – Jongware Dec 01 '13 at 15:13
  • 1
    @Devolus: notice that the code starts reading byte by byte until it reaches a word-aligned address. Then, only words containing at least some part of the array are read. Though, it's easy to blunder and read one word too much: http://stackoverflow.com/questions/18524508/microsofts-strncat-reads-bytes-beyond-source-buffer-boundaries#comment27543898_18524508 – ninjalj Dec 01 '13 at 15:14
  • 10
    @Jongware `strlen()`. I don't doubt it works on actual, specific implementations, *despite* being UB, but that is only the case because the writers of the standard library have intimate knowledge of the platform and/or the compiler they are using. In a sense, it can be OK for library/other implementation code to rely on certain aspects of UB, but it is never OK for user code to do so. –  Dec 01 '13 at 15:16
  • 10
@Jongware: the implementation has additional guarantees that the standard doesn't mandate, and abuses them in their _internal_ implementation of strlen(). You don't care how strlen() is implemented internally; you use the guarantees the standard promises you about strlen(). – ninjalj Dec 01 '13 at 15:17
  • @H2CO3: can we agree, then, that in the case under discussion -- this implementation of `strlen` -- UB is *not* triggered because it relies on the fact that any bytes in the UB DangerZone are never tested? – Jongware Dec 01 '13 at 15:21
  • @Jongware I think yes... or, more precisely, one can tell this if one considers the actual, low-level implementation details of the actual OS, compiler, etc., and it's still UB according to the standard, but here the undefined-ness, thanks to the implementation, results in correct working. –  Dec 01 '13 at 15:24
  • 7
    @Jongware: it _is_ UB: you _cannot_ take that code and move it to an arbitrary architecture, most notably to a !MMU (non-paged based) architecture, where a string may end at the end of the data segment which may have a byte-granularity limit set at the MPU. – ninjalj Dec 01 '13 at 15:25
  • @ninjalj: I get your point, but this is only FreeBSD's version of strlen. I suppose any architectures or compilers where this problem *might* happen supply their own, adjusted, libraries. (And regularly have to explain "no, this one trick will not work on our system".) – Jongware Dec 01 '13 at 15:34
  • 3
That's why FreeBSD developers use very specific compilers to build FreeBSD: so that all "UB" parts actually have (implementation-)defined behaviour they can rely on. – Joker_vD Dec 01 '13 at 17:05
4

> I don't see why alignment would be relevant at all if the array is not long enough and we are reading past its end.

The routine starts by aligning to a word boundary for two reasons: first, reading words from an aligned address is faster on most architectures (and on a few CPUs it is even mandatory). The speed increase is big enough that the same trick is used in a host of similar operations: memcpy, strcpy, memmove, memchr, etc.

Second: if you continue reading words starting at a word boundary, you are assured the rest of the string resides in the same memory page. A string (including its terminating zero) cannot straddle a memory page boundary, and neither can an aligned word read. (1)

So this is always fastest and safest, even if the memory page granularity were as small as sizeof(long) (which it isn't).

Picking up an entire word near the end of a string may pick up additional bytes after the final zero, but reading Undefined Bytes from valid memory is not UB -- only acting upon their contents is (2). If the word contains a zero terminator anywhere inside, the individual bytes are inspected with testbyte, and this, as the original source shows, never acts on bytes after the terminator.

(1) Obviously they can, but I meant "never into a locked page" or something similar.

(2) Under discussion; see the comments (sorry about that!) under Sneftel's answer.

Jongware