3

After reading this, I have a question similar to this one: how can a memory allocator work without violating the strict aliasing rules? I am not asking about re-using freed memory, though. I am asking how allocated objects can be positioned within linear memory without violating strict aliasing.

All heap memory allocators I have looked at so far divide their memory into blocks of some sort, each with a header in front. However, malloc returns a void * that usually points to the memory right after the header. Here is an extremely narrowed-down example to illustrate this.

#include <stddef.h>

struct block_header {
  size_t size;
};

struct block_header *request_space(size_t size);

void *malloc(size_t size) {
    struct block_header *block = request_space(size);

    // I guess this violates strict aliasing, because the caller will 
    // convert the pointer to something other than struct block_header?
    // Or why wouldn't it?
    return block + 1;
}

I have been looking at this for a while now, but I see no way an allocator could possibly position its pointers in a memory region without violating strict aliasing. What am I missing?

Julius
  • Isn't this one reason why the allocator provides memory aligned to, say 16 bytes? The violations concern C itself. – Weather Vane Apr 12 '20 at 20:14
  • 1
    The language implementation doesn't itself have to be portable. It can rely on implementation details. – Igor Tandetnik Apr 12 '20 at 20:32
  • 3
    The memory located at `block + 1` has not been used by malloc. So strict aliasing doesn't apply. The aliasing rule says "An object shall have its *stored value accessed*...", and there is no stored object there. Once the client code acquires the pointer and writes something into it, that becomes the stored value, and its type becomes the effective type. – rici Apr 12 '20 at 20:58
  • @rici Thank you, that would indeed explain a lot! – Julius Apr 12 '20 at 21:04
  • A lot of the internals of standard libs rely on what the C standard would list as poorly-defined behavior, or on non-standard extensions. For example, the various optimized versions of functions like memcpy work on, say, 32-bit chunks by reading the data as aligned `uint32_t`, which would be a clear strict aliasing violation if done by a normal C application. – Lundin Apr 14 '20 at 06:51

3 Answers

4

According to the standard, these things never violate strict aliasing:

  • Casting a pointer.
  • Doing pointer arithmetic.
  • Writing into malloc'd space.

The thing you are not allowed to do in malloc'd space is read some memory as a different type than it was written as (apart from the types that the rule explicitly allows for aliasing, such as character types).
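As a rough illustration of that distinction, the sketch below performs each of the three operations from the list and then only the permitted kinds of reads. The function name `allowed_operations_demo` is made up for this example, and the forbidden access is shown only in a comment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical demo: returns 1 if everything behaved as expected,
 * 0 if the allocation failed. */
int allowed_operations_demo(void) {
    void *raw = malloc(sizeof(double));
    if (raw == NULL) {
        return 0;
    }

    /* Casting a pointer and doing pointer arithmetic:
     * no stored value is accessed, so the aliasing rule is not involved. */
    unsigned char *bytes = (unsigned char *)raw;
    unsigned char *end = bytes + sizeof(double);

    /* Writing into malloc'd space: the store is what gives
     * the storage its effective type (double). */
    double *d = raw;
    *d = 1.5;

    /* Reading with the same type that was written is fine. */
    double copy = *d;

    /* Reading through a character type is one of the listed exceptions. */
    unsigned char first = bytes[0];
    (void)first;

    /* NOT done here: reading the same storage through an unrelated type,
     * e.g. *(long long *)raw, which is the access the rule forbids. */

    free(raw);
    return copy == 1.5 && (end - bytes) == (ptrdiff_t)sizeof(double);
}
```

Calling `allowed_operations_demo()` from a test program should return 1 unless the allocation fails.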

The text of the rule is in C11 6.5/7:

An object shall have its stored value accessed only by [...]

and the text in 6.5/6 explains that if we are in malloc'd space then the write imprints the type of the write onto the destination (and therefore there cannot be a type mismatch).
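A minimal sketch of that "imprinting", assuming the literal reading of 6.5/6 described above (which the DR236 response calls into question, as noted in the footnote below): the same malloc'd storage is first written and read as `int`, then written and read as `float`. The function name `reuse_storage` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical demo of effective types in allocated storage.
 * Returns 0.0f if the allocation fails. */
float reuse_storage(void) {
    size_t n = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *p = malloc(n);
    if (p == NULL) {
        return 0.0f;
    }

    int *ip = p;
    *ip = 42;            /* store: effective type of the storage is now int */
    int seen = *ip;      /* read as int: matches the effective type */

    float *fp = p;
    *fp = 2.0f;          /* store: effective type becomes float for later reads */
    float result = *fp + (float)seen;   /* read as float: matches again */

    /* Reading *ip at this point would be the mismatched access. */

    free(p);
    return result;
}
```

Every read here uses the type of the most recent write to that storage, so under that reading of 6.5/6 no access has a type mismatch.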

The code you've posted so far never does the forbidden thing, so there is no apparent problem. There would only be a problem if someone used your allocator and then read the memory without first writing to it.
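To make this concrete for a user-written allocator, here is a toy sketch and not a real malloc. It assumes the arena itself comes from the system `malloc` (so it has no declared type) and never frees blocks. The names `toy_alloc`, `toy_size`, `ARENA_SIZE`, `ALIGNMENT`, and `HEADER_SPAN` are invented for this example; only `struct block_header` comes from the question. It requires C11 for `_Alignof` and `max_align_t`.

```c
#include <stddef.h>
#include <stdlib.h>

struct block_header {
    size_t size;
};

/* Assumption: payloads are aligned for any object type, so the header
 * is padded up to a multiple of the strictest fundamental alignment. */
#define ALIGNMENT _Alignof(max_align_t)
#define HEADER_SPAN \
    ((sizeof(struct block_header) + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT)

enum { ARENA_SIZE = 4096 };

static unsigned char *arena;   /* obtained lazily from the system malloc */
static size_t arena_used;      /* always a multiple of ALIGNMENT */

void *toy_alloc(size_t size) {
    if (arena == NULL) {
        arena = malloc(ARENA_SIZE);
        if (arena == NULL) {
            return NULL;
        }
    }
    if (size > ARENA_SIZE - HEADER_SPAN) {
        return NULL;   /* can never fit; also rules out overflow below */
    }
    size_t payload = (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (HEADER_SPAN + payload > ARENA_SIZE - arena_used) {
        return NULL;   /* arena exhausted */
    }

    unsigned char *base = arena + arena_used;
    struct block_header *hdr = (struct block_header *)base;
    hdr->size = size;  /* store: only the header bytes get this effective type */
    arena_used += HEADER_SPAN + payload;

    /* The payload has not been written, so it has no effective type yet;
     * the caller's first store decides it. */
    return base + HEADER_SPAN;
}

/* Reads the header back with the same type it was written with. */
size_t toy_size(const void *ptr) {
    const unsigned char *base = (const unsigned char *)ptr - HEADER_SPAN;
    return ((const struct block_header *)base)->size;
}
```

The allocator only ever accesses the header bytes as `struct block_header`, and the caller only ever accesses the payload bytes with whatever type it stored there, so the two never read each other's storage with a mismatched type.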

Footnote 1: 6.5/6 is apparently defective according to the committee response to DR236, but it was never fixed, so who knows where that leaves us.

Footnote 2: as Eric points out the standard doesn't apply to implementation internals, but consider my comments in the context of some user-written allocator as in the other question you linked to.

M.M
  • The committee seem to disagree with your opinion about writes to malloc'd space. See **Example 1** and the **Committee Response** in [DR236](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm) – Language Lawyer Apr 12 '20 at 23:35
  • @LanguageLawyer the text in 6.5/6 is quite clear that Example 1 is well-defined, so the Committee Response must be interpreted as saying that 6.5/6 is defective. In the [document log](http://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm) there are several proposals to change 6.5/6. However, in 20 years this defect has not been corrected, so where does that leave us? – M.M Apr 12 '20 at 23:54
  • @M.M: Although 6.5p6-7 make no sense precisely as written, all that would be needed to fix it would be to say that it only applies in cases where (1) multiple lvalues are used to access the same storage in conflicting fashion within some context [where the context may be drawn broadly or narrowly at the compiler's leisure], and (2) the lvalues are not visibly freshly derived within that context from pointers or lvalues that identify the same object or members of the same array. I doubt the authors of the Standard imagined that compiler writers would use the Standard as an excuse... – supercat Apr 13 '20 at 22:31
  • ...to ignore visible pointer derivation. Note that if one fixes 6.5p7 in this way, one would eliminate the need for 6.5p6 as well as the "character type" exception, but would allow many useful optimizations which are not possible under the clang/gcc interpretation of the Standard. – supercat Apr 13 '20 at 22:33
  • The "strict aliasing"/effective type rules were always shaky and unclear, but I believe there is some sort of consensus over how to treat malloc memory at least. The data returned from malloc is to be regarded as having no declared type, until the point when the first lvalue write access is done - from there on that accessed memory has the effective type which was used for that first access. Otherwise there's no making sense of the effective type rules. – Lundin Apr 14 '20 at 06:55
  • @Lundin what you describe is what the standard says, but the Committee Response to DR236 contradicts that – M.M Apr 14 '20 at 07:17
  • @M.M It's a very old DR. My take is that the committee was as confused back in 2000 as they were when they wrote this part of the C99 standard. Compilers have tried to make sense of the rules long after, with gcc (and clang?) taking the language lawyer route and abusing strict aliasing UB optimization opportunities. But to my knowledge, every other compiler on the market does not use these UB exploits. – Lundin Apr 14 '20 at 07:57
  • @Lundin In the first code example in the DR though -- if that basic case cannot be optimized by reordering, we have to ask what point the strict aliasing rule has at all – M.M Apr 14 '20 at 08:43
  • @Lundin: The C89 rules would have been workable, without any "effective type" nonsense, if it were recognized that they're only meant to apply in cases where (1) conflicting accesses to storage would occur within a compiler's field of view, and (2) nothing within that field of view would suggest any relationship between the lvalues used to access the storage. I think the authors of the Standard would be astonished at the idea that anyone claiming to produce a quality compiler would feel no obligation to limit their application of the rule to the situations described above. – supercat Apr 20 '20 at 22:09
3

The source code of malloc is not required to conform to the C standard in the way that normal source code is. It is part of the C implementation.

The people who work on malloc, the compiler, and other parts of the C implementation are responsible for ensuring they work together. That can include the compiler treating malloc specially and malloc using behaviors that are guaranteed to it by the C compiler but not by the C standard.

Eric Postpischil
-2

The C Standard deliberately avoids requiring that all implementations be suitable for all purposes. It is instead designed to allow implementations intended for various purposes to, as a form of "conforming language extension", process constructs meaningfully in ways that would be useful for those purposes even when the Standard imposes no requirements. Thus, the Standard allows implementations intended for tasks requiring manual memory management to support "popular extensions" that facilitate such tasks, even though it deliberately avoids requiring such support from implementations that aren't intended for such tasks.

Many kinds of memory allocators would be impractical to implement on implementations whose semantics are limited to those mandated by the Standard. However, implementations that uphold the Spirit of C principle described by the Committee as "Don't prevent the programmer from doing what needs to be done", and that are designed and configured to be suitable for building such allocators, will recognize indications that storage will be used as more than one type. The exact range of situations where compilers recognize such indications was left as a "quality of implementation" issue outside the Standard's jurisdiction. From a practical standpoint, the authors of clang and gcc have opted to behave in minimal-allowable-quality fashion except when using -fno-strict-aliasing, but all that means is that programmers who want to do anything "interesting" with those compilers must use that option. There is no evidence whatsoever that the authors of the Standard intended that programmers should be expected to jump through hoops to accommodate the limitations of poor-quality implementations; instead, they expected that the market would be better placed than the Committee to judge how compilers should most usefully behave to accomplish various tasks.

supercat