What is the purpose of glibc's atomic_forced_read function?

Question

I am trying to understand the purpose of the definition of atomic_forced_read which shows up frequently in the GNU libc implementation of malloc.c.

I am not great when it comes to inline assembly, but it looks like this returns the exact same value, with the same type as the input value. what am I missing here?

Atomic forced read definition in atomic.h

523 #ifndef atomic_forced_read
524 # define atomic_forced_read(x) \
525   ({ __typeof (x) __x; __asm ("" : "=r" (__x) : "0" (x)); __x; })
526 #endif

Link to atomic.h

https://code.woboq.org/userspace/glibc/include/atomic.h.html

Possible duplicate of [Does free(ptr) where ptr is NULL corrupt memory?](https://stackoverflow.com/questions/1938735/does-freeptr-where-ptr-is-null-corrupt-memory) — dandan78, Sep 24 '19 at 14:30
**What** is your question?! What does the `atomic_forced_read` have to do with null pointers?! — Antti Haapala -- Слава Україні, Sep 24 '19 at 14:31
Changed the title. The scope of my question changed as I was researching this, but I had forgot to change the title. I have convinced myself that there is nothing wrong with freeing and null pointer, at least in gnulibc, but I am curious about this definition. — mreff555, Sep 24 '19 at 14:36
It's hard to say without more context, but I suspect this is a fallback definition used only when the previous several hundred lines of `#ifdef`s don't come up with an actual assembly instruction to put in there. It may also serve as an [optimization fence](https://stackoverflow.com/questions/16975935/whats-the-purpose-of-compiler-barrier). — zwol, Sep 24 '19 at 14:40
Incidentally, the C standard has always required `free(0)` to be safe and to do nothing. There _have_ been (noncompliant and/or pre-1989) C libraries that would crash, but you're almost surely not going to encounter one nowadays unless you're doing retrocomputing. (See dandan78's link for more detail.) — zwol, Sep 24 '19 at 14:43
Could you please edit your question and add a link to the full text of the `atomic.h` that you are quoting from? That may give us enough information to write an actual answer. — zwol, Sep 24 '19 at 14:44
Added. As for the safety of free, I assumed as much, which is why I looked into that. The colleague who warned me against deleting null pointers has 40+ years of experience and no doubt, dealt with some of those older libraries. Looks safe enough now though. — mreff555, Sep 24 '19 at 14:54
It seems to me to be similar to ACCESS_ONCE in the Linux kernel. It just forces a read into a register, but I can't tell how that compares against a volatile read. If the data is already in a register would it force a reload? Would it prevent the compiler turning accesses of the target into accesses of the source data? — ninjalj, Sep 24 '19 at 15:49
@ninjalj: it produces the output in the same register as the input. I'm not sure of the purpose; the compiler could still just copy `x` to another register ahead of this statement if it still wanted to keep the value in a register. — Peter Cordes, Sep 24 '19 at 21:18
@ninjalj: Oh, and that output is the return value of the statement-expression. Going through asm makes the compiler forget anything it knows about the value, e.g. being non-negative. Also blocks CSE of the output and input if they're both used for something else. It's the kind of thing you'd use when writing a microbenchmark to stop the optimizer from hoisting work outside of a loop. I'd have to look at a use-case in malloc.c to guess why it's useful. — Peter Cordes, Sep 24 '19 at 21:22
@PeterCordes: in malloc it's used to prevent a race condition between checking if a hook function pointer is NULL and calling it. Its purpose is to access shared variables, given you know the access will be atomic (proper size and alignment for supported platforms), and prevent the compiler from eliding/duplicating your loads. I just don't know what guarantees it really gives, esp. compared to a volatile read. — ninjalj, Sep 25 '19 at 12:38
@ninjalj: well that sounds like massive overcomplication; just load the shared value into a local and then check that for non-NULL before calling through it. No special magic needed to prevent the NULL check from optimizing away, unless the pointer is unconditionally dereferenced earlier or later. (Then compilers will optimize it away because deref implies non-NULL because of UB). The OP still hasn't included a use case in the question, and I haven't bothered to go looking for myself. — Peter Cordes, Sep 25 '19 at 13:09
@PeterCordes: the problem is not that the compiler may optimize away the NULL pointer check, but a TOCTOU where the pointer may be set to NULL after the check and before the call. https://sourceware.org/bugzilla/show_bug.cgi?id=9957 indicates that loading into a local is not enough, in the S390 assembly in that bug there are two loads of the hook variable from memory. — ninjalj, Sep 25 '19 at 16:25
@ninjalj: Ahhh, I see now. Without `volatile`, the compiler thinks it can safely reload the global instead of spilling/reloading a local. Easy to forget just how much the optimizer can do sometimes... Or maybe I should say I'm not used to dealing with UB-containing code that does shared access to non-atomic non-volatile objects. — Peter Cordes, Sep 25 '19 at 22:23

Maxim Egorushkin · Accepted Answer · 2019-09-25T15:05:44.907

One usage is of atomic_forced_read:

#if HAVE_MALLOC_INIT_HOOK
  void (*hook) (void) = atomic_forced_read(__malloc_initialize_hook);
  if (hook != NULL)
    (*hook)();
#endif

It appears that __malloc_initialize_hook can be changed from another thread, so that if __malloc_initialize_hook is loaded once more from memory after the NULL check its value may have changed back to NULL.

atomic_forced_read makes sure that __malloc_initialize_hook is loaded into a register due to =routput constraint, so that __malloc_initialize_hook isn't reloaded from memory after the NULL check. That empty asm breaks compiler dependency of hook on __malloc_initialize_hook, since hook is now initialized with __x stored in a register and not __malloc_initialize_hook. After hook has been initialized with __x, the latter is gone and cannot be possibly reloaded.

In C11 mode that __malloc_initialize_hook can be atomic_uintptr_t and atomic_load_explicit(&__malloc_initialize_hook, memory_order_relaxed) can be used instead of atomic_forced_read to load __malloc_initialize_hook from memory exactly once.

Yup, it "launders" the value of `__malloc_initialize_hook` so the optimizer can't decide to load the global twice. As ninjali commented on the question, https://sourceware.org/bugzilla/show_bug.cgi?id=9957 shows that did happen for S390 even when using a local (but without `asm` or `volatile`). — Peter Cordes, Sep 25 '19 at 22:29
@PeterCordes You mentioned "launders", however, `std::launder` is a rather peculiar and limited tool, see [WG21 P0532R0 On launder()](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0532r0.pdf). — Maxim Egorushkin, Oct 08 '19 at 16:58

What is the purpose of glibc's atomic_forced_read function?

1 Answers1

Linked