How does an extern variable work in a shared library

Question

Say I wrote a simple dynamic library like this:

lib.h

#pragma once

extern int x;
extern int p(void);

lib.c

#include <lib.h>
#include <stdio.h>

int x = 0;
int p(void) {
    printf("lib: %d\n", x++);
    return 0;
}

a.c

#include <lib.h>
#include <stdio.h>

int main(void) {
    for (; !p(); x--) printf("a.c: %d\n", x);
    return 0;
}

b.c

#include <lib.h>
#include <stdio.h>

int main(void) {
    for (; !p(); x = 0) printf("b.c: %d\n", x);
    return 0;
}

What would a and b print? I can think of a few things that may happen:

1. Linker error: x declared extern but never defined.
2. Each process gets its own x, including lib. (b.c is always 0, a.c counts down, lib counts up)
3. Each process gets its own x to share with lib. (a.c and b.c are always 1, lib is always 0)
4. All processes share the same x, including lib. (a.c, b.c and lib return random values)
5. All processes share the same x, including lib, until someone other than lib writes to it; then that process gets its own version of x, not shared with lib (I read this online somewhere). (lib always increments, b.c always prints 0, a.c counts down)

What typically happens? Are there any inconsistencies between compilers/platforms we should know about? Can we force one behaviour (I am thinking __declspec(dllexport), compiler flags, etc.)?

I think the linker error part isn't relevant since you'll have to do some work to produce an executable result ignoring undefined symbols. To the cases at runtime, shared libraries are separately linked at runtime into each process and writable pages are default copy on write. Each of a and b have their own copy of x under normal circumstances (big asterisk), there is no separate lib process. On windows, dlls may contain data that is shared between processes if explicitly requested, but it's much more common and flexible to explicitly create and use shared memory segments to share memory. — Art Yerkes, Oct 14 '16 at 18:06
@ArtYerkes AFAIK, the linker will error if the linker does not know `lib` exports `extern int x`, which would probably mean that, using that linker, libraries cannot export variables. As for the rest of your comment, if I understand it correctly: writing to e.g. `extern int x` will not be visible for other processes, but _will_ be for `lib`, only when called with that process (much like case 3, except for that all processes initially share the same `x`)? Is it that, or do you (with "copy on write") mean that when you write to a variable it will create a new variable that `lib` does not use? — yyny, Oct 14 '16 at 18:50
See also http://stackoverflow.com/questions/19373061/what-happens-to-global-and-static-variables-in-a-shared-library-when-it-is-dynam — nos, Oct 14 '16 at 23:09

Art Yerkes · Accepted Answer · 2016-10-14T23:16:27.287

There are several parts to this question:

What would a and b print? I can think of a few things that may happen:

Linker error: x declared extern but never defined.

Nothing would be printed, since in that case a and b would not have been built into executables at all. You need to link against lib.so, lib.a, or an import library lib.lib to give the executable a linkable definition of x; otherwise nothing else works (mostly; it can be more complicated than that if you try hard).

Each process gets its own x, including lib. (b.c is always 0, a.c counts down, lib counts up)

lib isn't a process in your scenario; it's a shared library. The shared library is separately loaded and linked into each process space where something references it in a way understood by the dynamic loader (ld-linux.so on Linux, ntdll.dll on Windows). Each process observes a copy of the loaded library in its address space, and the library itself sees the same copy.

So running a should print 0 followed by 1 forever: p() is run and tested, x is printed, x is decremented back to 0. b will also print 0 followed by 1 forever: p() is run and tested, x is printed, x is set to 0. Note that p() prints x++, so the increment takes place after the value is captured for the argument to printf. The x variables that a and b refer to are specific to each run of a or b.

This is frequently accomplished at the OS level by mapping pages of the loadable library from disk into memory and marking them "copy-on-write": when the host process attempts to change a page, the OS allocates a new page and copies the old contents over first. The result is that unmodified parts of the loaded library take up less actual memory.
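As a rough illustration of the per-process copy described above, the sketch below uses fork() on a POSIX system instead of two separately started programs. That substitution is an assumption made so the example fits in one file: the global x merely stands in for the library's variable, and parent_x_after_child_write is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork/waitpid under strict -std modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int x = 0; /* stand-in for the variable exported by the library */

/* Forks a child that writes to x, then returns the value of x
 * that the parent still observes (or -1 on failure). */
int parent_x_after_child_write(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        x = 42;                  /* modifies only the child's private copy */
        _exit(x == 42 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return x;                    /* unchanged in the parent */
}
```

Calling parent_x_after_child_write() from main should return 0: the child's write lands on its own copy of the page, much as a write from a would never be seen by b.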

Each process gets its own x to share with lib. (a.c and b.c are always 1, lib is always 0)

Lib isn't a separate process. Executing p() in a sees the same x as the one linked by a.
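To make that trace concrete, here is a single-file sketch of a.c's loop. It is only an approximation: p() and x live in the same translation unit rather than in a real shared library, the loop is bounded so it terminates, and run_a_loop is a hypothetical name.

```c
#include <stdio.h>

static int x = 0; /* one x per process, seen by "library" and "program" code alike */

/* Same body as p() in lib.c: prints x, then increments it. */
static int p(void) {
    printf("lib: %d\n", x++);
    return 0;
}

/* Runs a.c's loop for a fixed number of rounds and returns the last
 * value the program side printed, or -1 if the loop body never ran. */
int run_a_loop(int rounds) {
    int last = -1;
    x = 0;
    for (int i = 0; i < rounds && !p(); x--, i++) {
        printf("a.c: %d\n", x); /* p() has already bumped x to 1 */
        last = x;
    }
    return last;
}
```

Each round prints "lib: 0" followed by "a.c: 1", and the x-- in the loop header brings x back to 0 before the next call to p().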

All processes share the same x, including lib. (a.c, b.c and lib return random values)

Normally not the case (also see below).

All processes share the same x, including lib, until someone other than lib writes to it; then that process gets its own version of x, not shared with lib (I read this online somewhere). (lib always increments, b.c always prints 0, a.c counts down)

Some old runtime systems that don't support separate address spaces do work this way, notably AmigaDOS. It's quite unlikely you'll run into one.

What typically happens? Are there any inconsistencies between compilers/platforms we should know about? Can we force one behaviour (I am thinking __declspec(dllexport), compiler flags, etc.)?

In the vast majority of cases, each process shares extern variables with the one instance of the given library loaded in that process. Unless you take specific action, that's expected.

In the comments, there were a few other questions:

Can Windows DLLs (or others) export non-function data?

Yes. On Windows, use the DATA qualifier in the .def file when building the import lib. On other platforms it's no different from exporting functions. You will, however, receive a pointer to the target variable rather than being bound directly to the space it occupies.
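Since the question mentions __declspec(dllexport), here is a minimal sketch of the common export-macro alternative to a .def file. LIB_API and LIB_BUILD are hypothetical names, and the header and source are shown in one block only for brevity; in a real project the macro and declarations would live in lib.h and LIB_BUILD would be defined only when compiling the library itself.

```c
#include <stdio.h>

#define LIB_BUILD 1 /* assumption: normally passed as -DLIB_BUILD for lib.c only */

/* lib.h part: choose the right decoration for exported symbols. */
#if defined(_WIN32)
#  ifdef LIB_BUILD
#    define LIB_API __declspec(dllexport) /* building the DLL */
#  else
#    define LIB_API __declspec(dllimport) /* using the DLL */
#  endif
#elif defined(__GNUC__)
#  define LIB_API __attribute__((visibility("default")))
#else
#  define LIB_API
#endif

LIB_API extern int x;
LIB_API int p(void);

/* lib.c part: the actual definitions. */
int x = 0;

int p(void) {
    printf("lib: %d\n", x++);
    return 0;
}
```

With GCC or Clang on ELF platforms, symbols already have default visibility unless the library is built with -fvisibility=hidden, so the attribute matters mainly in that configuration.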

Asterisk, see below?

On Windows, sections have a SHARED attribute that causes the loader to allocate the same page in every process that uses the DLL. It's not the default, and you have to jump through hoops and use platform-specific pragmas to do it. There are a lot of reasons not to use this.

Most of the time, when a dll wants to share state among copies of itself loaded in many processes, it uses the shared memory API of the host system (CreateFileMapping or mmap usually). This allows flexibility (for example, all a processes could share one version of x, separate from all b processes with another copy of x). Note that using SHARED could easily mean that running a could crash b, and having another long running user c loaded could keep either a or b from starting up again until a reboot.
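For contrast with the private-copy default, the following sketch shows explicitly requested shared memory using mmap. It assumes a POSIX-like system that supports MAP_ANONYMOUS (Linux, the BSDs, macOS) and uses fork() so that one file suffices; unrelated programs such as a and b would need a named object instead, for example one created with shm_open. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE 1 /* ask glibc to expose MAP_ANONYMOUS in strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one shared int, lets a child write to it, and returns the
 * value the parent then observes (or -1 on failure). */
int shared_x_after_child_write(void) {
    int *shared_x = mmap(NULL, sizeof *shared_x, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_x == MAP_FAILED) {
        return -1;
    }
    *shared_x = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared_x, sizeof *shared_x);
        return -1;
    }
    if (pid == 0) {
        *shared_x = 42; /* MAP_SHARED: this write is visible to the parent */
        _exit(0);
    }

    int status = 0;
    int result = -1;
    if (waitpid(pid, &status, 0) >= 0 && WIFEXITED(status)) {
        result = *shared_x;
    }
    munmap(shared_x, sizeof *shared_x);
    return result;
}
```

Here the parent reads 42 after the child exits, because the mapping was created with MAP_SHARED rather than relying on ordinary copy-on-write data pages.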

Thanks, this helps. However, while I know `lib` is not (technically) a process, and while I never claimed it either, I do think comparing shared libraries to processes would make explaining them much easier. Sharing memory from a shared library is like sharing memory between processes, not something the C standard defines, but something commonly implemented by other large standards (POSIX, Windows, etc.). In fact, AFAIK, sharing memory between processes works the same as within a shared library. — yyny, Oct 15 '16 at 00:11
One more question: Returning pointers to static variables in a shared library always works (modifying the variable on write), right? — yyny, Oct 15 '16 at 00:13
Yes, everyone observes the same addresses in a process space so you can return pointers to statics and globals in your shared library and the user and your library code will observe the same value (modified on write, so other processes don't observe the same change). Sharing memory between processes is different in a fundamental way, because the shared library has no life of its own. Everything it does, it does on behalf of the process it lives in. Shared memory between processes is a choice two processes make, whereas a shared library can't opt out of the process address space. — Art Yerkes, Oct 15 '16 at 04:23
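A small sketch of the pointer-to-static case from the last comment, written as a single file for brevity. In practice the three functions would be compiled into the shared library and the caller would live in the executable; the names are hypothetical.

```c
/* Library side: a static variable that is never exported by name. */
static int counter = 0;

/* Hands out the address of the static; valid for the life of the process. */
int *lib_counter(void) {
    return &counter;
}

/* Library code reading its own static. */
int lib_counter_value(void) {
    return counter;
}

/* Library code modifying its own static. */
void lib_counter_bump(void) {
    counter++;
}
```

If the caller executes *lib_counter() = 5, lib_counter_value() then returns 5, and after lib_counter_bump() the caller reads 6 through the same pointer. Both sides use one address within the process, while other processes that load the library keep their own copy.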
