5

Looking around the internet, I couldn't find consistent, helpful information about this, so here's the issue:

Why are local static variables in C said to be thread-unsafe? I mean, static local variables are stored in the data segment, which is shared by all threads, but isn't internal linkage supposed to stop threads from stepping in each other's static variables?

This forum post seems to suggest that threads do in fact step in each other's data segment occasionally, but wouldn't such behavior clearly violate all C standards since the '90s? If such behavior were to be expected, wouldn't use of the data segment (i.e. all variables with static storage duration, including global variables) have been deprecated long ago in successive C standards?

I really don't get this, since everyone seems to have something against local static variables, but people can't seem to agree on why, and researching some of the argument shows them to be ill-conceived.

I, for one, think local static variables are a very good way to communicate information between function calls, that can really improve readability and limit scope (compared to, say, passing the information as arguments forth and writing it back on each function call).

As far as I can see, there are completely legitimate uses of local static variables. But maybe I am missing something? I would really like to know if that were the case.

[EDIT]: The answers here were pretty helpful. Thanks to everyone for the insight.

FaresGargouri

5 Answers

6

but isn't internal linkage supposed to stop threads from stepping in each other's static variables?

No, linkage has nothing to do with thread safety. It merely restricts functions from accessing variables declared in other scopes, which is a different and unrelated matter.

Let's assume you have a function like this:

int do_stuff (void)
{
  static int x=0;
  ...
  return x++;
}

and then this function is called by multiple threads, thread 1 and thread 2. The thread callback functions cannot access x directly, because it has local scope. However, they can call do_stuff() and they can do so simultaneously. And then you will get scenarios like this:

  • Thread 1 has executed do_stuff up to the point where it has read x as 0, the value it will return to the caller.
  • Thread 1 is about to write the value 1 to x, but before it does:
  • Context switch, thread 2 steps in and executes do_stuff.
  • Thread 2 reads x, it is still 0, so it returns 0 to the caller and then increases x by 1.
  • x is now 1.
  • Thread 1 gets focus again. It was about to store 1 to x so that's what it does.
  • Now x is still 1, although if the program had behaved correctly, it should have been 2.

This gets even worse when the access to x is done in multiple instructions, so that one thread reads "half of x" and then gets interrupted.

This is a "race condition" and the solution here is to protect x with a mutex or similar protection mechanism. Doing so will make the function thread-safe. Alternatively, do_stuff can be rewritten to not use any static storage variables or similar resources - it would then be re-entrant.
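A minimal sketch of the mutex approach, assuming POSIX threads are available. The names do_stuff_locked, x_lock, worker, run_two_workers and CALLS_PER_THREAD are hypothetical and only serve the illustration; the elided body of the original do_stuff is left out.

```c
#include <pthread.h>

#define CALLS_PER_THREAD 100000

/* Mutex guarding the static counter inside do_stuff_locked(). */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same idea as do_stuff(), but the read-modify-write of x is serialized. */
int do_stuff_locked(void)
{
    static int x = 0;
    int result;

    pthread_mutex_lock(&x_lock);
    result = x++;               /* only one thread at a time executes this */
    pthread_mutex_unlock(&x_lock);
    return result;
}

/* Thread entry point (hypothetical): call the function many times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        do_stuff_locked();
    return NULL;
}

/* Runs two workers to completion; returns 0 on success, -1 on failure. */
int run_two_workers(void)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

After run_two_workers() succeeds, no increment has been lost: the next call to do_stuff_locked() returns exactly 2 * CALLS_PER_THREAD. Without the lock, the count may come up short in the way the scenario above describes.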

Lundin
  • Yes, what you say makes sense. I never thought of using the same code in different threads, which is probably why I didn't think of this. Thanks for the insight. – FaresGargouri Jun 11 '18 at 11:46
  • @FaresGargouri The very same issue often exists in library code, such as implementations of the C standard library. This is why we must know if a function is thread-safe before calling it from multiple threads. If it isn't documented as thread-safe, it is probably not. – Lundin Jun 11 '18 at 11:48
  • Besides the improper final value of `x`, there's also the issue that two interleaved invocations of `do_stuff` might return identical values (which would obviously be a big problem if its name were not `do_stuff` but, say, `get_unique_id`). – Steve Summit Jun 11 '18 at 13:34
  • @SteveSummit Such attitude is the root of countless bugs. Lots of bad programmers get away with being bad because a specific compiler happened to generate atomic instructions out of their C code. There is no guarantee that `return ret` is atomic either, nor is there a guarantee that `a = b` is atomic. We get stupid remarks like "I only write from one place so there is no need for mutex" all the time. People with such attitude have not grasped how C code translates to machine code. Keeping them in the dark isn't doing anyone a favour. – Lundin Jun 11 '18 at 13:57
2

isn't internal linkage supposed to stop threads from stepping in each other's static variables?

Linkage has nothing to do with concurrency: internal linkage stops translation units, not threads, from seeing each other's variables.

I, for one, think local static variables are a very good way to communicate information between function calls, that can really improve readability and limit scope

Communicating information between calls through static variables is not too different from communicating information through globals, for the same reasons: when you do that, your function becomes non-reentrant, severely limiting its uses.

The root cause of the problem is that read/write use of variables with static storage duration transforms a function from stateless to stateful. Without static variables any state controlled by the function must be passed to it from the outside; static variables, on the other hand, let functions keep "hidden" state.
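To make the contrast concrete, here is a small sketch of the "state passed from the outside" style. The struct counter type and the counter_init and counter_next functions are hypothetical names invented for this example.

```c
/* Hypothetical counter type: the state lives with the caller,
 * not hidden inside the function. */
struct counter {
    int next;
};

/* Puts a counter into its starting state. */
void counter_init(struct counter *c)
{
    c->next = 0;
}

/* Returns the current value and advances the counter that was passed in.
 * The function touches nothing except *c, so it keeps no hidden state. */
int counter_next(struct counter *c)
{
    return c->next++;
}
```

Two callers that each own a struct counter cannot disturb each other, because every call names the state it works on. Sharing one struct counter between threads would still need protection.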

To see the consequences of keeping a hidden state, consider the strtok function: you cannot use it concurrently, because multiple threads would step on each other's state. Moreover, you cannot use it even from a single thread if you wish to parse each token from a string that is currently being parsed, because your second-level invocation would interfere with your own top-level invocation.
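For comparison, POSIX offers strtok_r, which takes the tokenizer state as an explicit argument instead of hiding it. The sketch below assumes a POSIX system; count_fields and the "key=value;key=value" input format are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the strtok_r declaration */
#include <string.h>

/* Counts the fields in text such as "a=1;b=2" by splitting on ';'
 * and then splitting each piece on '='. The buffer is modified in
 * place, as with strtok. */
int count_fields(char *text)
{
    int fields = 0;
    char *outer_state;  /* state of the ';' tokenizer */
    char *inner_state;  /* state of the '=' tokenizer */

    for (char *pair = strtok_r(text, ";", &outer_state);
         pair != NULL;
         pair = strtok_r(NULL, ";", &outer_state)) {
        /* The nested scan keeps its own state, so it cannot disturb
         * the outer scan the way a nested strtok call would. */
        for (char *field = strtok_r(pair, "=", &inner_state);
             field != NULL;
             field = strtok_r(NULL, "=", &inner_state))
            fields++;
    }
    return fields;
}
```

Because each scan carries its own state pointer, the same function can also be called from several threads at once, as long as each thread works on its own buffer.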

Sergey Kalinichenko
  • Regarding the first point: so it is possible for threads to just randomly access variables that are not in their scope when the variables are stored in the data segment? As for the second one: good point, but what about when talking about functions that are just meant to modularize blocks of code that are otherwise too large and cumbersome, not for code reuse as a library? – FaresGargouri Jun 11 '18 at 11:02
  • @FaresGargouri As far as the C standard is concerned, threads do not exist, let alone having scope. If two threads can call a function that modifies a static variable, then these two threads can randomly step on each other's data. Thread-local storage is implemented by a thread library. As far as modularizing blocks go, you can do it without breaking reusability by providing opaque handles to data that would otherwise end up in static variables, and have the users pass these handles to each function in your module. – Sergey Kalinichenko Jun 11 '18 at 11:15
  • using opaque handles sounds like an interesting suggestion. Can you please link some literature regarding how to implement this? – FaresGargouri Jun 11 '18 at 11:52
  • @FaresGargouri Here is a good place to start: [What is an opaque pointer in C?](https://stackoverflow.com/q/7553750/335858) – Sergey Kalinichenko Jun 11 '18 at 12:22
  • @Lundin Nice to know, thanks. I looked for keyword "thread" in C99 reference, and didn't find any. I guess 12 years really made a difference ;-) – Sergey Kalinichenko Jun 11 '18 at 14:08
  • Yeah well, some 11 years too late. It doesn't really make much sense to implement C11 threads now, since pthreads are already the de facto standard since somewhere in the late 1990s. – Lundin Jun 11 '18 at 14:15
1

From my point of view, the premise is wrong, or at least a static local variable is only as unsafe as any other bad design.

A bad (or thread-unsafe) software practice is sharing resources without any discipline or protection. There are several good mechanisms for communication between threads, such as queues and mailboxes, or semaphores and mutexes if the resource has to be shared. Failing to use them is the developers' fault.

Actually, I cannot see your point. A static local variable has a well-defined scope and cannot be accessed outside of it (and, even better, in embedded applications it is useful for avoiding memory overflows), so I see no relation between unsafe code and static local variables, at least not in general.

If you are talking about a static local variable that can be read or written from two different tasks without protection (through a callback or whatever), that is a horrible design (and, again, the developers' fault), but not because static local variables are unsafe in general.

Jose
  • Ok, so my takeaway from this is that no, static local variables are not inherently bad, but they require shared resource management (such as by, say, semaphores) if the code is used by different threads. – FaresGargouri Jun 11 '18 at 12:01
0

The behaviour of simultaneously reading from and writing to any non-atomic object is undefined in C.

A static variable makes the possibility of this happening substantially greater than an automatic or dynamically allocated variable does. And that is the crux of the problem.

So if you don't control your threading (using mutual exclusion units for example), you could put your program into an undefined state.
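One way to avoid that undefined state without a lock is to make the object itself atomic. This is a rough sketch assuming a C11 compiler with <stdatomic.h> and POSIX threads for the test harness; do_stuff_atomic, worker, run_atomic_workers and CALLS_PER_THREAD are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

#define CALLS_PER_THREAD 100000

/* Returns the previous value of the counter and increments it as one
 * indivisible operation, so concurrent callers never lose an update. */
int do_stuff_atomic(void)
{
    static atomic_int x = 0;
    return atomic_fetch_add(&x, 1);
}

/* Thread entry point (hypothetical): call the function many times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        do_stuff_atomic();
    return NULL;
}

/* Runs two workers to completion; returns 0 on success, -1 on failure. */
int run_atomic_workers(void)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

This only covers a single counter. When several variables have to change together, a mutex is usually the simpler tool.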

A sort of halfway house is thread-local storage. C11 standardized it as the _Thread_local storage-class specifier (cf. thread_local of C++11), and before that some C compilers offered it as an extension. See, for example, https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Thread-Local.html
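A short sketch of what thread-local storage looks like, assuming a C11 compiler (which spells the keyword _Thread_local; the older GCC extension on the linked page spells it __thread) and POSIX threads. The names next_id_per_thread, worker and last_value_in_new_thread are invented for the example.

```c
#include <pthread.h>

/* Every thread gets its own copy of x, each starting at 0. */
int next_id_per_thread(void)
{
    static _Thread_local int x = 0;
    return x++;
}

/* Thread entry point (hypothetical): call the function three times
 * and report the last value through the int that arg points to. */
static void *worker(void *arg)
{
    int *last = arg;

    for (int i = 0; i < 3; i++)
        *last = next_id_per_thread();
    return NULL;
}

/* Starts a fresh thread, waits for it, and returns the last value the
 * thread saw, or -1 if the thread could not be created. */
int last_value_in_new_thread(void)
{
    pthread_t t;
    int last = -1;

    if (pthread_create(&t, NULL, worker, &last) != 0)
        return -1;
    pthread_join(t, NULL);
    return last;
}
```

Each new thread counts 0, 1, 2 regardless of what other threads have done, so there is no shared object to race on. The trade-off is that the threads no longer share a single counter, which may or may not be what the function was meant to provide.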

Bathsheba
0

isn't internal linkage supposed to stop threads from stepping in each other's static variables?

Your question is tagged c. There are no threads in the C programming language. If your program creates any new threads, it does so by calling in to some library at run-time. The C tool chain does not know what threads are, it has no way of knowing that the library routines you call create threads, and it has no way of knowing if you consider any particular static variable to be "owned" by one thread or another thread.

Every thread in your program runs in the same virtual address space as every other thread. Every thread potentially has access to all of the same variables that can be accessed by any other thread. If a variable in the program actually is used by more than one thread, it is the programmer's responsibility (not the tool chain's responsibility) to ensure that the threads use it in a safe way.

everyone seems to have something against local static variables,

Software developers who work in teams to develop large, long-lived software systems (think, tens of years and hundreds of thousands to tens of millions of lines of code) have some very well understood reasons to avoid using static variables. Not everyone works on systems like that, but you will meet some folk here who do.

people can't seem to agree on why

Not all software systems need to be maintained and upgraded for tens of years, and not all have tens of millions of lines of code. It's a big world. There are people out there writing code for many different reasons. They do not all have the same needs.

and researching some of the argument shows them to be ill-conceived

There are people out there writing code for many different reasons... What seems "ill-conceived" to you might be something that some other group of developers have thought long and hard about. Perhaps you do not fully understand their needs.

As far as I can see, there are completely legitimate uses of local static variables

Yes. That is why they exist. The C programming language, like many other programming languages, is a general tool that can be used in many different ways.

Solomon Slow