9

In JLS, §17.4.5. Happens-before Order, it says that

A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.

According to discussion in Does a correctly synchronized program still allow data race?(Part I),we get following conclusion:

A program can be correctly synchronized and have data races.

The combination of two conclusions means that it must exists such an example:

All sequentially consistent executions of a program are data race free, but the normal executions (executions other than sequentially consistent executions) of such a program contain data race.

After heavy consideration, I still can not find such a code sample. So how about you?

Community
  • 1
  • 1
newman
  • 499
  • 3
  • 10
  • 1
    Java Concurrency in Practice (book) has all these code for you. see http://jcip.net/ – su- Aug 19 '12 at 04:09
  • 2
    @newman +1 for properly researching your question! – obataku Aug 19 '12 at 04:09
  • 1
    See also [*What's sequentially consistent executions are free of data races?*](http://stackoverflow.com/q/12018369/230513) and this [series of related questions](http://stackoverflow.com/users/1423978/newman). – trashgod Aug 19 '12 at 04:25
  • @su-, would you please point out which code in that book can satisfy my request. – newman Aug 19 '12 at 10:14
  • @newman I did not reach the conclusion that "A program can be correctly synchronized and have data races.". I only showed you that a program could have all its executions sequentially consistent and have data races. – assylias Aug 20 '12 at 08:47
  • String is really interesting example, class is thread safe and contains a data race in the hashCode(), Even though class doesn't have any synchronization and contains mutable data it is still thread safe. This data race is benign because String object doesn't change whoever wins the race. – Mark Aug 20 '12 at 09:35

1 Answers1

9

It is not true that "A program can be correctly synchronized and have data races." The example by assylias in that discussion is not correctly synchronized. It is correct from the higher-level, functional standpoint—the data race it contains does not manifest itself as a bug. It is a so-called "benign" data race, but that is irrelevant when discussing the JLS definitions.

A program whose sequentially consistent execution don't contain data races is guaranteed not to contain data races in any execution, sequentially consistent or not. As the JLS says,

This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races. Therefore they do not need to reason about reorderings when determining whether their code is correctly synchronized. Once the determination that the code is correctly synchronized is made, the programmer does not need to worry that reorderings will affect his or her code.

So please note that the definition of a correctly synchronized program is narrowed down to only sequentially consistent executions as a courtesy to the programmer, giving him the strong guarantee that sequentially consistent executions are the only ones he or she needs to reason about and all other executions will automatically have the same guarantee.

UPDATE

It is easy to get lost in the terminology used by the JMM and subtle misinterpretations result in deep misunderstandings later on. Therefore take these to heart:

  • an execution is simply a set of inter-thread actions. There is no a priori order to it. In particular, there is no time order to it.

This is a counterintuitive definition, so we must be careful about it: each time we say execution, we must be sure to imagine a bag of actions, never a string of them. Whenever we define a partial order, we should imagine several bags lined up.

  • a program contains instructions to perform actions. Each such instruction can be executed zero or more times, contributing zero or more distinct actions to the particular execution;
  • an execution may or may not have an execution order, which is a total order over all actions;
  • a sequentially consistent execution is what you would get if all your shared variables were volatile. This kind of execution always has a definite execution order;
  • a sequentially inconsistent execution is your real-world execution of a program: non-volatile variables are involved and the compiler reorders reads and writes, there are caches, thread-local stores, etc.
  • the synchronization order is the total order over all synchronization actions done by an execution. With respect to the execution itself it is still a partial order because not all actions are synchronization actions; most notably, reads and writes of non-volatile variables. Every execution, be it sequentially consistent or not, has a definite synchronization order;
  • likewise, the happens-before order is defined for a specific execution of a program and is derived as a transitive closure of the synchronization order with the program order.

It is interesting to note that if all your shared vars were volatile, then the synchronization order would become a total order and as such would meet the definition of the execution order. This way we come from a different angle to the conclusion that all executions of such a program would be sequentially consistent.

I have dug deep to get to the bottom of the JLS bug in the definition of a data race:

"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."

First of all, it is not the program that contains data races, but a program execution. If we refer back to the original paper defining the Java Memory Model, we'll see this corrected:

"Two accesses x and y form a data race in an execution of a program if they are from different threads, they conflict, and they are not ordered by happens-before."

However, this still leaves us with actions on volatile vars being defined as data races. Consider the following happens-before graph:

Thread W        w1 ----> w2
                |
                 \
Thread R     r0 ----> r1

r1 observerd the write w1. It was preceded by another read, r0, and the write was followed by another one, w2. Now notice there is no path between r0 and either w1 or w2; likewise between r1 and w2. All these are examples of a data race by the definition.

Digging even deeper, however, I have found this post on the memoryModel mailing list. It says "a data race should be defined as conflicting actions on non-volatile variables that are not ordered by happens-before". Only with that addition would the loophole be closed, but this has still not entered the official JLS release.

Community
  • 1
  • 1
Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • If "A program can be correctly synchronized and have data races." is not true, then we get "If a program is correctly synchronized, then it is free of data races". But from JLS, we can not get such a conclusion. In other words, if the second conclusion is true, then JLS will show us, but JLS does not. – newman Aug 19 '12 at 09:36
  • "When a program contains two conflicting accesses (§17.4.1) that are not ordered by a *happens-before* relationship, it is said to contain a data race." Note that a *happens-before* is not defined for a program, but for one of its **executions**. This is just a weak wording by the JLS. That sentence in reality refers to a **program execution**. Next note that you can rearrange execution order any way you like without disrupting the *happens-before* relationships. Therefore you cannot introduce a data race by reordering a sequentially consistent execution into an inconsistent one. – Marko Topolnik Aug 19 '12 at 10:19
  • I have two points about your reply: 1. you still do not answer my question why JLS does not say **"If a program is correctly synchronized, then it is free of data races"**. 2. rearranging execution order **any way** may disrupt the happens-before relationships, if you rearrange the execution order of actions which have happens-before relationship. – newman Aug 19 '12 at 15:37
  • 1. This is the first time you ask it so small wonder I don't answer it. 2. happens-before is not defined in terms of the execution order. – Marko Topolnik Aug 19 '12 at 17:16
  • Re 1. please note that a data race is not defined for a program but for a specific execution of that program. – Marko Topolnik Aug 19 '12 at 17:24
  • I would ask you the following: do you realize precisely what is the distinction of a sequentially consistent and inconsistent program execution? Do you realize that the difference is not under the control of the program (thereby you as the programmer), but of the JVM implementation? You cannot ask for a "program that demonstrates sequentially inconsistent execution", or for a "program whose sequentially inconsistent execution contains a data race". Those questions are **meaningless**. – Marko Topolnik Aug 20 '12 at 05:47
  • @MarkoTopolnik re *"a data race is not defined for a program"*, quoting JLS: *"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."* – assylias Aug 20 '12 at 08:56
  • @assylias Yes, I have already discussed this point with OP. This is an instance of unfortunately weak wording in the JLS, but since elsewhere *happens-before* is defined in terms of *synchronization order*, and this is again defined for a specific execution only, it must be that it is in fact the *execution of a program* that is being referred to. – Marko Topolnik Aug 20 '12 at 08:58
  • I understand it as a shortcut: a program is data race free (DRF) <=> all its executions are DRF. – assylias Aug 20 '12 at 09:00
  • 1
    @assylias That's what we usually assume under the notion, but OP goes to the JLS and reads "A program is correctly synchronized if and only if all sequentially **consistent** executions are free of data races." and starts fretting "but what about the sequentially **inconsistent** executions?" NB I think it is indeed a good approach by OP to have terms 100% clear before allowing oneself to use shortcuts. – Marko Topolnik Aug 20 '12 at 09:09
  • @newman I have made some more research into this issue with the definition of a data race. See the updated answer for details. – Marko Topolnik Aug 20 '12 at 21:43
  • @assylias Fixing that definition of data race proved to be trickier than just replacing "program" with "execution": even after that, any program involving volatile vars would be defined as containing a data race. See my update in this answer for interesting details. – Marko Topolnik Aug 20 '12 at 21:45
  • @Marko Topolnik, thanks a lot for all your job. but I still have question bother you. You said that "happens-before is not defined in terms of the execution order.". Now lets take a look at the definition of happens-before: ... •If an action x synchronizes-with a following action y, then we also have hb(x, y)....Now continue our look at the definition of synchronizes-with: ... •An unlock action on monitor m synchronizes-with all subsequent lock actions on m (where "subsequent" is defined according to the synchronization order).... Now continue our look at the definition of synchronization – newman Aug 21 '12 at 14:35
  • order: A synchronization order is a total order over all of the synchronization actions of an execution. Now we can get a conclusion: happens-before is at least defined partly in terms of the execution order. That's why I said that "rearranging execution order any way may disrupt the happens-before relationships, if you rearrange the execution order of actions which have happens-before relationship.", much precisely is it should be: "...if you rearrange the execution order of actions which have happens-before relationship which is defined in synchronization order." – newman Aug 21 '12 at 14:36
  • There, I added clarification of key terms. Read it and see if it clears up your thoughts. – Marko Topolnik Aug 21 '12 at 14:57
  • @MarkoTopolnik This is getting very precise indeed - sorry I can't +2 you! However, I think we should stick to the common sense w.r.t. the semantics of volatile (which is not backed by the JLS) for most SO questions. This post is admittedly different as the OP is specifically asking for terminology. – assylias Aug 21 '12 at 16:05
  • 1
    @assylias Yes, especially since it is an acknowledged JLS bug. Also, in any context other than this level of precision, "sequentially consistent" is synonymous with "giving the appearance of sequential consistency". Since the **actual** sequential consistency is never attained, this creates a kind of vacancy for a useful meaning of the term. – Marko Topolnik Aug 21 '12 at 16:15
  • @assylias, would you please tell me what does "SO" mean in you reply "... for most SO questions."? Thanks a lot. – newman Aug 22 '12 at 16:00
  • @MarkoTopolnik. I'm not sure whether I capture the meaning of your last sample in your answer. What you mean is that actions on volatile vars should never contain data race, but according to current definition of data race, actions on volatile vars still contain data race. Is my understanding correct? Thanks a lot. – newman Aug 23 '12 at 06:57
  • The meaning is that the definition in the specification should be fixed that way. This is what was discussed at the memory-model mailing list back in 2005. – Marko Topolnik Aug 23 '12 at 07:24
  • @MarkoTopolnik, I have just read Bart's original mail. I can understand every sentence he said, but I still can not understand why he said that if actions on volatile vars contain data race then guarantee he mentioned can not be applied. So could you please explain it in more detail in your answer or just at here? Thanks a lot. – newman Aug 23 '12 at 15:17
  • The point is, the actions on volatile vars are, by the very nature of the volatile, **never in a data race**. The definition must reflect that. – Marko Topolnik Aug 23 '12 at 15:23
  • @MarkoTopolnik, thanks for your reply. But I still can not understand why he said that if actions on volatile vars contain data race then guarantee(... if and only if... in JLS) he mentioned can not be applied. So would you please answer this question in more detail? – newman Aug 23 '12 at 15:56
  • I'm afraid it is getting real hard for me to make this even clearer and more explicit. A program that only shares volatile vars between threads cannot possibly contain a data race, contrary to the definition of the data race in the JLS. – Marko Topolnik Aug 23 '12 at 17:44