6

There are two conclusions from JLS:

  • C1: If a program is free of data races, then all executions of the program will appear to be sequentially consistent: data-race-free => sequentially consistent
  • C2: If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent: correctly synchronized => sequentially consistent

If the converse of C1 is true, then we can conclude that:

  • C3: If a program is correctly synchronized, then it is free of data races: correctly synchronized => data-race-free

But unfortunately, there is no such statement in the JLS, so I get to the fourth conclusion:

  • C4: A program can be correctly synchronized and have data races.

But I am not satisfied with this approach and want to get a proof that this conclusion is true (or false), even in an informal way or in sample way.

First of all, I think a code segment that shows a sequentially consistent execution of a multi-threaded program that contains a data race is helpful to understand and resolve this problem.

After serious consideration, I still can not find a proper sample. So would you please give me such a code segment?

assylias
  • 321,522
  • 82
  • 660
  • 783
newman
  • 499
  • 3
  • 10
  • Personally I would say that C3 is a rewording of C2. I think for this case code samples might not be too helpful. You can see examples of code properly synchronized to prove that synchronized code will remove any race conditions. But examples of what to synchronize and why will be specific to that example. I'd recommend more reading on general synchronization and multi-threadding. You can then take these concepts and apply them to Java. – Elliott Hill Aug 17 '12 at 13:35
  • You might find [this paper](https://www.google.com/url?sa=t&source=web&cd=20&ved=0CFkQFjAJOAo&url=http%3A%2F%2Frsim.cs.illinois.edu%2FPubs%2Fpopl05.pdf&ei=8GMtUL_VMo610QW72YHADA&usg=AFQjCNHpR-Nb-5UppQhqXtw5F2G224fa1g&sig2=q5y-tOEMLkHs9OK5Hy56Pg) interesting. – assylias Aug 17 '12 at 13:37
  • @newman I have reworded your question - feel free to rollback if you don't like my edit. – assylias Aug 17 '12 at 13:57

3 Answers3

5

A good example could be String's hashcode:

private int hash; // Default to 0

public int hashCode() {
    int h = hash;
    if (h == 0 && count > 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

There is a data race here as hash can be written and read by different threads and there is no happens-before relationship (no synchronization).

However the program is sequentially consistent because it is impossible for a thread to see a hashcode which is not the actual hashcode of the string (when a thread executes the hashcode method, it can either see 0 and recalculate the value, which is deterministic, or it sees a valid value). This works because int writes are atomic.

EDIT

This (almost) same code is broken and could return a hashcode of 0:

public int hashCode() {
    if (hash == 0 && count > 0) { //(1)
        int h = hash;
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return hash; //(2)
}

as (1) and (2) could be reordered: (1) could read a non null value while (2) would read 0. That can't happen in the first example because the calculation is made on the local variable and the return value is also that local variable, which, by definition, is thread safe.

EDIT 2

Regarding your proposition C4, I don't think it is possible:

A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.

If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).

So if a program is correctly synchronized:

  • all the executions appear sequentially consistent.
  • all sequentially consistent executions are free of data races

So we can conclude that all executions are free of data races and therefore the program is free of data races.

Community
  • 1
  • 1
assylias
  • 321,522
  • 82
  • 660
  • 783
  • 2
    +1 This actually has a term. It's known as a *benign data race* – John Vint Aug 17 '12 at 15:07
  • [This blog post](http://jeremymanson.blogspot.co.uk/2008/12/benign-data-races-in-java.html) by the guy who wrote Chapter 17 of the JLS is instructive too. – assylias Aug 17 '12 at 15:15
  • 1
    @JohnVint Yes - thanks for the precision. The blog I linked to is interesting - changing the condition to `if(hash == 0)` breaks the code... Don't play with fire unless you are a firefighter! – assylias Aug 17 '12 at 15:16
  • Nice read, I like that actual race condition you noted. Stuff gets tricky. – John Vint Aug 17 '12 at 15:20
  • @assylias, I have read blog you recommended. As my english is not good enough and I am a beginner in multiple-thread, I can not fully understand what did the blog say. The most difficult part to understand is following paragraph: What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations." – newman Aug 18 '12 at 03:39
  • @assylias,"The second read can actually be moved, in your code, so that your processor does it before the first!" So would you please introduce me little by little? I will also ask you question little by litte. My first question is: in broken code, before return, there is only one read of hash, that is "if(hash == 0)", so what's the second read of hash before return? – newman Aug 18 '12 at 03:40
  • @newman I have edited to show what he means. To be honest If you really are a beginner in java multi-threading, you should not worry about these very tricky things that most (99.99%) people should not try because it is too easy to get them wrong. You should only aim at avoiding data races, which gives you the guarantee that your program is sequentially consistent. – assylias Aug 18 '12 at 07:58
  • @assylias, I have make a further edit on your edit. I'm not sure whether it is correct or not. If it is not correct, please correct it. – newman Aug 18 '12 at 09:10
  • @assylias, I'm teamleader of a project, although there is only one member in this team now. So I have to understand much about multi-threading, otherwise, it is very difficult for my project to pass pressure test. – newman Aug 18 '12 at 09:19
  • @assylias, why do you rollback my comment? Would you please give me a explaination? – newman Aug 18 '12 at 09:24
  • @newman Your edit was rejected, but [not by me](http://stackoverflow.com/suggested-edits/356008) – assylias Aug 18 '12 at 09:38
  • @assylias, thanks for your link so I can have an opportunity to save my job. May I think that you agree with my comment? – newman Aug 18 '12 at 09:45
  • But in substance I agree. But once again, you should always prevent data races. Unless IF you detect a performance issue AND you know what you are doing, can you introduce benign races in your code. – assylias Aug 18 '12 at 09:46
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15500/discussion-between-newman-and-assylias) – newman Aug 18 '12 at 10:05
  • @assylias.Before I can be sure whether race in my project is benign or not, I must understand something more about multi-thread. I'm not sure whether following data race is benign: there are two threads in my code, one thread write a field to indicate that "I will begin rolling back", another thread read this field and decide what will it do. I think whether this field is volatile or not is not important, because even this field is volatile, it is still possible for second thread to read this filed before this field is written by first thread. So what's your suggestion about my opinion? – newman Aug 18 '12 at 11:06
  • 1
    @newman This is a whole new question - I suggest you ask it as such + you need to give a bit more context. – assylias Aug 18 '12 at 12:21
  • @newman As the example in my answer shows, details do matter and your question is too general to be answered as it is. – assylias Aug 18 '12 at 12:29
  • @assylias, before I post a new topic, I want talk about your second code sample one stop further. In your second code sample, the second read was moved prior to first read. When this code was executed by first thread, it executes second read first, then it hangs up. Now second thread run this method too, because the execution of thread was not interrupted, it assign correct value to "hash". At this memont, the first thread comes to life, then first read reads correctly computed hash value. – newman Aug 18 '12 at 14:21
  • But it is already too late, because the second read has read "hash" as 0, so this method in first thread return 0. This is also true for this method in second thread. All methods run in following threads will return correctly computed hash value. I'm not sure whether my understanding is right or not, please give me advice. Thans a lot. – newman Aug 18 '12 at 14:21
  • 1
    assylias, the wording in your answer abuses the term "sequentially consistent", which has a very tight definition by the JLS, and is not a property of a program, but its execution. Any program can (and probably **will**) be executed in a sequentially inconsistent manner, which happens every time a compiler decides to reorder some write instructions. In addition, `String.hashCode` is an example that isn't even **correctly synchronized**. – Marko Topolnik Aug 19 '12 at 06:52
  • 1
    @MarkoTopolnik I should have said: all executions will appear to be sequentially consistent. As for hashcode not being synchronized, it was the point of my answer: it is not sync'ed, there is therefore a data race on hash and still all executions appear SC... although I don't really address the C4 of the question (I don't have the answer to that question). – assylias Aug 20 '12 at 07:47
  • 1
    The major point is that our OP here is not interested in program correctness, but wants to get to the bottom of the JMM formalism. That is why he is for example not interested whether `String.hashCode` is correct from program correctness standpoint. If this wasn't the case, you can be sure I wouldn't go nitpicking on you for a slight imprecision in terminology :) – Marko Topolnik Aug 20 '12 at 08:19
  • @MarkoTopolnik I was really answering the last part of his question *"I think a code segment that shows a sequentially consistent execution of a multi-threaded program that contains a data race is helpful to understand and resolve this problem."* – assylias Aug 20 '12 at 08:24
  • Feel free to add an answer if you feel that mine is incorrect. – assylias Aug 20 '12 at 08:25
  • OP has already posted a followup question and I have provided an answer there. It still all revolves about the exact wording of the JMM and is in fact a good practice for me to get all terms purified. – Marko Topolnik Aug 20 '12 at 08:37
  • 2
    RE "code segment that shows..." here the main message to OP should be that **there is no such thing** as a *code segment* that shows **any kind of** execution, sequentially consistent or not. Sequential consistency is orthogonal to code: **any code** can be executed both in a sequentially consistent and inconsintent manner, and this aspect is in no way under the control of the code, but of the Java runtime implementation. This is important to OP because he is trying to understand the way these terms are used by the JMM. – Marko Topolnik Aug 20 '12 at 08:40
  • @MarkoTopolnik I had not seen the follow-up. Now I understand your point. Note that I never said that a program could be correctly synchronized and have data races. Only that all executions could appear sequentially consistent, even with a data race (as reworded earlier today). Thanks. – assylias Aug 20 '12 at 08:45
  • 2
    I'm just trying to help OP because his knowledge is in a very delicate stage so 100% clarity of terminology is of essence. Note on your "EDIT 2: So if a program is correctly synchronized: all the executions are sequentially consistent." In fact, executions may well be inconsistent; but will **appear to be consistent**. This is a very important distinction for OP. Also, a program is not the entity that can contain data races: it is the property of an **execution** of that program. This is also very important to OP in order to match against other definitions in the JMM. – Marko Topolnik Aug 20 '12 at 08:56
  • @MarkoTopolnik Just commented on the 2nd part of your comment in the other thread (this is becoming messy!) – assylias Aug 20 '12 at 08:57
1

Race conditions mean to let your program's output depend on who gets at a specific point first. For instance, if you have 2 threads: T1 and T2, if your programs output is X if T1 gets to point P in the program first and the program's output is Y if T2 gets to point P first, then you have a race condition.

In pseudocode:

 Global variable i initialized to 6;
 Thread 1: 
       aquire(lock l)
             increment global variable i, i.e. i++;


 Thread 2: 
       aquire(lock l)
             double the value of global var i, i.e.: i*=2;

If T1 aquires the lock l firts and T2 second the value of i will be 14 If T2 aquires the lock l firts and T1 second the value of i will be 13

  • Data race free imply sequential consistency, but not vice versa. – newman Aug 17 '12 at 13:43
  • @Georghe your example is data race free because it uses a form of synchronization. – assylias Aug 17 '12 at 13:43
  • @assylias: the output of the program depends on which thread gets the lock first. Isn't this a race condition ? –  Aug 17 '12 at 13:45
  • JLS definition: *"When a program contains two conflicting accesses that are not ordered by a happens-before relationship, it is said to contain a data race."* In your case, there is a happens-before relationship between the 2 locks acquire. Obviously one could succeed before the other but that does not make it a data race. – assylias Aug 17 '12 at 13:52
  • I think maybe my expression is not exact. What I need a code segment is: it is a multiple-threaded program, a sequential consistent execution of this program contains data race. – newman Aug 17 '12 at 14:04
  • @newman: but the example is sequential consistent, isn't it ? I thought it also contained a race condition, but it seems this is not a race condition. –  Aug 17 '12 at 14:10
  • I agree with Gheorghe, because although two actions have happens-before relationship, they can not be ordered by this relationship, because you can not say which one is the first and which one is the second. – newman Aug 17 '12 at 14:14
  • @Gheorghe, could you please give me an explain about why this example is sequential consistent? – newman Aug 17 '12 at 14:20
  • because : "the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program". But I'm not sure ! –  Aug 17 '12 at 14:31
  • @Gheorghe, your explaination is from Lamport's paper. So would you please give me an explain about your sample code? For example: the result of any execution of my sample code ...... – newman Aug 17 '12 at 14:49
  • @Gheorghe, please don't feel strange, because all we discussed are subtle concepts, we need deal with them carefully. – newman Aug 17 '12 at 14:57
  • @Gheorghe, please don't feel strange, the more detailed what we express at here, the more clear what we can understand. – newman Aug 17 '12 at 15:05
  • 1
    @Gheorghe I think the problem comes from definitions. newman is specifically talking about data races, when you are talking about race conditions. These are 2 different concepts, hence the confusion. – assylias Aug 18 '12 at 09:21
0

http://rsim.cs.illinois.edu/Pubs/popl05.pdf

The proof seems to go this way:

Suppose an execution of a correctly synchronized program contains data race, we can find a sequential execution that contains data race, violating the definition of "correctly synchronized program" [C1]

Therefore the execution must not contain data race. [C3] From there, the execution can be proven to be sequentially consistent. [C2]

So C3 is true, and it is used to prove C2.

irreputable
  • 44,725
  • 9
  • 65
  • 93