
I could find the answer if I read a complete chapter/book about multithreading, but I'd like a quicker answer. (I know this stackoverflow question is similar, but not sufficiently.)

Assume there is this class:

public class TestClass {
    private int someValue;

    public int getSomeValue() { return someValue; }
    public void setSomeValue(int value) { someValue = value; }
}

There are two threads (A and B) that access the instance of this class. Consider the following sequence:

  1. A: getSomeValue()
  2. B: setSomeValue()
  3. A: getSomeValue()

If I'm right, someValue must be volatile, otherwise the 3rd step might not return the up-to-date value (because A may have a cached value). Is this correct?

Second scenario:

  1. B: setSomeValue()
  2. A: getSomeValue()

In this case, A will always get the correct value, because this is its first access, so it can't have a cached value yet. Is this right?

If a class is accessed only in the second way, is there no need for volatile/synchronization?

Note that this example was simplified, and actually I'm wondering about particular member variables and methods in a complex class, and not about whole classes (i.e. which variables should be volatile or have synced access). The main point is: if more threads access certain data, is synchronized access needed by all means, or does it depend on the way (e.g. order) they access it?


After reading the comments, let me try to present the source of my confusion with another example:

  1. From UI thread: threadA.start()
  2. threadA calls getSomeValue(), and informs the UI thread
  3. UI thread gets the message (in its message queue), so it calls: threadB.start()
  4. threadB calls setSomeValue(), and informs the UI thread
  5. UI thread gets the message, and informs threadA (in some way, e.g. message queue)
  6. threadA calls getSomeValue()

This is a totally synchronized structure, but why does this imply that threadA will get the most up-to-date value in step 6 (if someValue is not volatile and is not accessed inside a monitor anywhere)?

Thomas Calc

5 Answers


If two threads are calling the same methods, you can't make any guarantees about the order that said methods are called. Consequently, your original premise, which depends on calling order, is invalid.

It's not about the order in which the methods are called; it's about synchronization. It's about using some mechanism to make one thread wait while the other fully completes its write operation. Once you've made the decision to have more than one thread, you must provide that synchronization mechanism to avoid data corruption.
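A minimal sketch of such a mechanism, assuming the TestClass from the question (synchronized is only one of several possible options):

```java
// Variant of the question's TestClass with synchronized accessors.
// Acquiring and releasing the object's monitor creates happens-before
// edges, so a reader sees the value written by the last completed setter.
public class SyncTestClass {
    private int someValue;

    public synchronized int getSomeValue() { return someValue; }
    public synchronized void setSomeValue(int value) { someValue = value; }
}
```

Whichever thread reaches the getter second must wait until the setter has fully completed, which is exactly the "make one thread wait" behavior described above.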

Robert Harvey
  • "*If two threads are calling the same methods, you can't make any guarantees about the order that said methods are called.*" -- in my use cases, there is a guarantee. It's a well-defined process in my software; the threads use synchronization. So in my concrete context, the guaranteed order of operations follows from the software structure. The question is whether these (correctly ordered) operations will see the correct value *without* using monitors/volatile. (A trivial example: thread A is not running when thread B calls getSomeValue(), e.g. thread A is started exactly after getSomeValue() is called.) – Thomas Calc Jun 29 '12 at 22:39
  • I hear what you are saying, but I still contend that the order of operations doesn't matter; what matters is that the second thread waits for the first thread to complete its write operation. If your process guarantees that, then I think it could be considered threadsafe. – Robert Harvey Jun 29 '12 at 22:40
  • To put it another way, if you're already using synchronization mechanisms to guarantee thread safety (as you say you are), you don't have to think about guaranteeing order of operations. – Robert Harvey Jun 29 '12 at 22:41
  • I guarantee the order of operations, yes, but (without using volatile) in the first scenario, how can thread A know that the value changed since the last time it accessed it? I mean, the "software" itself knows about step 2, but *thread A* might not know. – Thomas Calc Jun 29 '12 at 22:44
  • I've added another example to the end of the main question post (to enable decent formatting). – Thomas Calc Jun 29 '12 at 22:54

As we all know, it is the crucial state of the data that we need to protect, and the statements that govern that state must be synchronized so that they execute atomically.

I had an example that used volatile, with two threads that each incremented a counter 10,000 times, so the total should have been 20,000. But to my surprise, it wasn't always.

Then I used the synchronized keyword to make it work.

Synchronization makes sure that when a thread is accessing the synchronized method, no other thread is allowed to access this or any other synchronized method of that object, making sure that data corruption is not done.

A thread-safe class maintains its correctness in the presence of the scheduling and interleaving of the underlying runtime environment, without any thread-safety mechanism on the side of the client code that accesses it.
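The counter experiment described above can be sketched as follows (class and method names are mine); with synchronized increments the total is always 20,000, whereas a bare counter++ on a plain or merely volatile int can lose updates:

```java
// Two threads each increment a shared counter 10,000 times.
// synchronized makes each read-increment-write step atomic, so the
// final total is reliably 20,000.
public class CounterDemo {
    private int counter;

    public synchronized void increment() { counter++; }
    public synchronized int get() { return counter; }

    public static void main(String[] args) throws InterruptedException {
        CounterDemo demo = new CounterDemo();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                demo.increment();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(demo.get()); // prints 20000
    }
}
```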

Kumar Vivek Mitra

Let's look at the book.

A field may be declared volatile, in which case the Java memory model (§17) ensures that all threads see a consistent value for the variable.

So volatile is a guarantee that the declared variable won't be copied into thread local storage, which is otherwise allowed. It's further explained that this is an intentional alternative to locking for very simple kinds of synchronized access to shared storage.

Also see this earlier article, which explains that int access is necessarily atomic (but not double or long).

These together mean that if your int field is declared volatile then no locks are necessary to guarantee atomicity: you will always see a value that was last written to the memory location, not some confused value resulting from a half-complete write (as is possible with double or long).

However, you seem to imply that your getters and setters are themselves atomic. This is not guaranteed: the JVM can interrupt execution at intermediate points during the call or return sequence. In this example that has no consequences, but if the calls had side effects, e.g. setSomeValue(++val), then you would have a different story.
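A sketch of that distinction, with illustrative names: volatile gives visibility, but a compound operation like ++val still needs something stronger, e.g. java.util.concurrent.atomic.AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

// volatile guarantees visibility of each individual read and write, but
// plain++ below is still a read-increment-write sequence that another
// thread can interleave with. AtomicInteger makes the whole step atomic.
public class VolatileVsAtomic {
    private volatile int plain;
    private final AtomicInteger atomic = new AtomicInteger();

    public void unsafeIncrement() { plain++; }                  // three steps, can lose updates
    public void safeIncrement()   { atomic.incrementAndGet(); } // one atomic step

    public int plainValue()  { return plain; }
    public int atomicValue() { return atomic.get(); }
}
```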

Gene

The issue is that Java is simply a specification. There are many JVM implementations and many physical operating environments, and on any given combination an action may be safe or unsafe. For instance, on single-processor systems the volatile keyword in your example is probably completely unnecessary. Since the writers of the memory and language specifications can't reasonably account for every possible set of operating conditions, they choose to white-list certain patterns that are guaranteed to work on all compliant implementations. Adhering to these guidelines ensures both that your code will work on your target system and that it will be reasonably portable.

In this case, "caching" typically refers to activity that is going on at the hardware level. There are certain events in Java that cause cores on a multiprocessor system to "synchronize" their caches: accesses to volatile variables are one example, synchronized blocks are another. Imagine a scenario where two threads X and Y are scheduled to run on different processors:

X starts and is scheduled on proc 1
Y starts and is scheduled on proc 2

.. now you have two threads executing simultaneously;
to speed things up, the processors check their local caches
before going to main memory, because main memory access is expensive.

X calls setSomeValue('x-value') // assuming proc 1's cache is empty, the cache is set
                                // and the value is dropped on the bus to be flushed
                                // to main memory;
                                // now all gets on proc 1 will read from the cache
                                // instead of engaging the memory bus
Y calls setSomeValue('y-value') // the same thing happens for proc 2

// Now, depending on the order in which things are scheduled and which
// thread you are calling from, getSomeValue() may return 'x-value' or
// 'y-value'. The results are completely unpredictable.

The point is that volatile (on compliant implementations) ensures that ordered writes will always be flushed to main memory, and that other processors' caches will be flagged as 'dirty' before the next access, regardless of the thread from which that access occurs.
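The classic illustration of this guarantee is a stop flag (hypothetical names): without volatile on running, the worker thread may keep reading a stale cached copy and spin forever on some systems:

```java
// A worker loops until another thread calls stop(). The volatile read of
// `running` on every iteration forces the thread to observe the write
// made by stop(), instead of reusing a cached copy.
public class StopFlag {
    private volatile boolean running = true;

    public void stop() { running = false; }

    public void workLoop() {
        while (running) {
            // do some work; each iteration re-reads the volatile flag
        }
    }
}
```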

disclaimer: volatile DOES NOT LOCK. This is important especially in the following case:

volatile int counter;

public void incrementSomeValue() {
    counter++; // bad thread juju - this is at least three instructions:
               // read - increment - write
               // there is no guarantee that the operation is atomic
}

This could be relevant to your question if your intent is that setSomeValue() must always be called before getSomeValue().

If the intent is that getSomeValue() must always reflect the most recent call to setSomeValue(), then this is a good place for the volatile keyword. Just remember that without it, there is no guarantee that getSomeValue() will reflect the most recent call to setSomeValue(), even if setSomeValue() was scheduled first.
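Applied to the class in the question, that advice is simply (a sketch mirroring the question's TestClass):

```java
// With volatile, every getSomeValue() reflects the most recent completed
// setSomeValue(), though compound read-modify-write operations would
// still need external synchronization.
public class VolatileTestClass {
    private volatile int someValue;

    public int getSomeValue() { return someValue; }
    public void setSomeValue(int value) { someValue = value; }
}
```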

nsfyn55
  • So if I want to be safe in every environment (obviously, this is a must), then multithreaded access to variables must be synchronized *regardless of the order of operations* (i.e. regardless of the fact that the operations themselves are synchronized)? I.e. even if the order of operations strictly follows from my software structure (as in the example at the end of my post), there should still be lower-level synchronization (it's actually not "synchronization" but a way to ensure that variable copies are updated wherever needed -- in Java, a proper synchronized block meets this requirement too). – Thomas Calc Jun 29 '12 at 23:14
  • To be brief: variables that are accessed from multiple threads (**even if the threads wait for each other's write operations to complete**, as Robert Harvey mentioned) should be accessed in a way that forces the system to update any cached copies. Is this correct? – Thomas Calc Jun 29 '12 at 23:15
  • It really depends on the context. Generally speaking, access to shared state should be synchronized, with the exception of a few corner cases. One notable example of a failure is the entrance to a loop where both parties rely on updates to a shared variable to exit; this can result in an unintended infinite loop on multi-core systems where each thread executes the loop on its own copy. Experienced concurrent developers go to extraordinary lengths (defensive copies, immutable objects, quasi-functional programming, etc.) to avoid having shared state at all, so as to avoid the need for synchronization. – nsfyn55 Jun 30 '12 at 03:35

If I'm right, someValue must be volatile, otherwise the 3rd step might not return the up-to-date value (because A may have a cached value). Is this correct?

If thread B calls setSomeValue(), you need some sort of synchronization to ensure that thread A can read that value. volatile won't accomplish this on its own, and neither will making the methods synchronized. The code that does this is ultimately whatever synchronization code you added that made sure that A: getSomeValue() happens after B: setSomeValue(). If, as you suggest, you used a message queue to synchronize the threads, this works because the memory changes made by thread B became visible to thread A once thread A acquired the lock on your message queue.

If a class is accessed only in the second way, there is no need for volatile/synchronization, or is it?

If you are really doing your own synchronization, then it doesn't sound like you care whether these classes are thread-safe. Be sure that you aren't accessing them from more than one thread at the same time, though; otherwise, any methods that aren't atomic (assigning an int is) may leave you in an unpredictable state. One common pattern is to put the shared state into an immutable object, so that you are sure the receiving thread isn't calling any setters.
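A minimal sketch of that immutable-object pattern (the class name is mine): the shared state is fixed at construction, so the receiving thread has nothing it can mutate:

```java
// Immutable message holder: the final field is assigned once in the
// constructor and can never change, so any thread that safely receives
// a reference to the object sees a consistent value.
public final class ImmutableMessage {
    private final int someValue;

    public ImmutableMessage(int someValue) { this.someValue = someValue; }

    public int getSomeValue() { return someValue; }
}
```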

If you do have a class that you want to be updated and read from multiple threads, I'd probably do the simplest thing to start, which is often to synchronize all public methods. If you really believe this to be a bottleneck, you could look into some of the more complex locking mechanisms in Java.

So what does volatile guarantee?

For the exact semantics, you might have to go read tutorials, but one way to summarize it is that 1) any memory changes made by the last thread to access the volatile will be visible to the current thread accessing the volatile, and 2) that accessing the volatile is atomic (it won't be a partially constructed object, or a partially assigned double or long).

Synchronized blocks have analogous properties: 1) any memory changes made by the last thread to access the lock will be visible to this thread, and 2) the changes made within the block are performed atomically with respect to other blocks synchronized on the same lock.

(1) means any memory changes, not just changes to the volatile (we're talking post JDK 1.5) or within the synchronized block. This is what people mean when they refer to ordering, and this is accomplished in different ways on different chip architectures, often by using memory barriers.
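A small sketch of point (1), with illustrative names: the volatile write to ready publishes the earlier, non-volatile write to payload as well (post-JDK-1.5 semantics):

```java
// The write to the volatile `ready` happens-after the ordinary write to
// `payload`, so a thread that reads ready == true is guaranteed to also
// see the payload value written before it.
public class Publisher {
    private int payload;             // deliberately not volatile
    private volatile boolean ready;  // the volatile "gate"

    public void publish(int value) {
        payload = value; // ordinary write...
        ready = true;    // ...published by this volatile write
    }

    public Integer tryRead() {
        if (ready) {         // volatile read pairs with the write above
            return payload;
        }
        return null;         // nothing published yet
    }
}
```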

Also, in the case of synchronized blocks, (2) only guarantees that you won't see inconsistent values if you are within another block synchronized on the same lock. It's usually a good idea to synchronize all access to shared variables, unless you really know what you are doing.

nas