Friday, March 20, 2015

Volatile: Atomicity, Visibility and Ordering

Disclaimer: I don't claim to be an expert in this field but I will try to explain the concepts in my capability. If you feel something is wrong or can be improved please drop an comment. I will be more than happy to accept and include your suggestions as it will help all of us. Thanks.

You may be wondering why do we need one more article for exploring volatile? Well reason is: because it is probably the most confused and not-well-understood construct in concurrency. I have seen so many blogs explaining about volatile but still most of them feel incomplete or too difficult to understand. I will start with some of the most important aspects in concurrency:

Atomicity is all about indivisible operations i.e. they will either happen completely or will not happen at all. The best example of atomic operation in Java is assignment of a value to a variable.

Visibility is about one aspect: whether the changes (or effects) made by one thread to a shared variable will be visible to other threads or not?

Ordering is all about whether the order of instructions in source code can be altered by the compiler in name of optimization or not. There is a possibility that the actions in one thread can occur out of order with respect to another.

Now consider certain examples to check out these aspects.
public class MyApp
    private int count = 0;
    public void upateVisitors() 
       ++count; //increment the visitors count
Hint: read-modify-write

The sample code has a method that tries to update the number of visitors to an application (web page). The problem with this code is the instruction ++count is not atomic. It is composed of three separate instructions:

temp = count;   (read)
temp = temp + 1;   (modify)
count = temp;  (write)

So this instruction can be pre-empted by another thread while one thread is executing this. So this is not atomic instruction. Suppose value is 10. Consider the following sequence of execution:

Thread OneThread Two
temp = 10;
temp = 10 + 1= 11
temp = 10 ( as count is still 10)
temp = temp + 1 = 11
count = 11 (temp is 11)
count = 11 (temp is 11)

Here we should observe one thing: with some unlucky timing each thread read the same value (10) and add one to it and then each set the counter to 11. So an increment got lost along the way. And this possibility when the actual output depends on the thread interleaving is known as race condition. So which aspects of concurrency are missing here? What about lack of atomicity? Consider one more example of creating singleton (bad way of course):
public Singleton getInstance()
   if(_instance == null)
      _instance = new Singleton();
Hint: check-then-act

Now again there is a possibility that two threads may notice that instance is null and then both can enter into if block. This will lead to creation of two instances. Here again the problem is the if block is not atomic and also changes in instance are not visible to other threads. So this section which must not be executed by more than on thread at same time is known as Critical Section. So we defnitely need to control access to critical section and for that we can use synchronized block and methods.

Atomicity Again
To ensure atomicity we generally make use of locking to ensure mutual exclusion. Consider the following example of a bank account using synchronized methods:
class BankAccount {
 private int accountBalance;
 synchronized int getAccountBalance() {
    return accountBalance;  
 synchronized void setAccountBalance(int b) throws IllegalStateException {
    accountBalance = b;
    if (accountBalance < 0) {
     throw new IllegalStateException("Sorry but account has negative Balance");
 void depositMoney(int amount) {
    int balance = getAccountBalance();
    setAccountBalance(balance + amount);
 void withdrawMoney(int amount) {
    int balance = getAccountBalance();
    setAccountBalance(balance - amount);

All the access to the shared variable balance are guarded by locks so no problem of data race. Is there anything wrong with this class? Well yes there is. Suppose one thread calls depositMoney(50) and another thread calls withdrawMoney(50) and there is an initial balance of 100. Ideally there should be a balance of 100. But that is not guaranteed:
  • The method depositMoney sees a value of 100 for balance.
  • Then withdrawMoney method sees a value of 100 for balance and withdraws 50 and leaves a balance of 50.
  • Finally the method depositMoney uses the balance it saw previously to calculate a new balance of 150. Note one thing that the change in balance is not visible to it.
So again due to lack of atomicity an update is lost. If both of the methods are declared synchronized the lock will be ensured for entire duration of method and changes will take place atomically.

Visibility Again
If action of one thread is visible to another thread then the result of all its actions are observed by other threads as well. Consider the following example:
public class LooperThread extends Thread
    private boolean isDone = false;
    public void run() 
       while( !isDone ) {
    public void stopWork() {
       isDone = true;

What is missing here? Assume an instance of LooperThread is running while main thread calls stopWork() to stop it. There is no synchronization between the two threads. The compiler may detect that no writes are performed to isDone in the first thread and may decide to read isDone only once and then BOOM!! Some JVMs may do this and may transform it into infinite loop. The problem is clearly lack of visibility.

Ordering Again
Ordering is all about the order in which things are seen to occur. Consider the following example:

Initialization (Both not volatile)Thread One (not synchronized)Thread Two (not synchronized)
int value = 0value = 1if( result == true)
boolean result = falseresult = true;sysout(“value = “ + value)

In the above scenario is it possible for thread two to print value = 0?  Well Yes it is possible. It is possible that result=true may come before value=1 in compiler reordering. Also value=1 might not become visible in thread two and then thread two will load value=0. What can be done to fix it? If we make value volatile will this fix the problem?

CPU Architecture (multi level of RAMs)
CPUs generally have multiple cores now a days and threads will be running on different cores. Also there are different level of cache memories as demonstrated in the diagram below:

When a volatile is written by any thread in a particular core that value needs to be updated for all other cores as well because each core has its own cache that will have stale value of the variable. The messages are passed to all cores to update the values. The local caches of all cores are not flushed as such because it is done by cache coherence protocol and it is hardware specific.

Volatile (a younger sibling of synchronized block)
As per Java documentation if a variable is declared volatile then Java Memory Model (after JDK 5) ensures that all threads see a consistent value for the variable. Volatile is a cousin of synchronization in a way that reading a volatile is like entering into synchronized block and writing a volatile variable is like exiting from it. When a volatile is written the value is written to main memory and not to local processor cache and all the other caches of other cores are informed of this change by message passing. Volatile is not atomic.
Volatile provides ordering and visibility and does not provide mutual exclusion or atomicity guarantee. Locking guarantees atomicity, visibility and ordering. So volatile is not a substitute for synchronization.
Volatile read and writes
Volatile provided ordering guarantee and it means the generated instructions by compiler cannot have action results in different order other than the order defined by instructions of actual source code. Though the order of generated instructions can be different from original order of source code but resultant effect has to be in same order. We need to also observe the following point regarding read and write from Java Doc:
when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.

We need to understand the following points regarding the read and writes of a volatile:

  • When one thread writes to a volatile variable, and another thread sees that write, the first thread is telling the second about all of the contents of memory up until it performed the write to that volatile variable.
  • Here, Thread 2 sees the contents of all the threads as seen by them before they are going to write to ready. So the contents of memory of all those threads would be visible to Thread 2, after it reads the value true for ready.
Image is taken from Jeremy's blog.

Can we declare a final volatile volatile?
What do you think? If a variable is final we cannot change its value and volatile is all about making sure changes to a share variable are visible to other threads. So it is not allowed and will result in compilation error.

Why do we declare long/double volatile in concurrent programming?
By default reading/writing long or double variables is not atomic in nature. Non-volatile long or double is treated as two separate writes: one to each 32-bit half. It can result in a situation where a thread sees the first 32 bits of a 64 bit value from one write and the second sees 32 bits from another write. Write and read of volatile long and double are always atomic.

Now consider the first example again where we have marked the counter volatile:

public class MyApp
    private volatile int count = 0;
    public void upateVisitors() 
       ++count; //increment the visitors count

Before coming to any conclusion the point to remember is "Volatile guarantee visibility and ordering only". So what do you think? Will this example fail at all?

Volatile vs Atomic classes
If we make the count variable atomic will this work? Yes it will work and its better to make use of Atomic classes for incrementing or decrementing a primitive. AtomicInteger actually makes use of volatile and CAS (Compare And Sweep) for thread-safe implementation of integer counter. Also check this post to read about LongAdder.

So I finally close this topic. I hope you enjoyed the post.


Anonymous said...

very good..

Anonymous said...

good piece of reading but unfortunatelly some common fallacy regarding CPU caching appears. L1/L2/L3 CPU caches have nothing to do with volataile and visibility issues in java. These caches do not flush and you can not get stale data from them. Cache coherency protocol takes care of that. What your article is missing are MOBs (memory ordering buffers). They are internal parts of each core and are what you are referring to as caches. There is sth like memory barriers which are special CPU instructions which causes these buffers to get synchronized with CPU caches

Anonymous said...

Very nice.. Thanks.

Unknown said...

Nice post.

prasenjit said...

Nicely explained . Thanks alot.