5

Background

I have a large data map (HashMap), kept in memory, which is updated incrementally (based on incoming messages), by the background thread:

<KEY> => <VALUE>
...

End users will then query it via the REST API:

GET /lookup?key=<KEY>

Updates are not applied immediately, but in batches, once a special control message is received, i.e.

MESSAGE: "Add A" 

A=<VALUE>   //Not visible yet

MESSAGE: "Add B"

B=<VALUE>   //Not visible yet

MESSAGE: "Commit"

//Updates are now visible to the end-users
A=<VALUE>
B=<VALUE

The architecture I devised, is as follows:

volatile Map passiveCopy = new HashMap();
volatile Map activeCopy = new HashMap();

Map<String,Object> pendingUpdates; 

//Interactive requests (REST API)
Object lookup(String key) {
     activeCopy.get(key);
}

//Background thread processing the incoming messages.
//Messages are processed strictly sequentially
//i.e. no other message will be processed, until
//current handleMessage() invocation is completed
//(that is guaranteed by the message processing framework itself)
void handleMessage(Message msg) {

   //New updates go to the pending updates temporary map
   if(msg.type() == ADD) {
      pendingUpdates.put(msg.getKey(),msg.getValue()); 
   }


   if(msg.type() == COMMIT) {     
      //Apply updates to the passive copy of the map
      passiveCopy.addAll(pendingUpdates);

      //Swap active and passive map copies
      Map old = activeCopy; 
      activeCopy = passiveCopy;
      passiveCopy = old;

      //Grace period, wait for on-the-air requests to complete
      //REST API has a hard timeout of 100ms, so no client
      //will wait for the response longer than that 
      Thread.sleep(1000);

      //Re-apply updates to the now-passive (ex-active) copy of the map
      passiveCopy.addAll(pendingUpdates);

      //Reset the pendingUpdates map
      pendingUpdates.clear();
   }

}

The question

Taking that write->read to the volatile field makes a happens-before edge:

A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.

https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5

and the grace period is chosen correctly, I expect that any updates applied to the passiveCopy (via putAll()), will become visible to the end-user requests (all at once), after the swap.

It is really a case, or there are any corner-case which will make this approach fail ?

NOTE

I know that creating a copy of the Map (so that a new Map instance is assigned to activeCopy an each time), would be safe to do, but I don't want to do this (as it is really large).

bitbit9
  • 53
  • 4
  • You are swapping the maps, but do you not want to values from the original map in the new map as well? And when you commit again you will get these values back you have "deleted" previously. This sounds weird. Also, please [edit] your question to include a description of how the `volatile` keyword will help you and what you think will happen when you don't use the `volatile` keyword. – Progman Jun 20 '19 at 07:07
  • @Progman Yep, thanks, you are correct. Updates should be buffered and re-applied to the ex-active copy of the map, after the swap. I've updated the code to reflect that (I've actually over-simplified the code trying to get the minimum possible example). – bitbit9 Jun 20 '19 at 08:17
  • > Also, please edit your question to include a description of how the volatile keyword will help you --> "I expect that any updates applied to the passiveMap (via putAll()), will become visible to the end-user requests (all at once), after the swap" <--- Not using the volatile will/can obviously prevent users accessing the ```activeMap``` from seeing the updates just applied to it (or not seeing all of them). – bitbit9 Jun 20 '19 at 08:19

2 Answers2

1

Apart from your inconsistent use of activeMap and activeCopy (just remove activeCopy and only swap between activeMap and passiveCopy), your approach is sensible.

This answer quotes the JLS:

If x and y are actions of the same thread and x comes before y in program order, then hb(x,y) [x "happens before" y].

An example is also given in this answer.

From that I take that accesses to a volatile variable/field are basically sequence points; in your case, because the swap comes after the modification of the map in the program code, it should be guaranteed that the modification of the map is completed before the access to the volatile field is actually performed. So no race condition here.

However, in most cases you should use synchronized or explicit locks to synchronize concurrent executions. The only reason to code around using these is if you need high performance, i.e. massive parallelism, where it's either not acceptable for threads to block a lock, or the desired parallelism is so high that threads begin to starve.

That said, I believe you should really just 'invest' in proper mutual exclusion, preferredly using a ReadWriteLock. Because synchronized (which is used by ReadWriteLock internally) implies a memory barrier, you don't need volatile anymore.

For example:

final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
final Lock readLock = rwLock.getReadLock();
final Lock writeLock = rwLock.getWriteLock();

Map passiveCopy = new HashMap();
Map activeMap = new HashMap();

final Map<String,Object> pendingUpdates = new HashMap(); 

//Interactive requests (REST API)
Object lookup(String key) {

  readLock.lock();
  try {
     return activeMap.get(key);
  } finally {
    readLock.unlock();
  }
}

//Background thread processing the incoming messages.
//Messages are processed strictly sequentially
//i.e. no other message will be processed, until
//current handleMessage() invocation is completed
//(that is guaranteed by the message processing framework itself)
void handleMessage(Message msg) {

   //New updates go to the pending updates temporary map
   if(msg.type() == ADD) {
      pendingUpdates.put(msg.getKey(),msg.getValue()); 
   }


   if(msg.type() == COMMIT) {     
      //Apply updates to the passive copy of the map
      passiveCopy.addAll(pendingUpdates);

      final Map tempMap = passiveCopy;    

      writeLock.lock();

      try {
        passiveCopy = activeMap;
        activeMap = tempMap;
      } finally {
        writeLock.unlock();
      }

      // Update the now-passive copy to the same state as the active map:
      passiveCopy.addAll(pendingUpdates);
      pendingUpdates.clear();
   }

}

From your code, however, I read that 'readers' should see a consistent version of the map during their 'lifetime', which is not guaranteed by the above code, i.e. if a single 'reader' accesses the map twice he may see two different maps. This could be solved by having each reader acquire the read lock itself before the first access to the map, releasing it after the last access to the map. This may or may not work in your case because if the readers hold the lock for extended periods, or there are many reader threads, it may block/starve the writer thread trying to commit the update.

JimmyB
  • 12,101
  • 2
  • 28
  • 44
  • >just remove activeCopy and only swap between activeMap and passiveCopy True, thank you, I've edited to code to remove it. – bitbit9 Jun 21 '19 at 18:18
  • > a single 'reader' accesses the map twice he may see two different maps That is true, as well, but that is not really my concern, as I only need a forward consistency in this case, basically newly added items can refer to other items in the same batch, so I need a guarantee that if you request A1 you will be able to retrieve A2 with your next request (so I want A1 and A2 to be made visible "atomically") – bitbit9 Jun 21 '19 at 18:29
  • @bitbit9 This makes things easier and you can just use the volatile activeCopy as you intended. - But then I don't get what that "grace period' is for. – JimmyB Jun 22 '19 at 00:34
  • Thanks, that confirms my understanding of this. The grace period is here to avoid the (hypothetical) race, when reading process will obtain the activeCopy reference, just before it is swapped, and will execute get() on that reference concurrently with putAll() which follows the swap. @JimmyB – bitbit9 Jun 22 '19 at 07:21
  • @bitbit9 Ok, I overlooked that possible race condition. For this case I strongly recommend to use proper mutual exclusion as in my answer instead of relying on timing and introducing (most of the time) unnecessary delays. – JimmyB Jun 24 '19 at 08:08
0

The volatile Map will be a problem if you need the new entries to be added atomic so the user will never see a state where not all of them are added but only some of them.

The problem is that in java volatile for references just assures the following:

  • It's guaranteed, that the reference is allways up to date and all changes will be visible from any thread
  • It's NOT guaranteed that the content of the referenced object is allways up to date

(found in this book)

I also checked the implementation of the class HashMap (assuming that you use a HashMap), where you can see that the method putAll(Map) just calls the method putMapEntries(Map, boolean) which is implemented like this:

/**
 * Implements Map.putAll and Map constructor
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

So you see the method just calls the method putVal(int, K, V, boolean, boolean) in a for loop (which is not an atomic update). This means there is no real difference between adding all entries using putAll(Map) and using a for loop to add the entries one after another using put(K, V).

Conclusion: If you need to be shure that there is no possible state where a user can read a map with only some of the new elements added and some not added volatile can NOT be used here. So (like you already mentioned) creating a copy of the map and exchanging it will be better (and save). Although it uses twice as much memory, but it will be faster because volatile variables are usually realy slow.

Tobias
  • 2,547
  • 3
  • 14
  • 29
  • Thanks, I'm aware that putAll is not an atomic operation, which is why I invoke it _before_ I swap the maps, i.e. it is guaranteed to complete, before client reads a new active map reference from the volatile variable. – bitbit9 Jun 21 '19 at 18:14
  • 1
    > It's NOT guaranteed that the content of the referenced object That is true, if you modify this object, after the volatile reference is read. But current version of JSL speaks of a "happens-before" edge, between writing and reading a volatile variable, which assumes that anything you do to this object, before you write it to the volatile field, still should be visible to the reader (or at least that is how I read it). – bitbit9 Jun 21 '19 at 18:33