I have a read-intensive use case where multiple threads read from a HashMap, and every 30 minutes or so the data expires and the whole map is updated. The maximum size of the map would be in the range of 2MB to 200MB.

The solution I am currently considering is a HashMap with multiple readers. Once it expires, a daemon thread fetches the data from the data source and builds a new map; when that is done, it takes the lock on the old HashMap and copies the newly created map into the old one. Is this a correct approach? If yes, is there a better approach, and if not, what is the correct approach? Will copying the data to the new map take more time?

The aim is to serve maximum read requests.

Gray
  • See http://stackoverflow.com/questions/104184/is-it-safe-to-get-values-from-a-java-util-hashmap-from-multiple-threads-no-modi . – gsl Feb 01 '17 at 17:02

2 Answers

First, pay attention to the linked question (from comments above) and its answers. Repeated here: Is it safe to get values from a java.util.HashMap from multiple threads (no modification)?

Since you're already building a new (and apparently complete) map to replace the old one, don't update the existing HashMap in place. That would be slower, and readers would be blocked from the map for as long as the update takes.

Simply replace the old map with the new one:

import java.util.HashMap;

public class HashMapAccessController<K, V>
{
    // current map; readers must treat it as read-only
    protected HashMap<K, V> map = null;

    // version - increment this on each update
    // (assuming generation of a new map version
    //  takes measurable time, rollover is a
    //  problem for the next civilization...)
    protected long version = 0;

    // constructor name must match the class name
    public HashMapAccessController( HashMap<K, V> newMap )
    {
        map = newMap;
    }

    public synchronized long getVersion()
    {
        return( version );
    }

    synchronized HashMap<K, V> getMap()
    {
        return( map );
    }

    // swaps in the new map and returns the previous one
    synchronized HashMap<K, V> updateMap( HashMap<K, V> newMap )
    {
        version++;
        HashMap<K, V> oldMap = map;
        map = newMap;
        return( oldMap );
    }
}

Just make sure you ONLY read from any map returned from getMap() and never try to update it. Again, see the linked question: Is it safe to get values from a java.util.HashMap from multiple threads (no modification)?

The only downside to this is that a thread can get the old map and still be using it after the access controller has swapped in the new one. If you require that, once the new map has been generated, every access MUST see only data from the new map, then this approach won't work. You would then have to lock the entire map and update it in place, which will be a lot slower.
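The stale-reader case can be illustrated with a small sketch. It uses a reduced, hypothetical VersionedMapHolder class standing in for the access controller above; the class name and the String key/value types are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, reduced stand-in for the access controller above.
final class VersionedMapHolder {
    private Map<String, String> map;
    private long version = 0;

    VersionedMapHolder(Map<String, String> initial) {
        this.map = initial;
    }

    synchronized Map<String, String> getMap() {
        return map;
    }

    synchronized long getVersion() {
        return version;
    }

    // Swaps the reference; the old map object itself is left untouched.
    synchronized Map<String, String> updateMap(Map<String, String> newMap) {
        version++;
        Map<String, String> old = map;
        map = newMap;
        return old;
    }

    // A reader that fetched the map before the swap keeps seeing old data.
    static String readThroughStaleReference() {
        Map<String, String> v1 = new HashMap<>();
        v1.put("k", "old");
        VersionedMapHolder holder = new VersionedMapHolder(v1);

        Map<String, String> readerView = holder.getMap(); // taken before the swap

        Map<String, String> v2 = new HashMap<>();
        v2.put("k", "new");
        holder.updateMap(v2);

        return readerView.get("k"); // still the value from the old map
    }
}
```

A reader that must not act on superseded data could compare getVersion() before and after its work and retry if the value changed; whether that is worth the cost depends on the application.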

Updating a hash map in place is tedious. For one, how do you ensure entries in the old map that aren't in the new map are deleted?
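For comparison, here is a minimal sketch of what an in-place update might look like. The InPlaceUpdate class and replaceContents method are hypothetical names, and the sketch assumes every reader also synchronizes on the same map object, which is exactly the blocking this answer recommends avoiding.

```java
import java.util.HashMap;
import java.util.Map;

final class InPlaceUpdate {
    // Makes 'target' hold exactly the entries of 'fresh'.
    // Assumption: all readers lock on 'target' as well, so they are
    // blocked for the whole duration of this method.
    static <K, V> void replaceContents(Map<K, V> target, Map<K, V> fresh) {
        synchronized (target) {
            // Remove entries whose keys no longer exist in the new data.
            target.keySet().retainAll(fresh.keySet());
            // Add new entries and overwrite changed values.
            target.putAll(fresh);
        }
    }
}
```

Both steps walk the data while the lock is held, so readers wait for time proportional to the map size, whereas a reference swap is a single assignment.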

Andrew Henle
  • @JonathanRosenne The question stated `HashMap`, so... The access control object should probably just use `Map`. Heck, it could just use `Object`. – Andrew Henle Feb 01 '17 at 17:33
  • @AndrewHenle : Thanks, Yeah, that downside is understood and acceptable. Thanks again. – user3331132 Feb 01 '17 at 18:10
  • @JonathanRosenne : I have not done a POC with TreeMap, but I suspect that, as the size is limited, the chance of collisions is low, and since I do not need any ordering, HashMap might give better performance. I will try TreeMap though. – user3331132 Feb 01 '17 at 18:19
  • @AndrewHenle : Is there a better way to solve this problem? Periodically, I will get data from the datasource with the updated (deleted, new, existing) records, and then I need to serve read requests with as little stale data as possible without impacting read performance. – user3331132 Feb 01 '17 at 18:32
  • @user3331132 How long does your application keep references to your HashMap around? Once you get your new HashMap built, it'll only take a few nanoseconds to swap the old reference for the new. For a hash map of a few hundred MB, updating it could take perhaps tens of seconds - and readers would be locked out for the entire time. Look at it this way: if an action you want based on new data had started just a few milliseconds earlier, it would have used the old data and you would think it perfectly fine. Do you really care that a last few actions might use the previous version of the data? – Andrew Henle Feb 01 '17 at 19:52
  • @user3331132 I added a `long` version number - your processes can check as needed to see if the map changes and a new version of the data is available. Note that at one map update per second, it would take roughly 292 billion years for a 64-bit signed integer value to roll over to a negative value. Your process will not run that long. You can also add other forms of notification if you need to get positive notification of a version change. – Andrew Henle Feb 01 '17 at 20:03
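The rollover estimate for the `long` version counter can be checked with a little arithmetic. This is a rough sketch that assumes one increment per second and a 365.25-day year; the class and method names are made up for the example.

```java
final class VersionRollover {
    // Seconds in a 365.25-day year (an approximation chosen for this estimate).
    static final long SECONDS_PER_YEAR = 31_557_600L;

    // Whole years until a counter incremented once per second
    // reaches Long.MAX_VALUE (2^63 - 1).
    static long yearsUntilRollover() {
        return Long.MAX_VALUE / SECONDS_PER_YEAR;
    }
}
```

The result is on the order of 292 billion years, so overflow of the version field is not a practical concern at any realistic update rate.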

So, currently the solution that I am thinking is to have a hashmap with multiple readers and once it expires, a daemon thread will fetch the data from the datasource and create a new map and once its done will take the lock on the old HashMap and then copy the newly created map to the old one.

That sounds right. If, as you imply in your message, no one is modifying the HashMap after it has been created, you can safely use it with multiple threads as long as you share it correctly. To prevent readers from modifying it, you should wrap the map using Collections.unmodifiableMap(map).

To share the map with the threads, you will need to make it a volatile field that all of the threads access. Because Collections.unmodifiableMap() returns a Map, declare the field as Map rather than HashMap. Something like:

protected volatile Map map = null;

With the field being volatile, you don't need to do any locking. Your update method then looks like:

// no need to have synchronized here
Map updateMap( Map newMap ) {
   Map oldMap = this.map;
   // publish a read-only view of the freshly built map
   this.map = Collections.unmodifiableMap(newMap);
   return oldMap;
}

None of your other methods need to be synchronized either. A volatile field will perform a lot better because the threads cross only a read memory barrier when they access the shared map and only a write memory barrier when it is updated. With the synchronized keyword, the threads cross both a read and a write barrier every time. Even worse, synchronized carries lock overhead to ensure mutual exclusion, which you don't need here.
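Putting the pieces together, a periodic refresh built on a volatile field might look like the sketch below. The RefreshingCache class, its Supplier-based loader, and the thread name are all assumptions introduced for illustration; the original question only says that a daemon thread fetches from a data source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class RefreshingCache<K, V> {
    // Readers only ever see a fully built, unmodifiable map.
    private volatile Map<K, V> map = Collections.emptyMap();
    // Hypothetical stand-in for "fetch the data from the datasource".
    private final Supplier<Map<K, V>> loader;

    RefreshingCache(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        refresh(); // initial load
    }

    V get(K key) {
        return map.get(key); // lock-free read
    }

    // Builds the new map off to the side, then publishes it with one volatile write.
    void refresh() {
        Map<K, V> fresh = new HashMap<>(loader.get());
        map = Collections.unmodifiableMap(fresh);
    }

    // Runs refresh() periodically on a daemon thread.
    ScheduledExecutorService startRefreshing(long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-refresher");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous map; an uncaught exception
                // would cancel all future runs of this task.
            }
        }, period, period, unit);
        return ses;
    }
}
```

For the 30-minute expiry in the question, this would be started with something like startRefreshing(30, TimeUnit.MINUTES). While a refresh is in progress, readers keep using the previous map.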

Will copying the data to new map take more time?

Copying the data to the new map will take time but no more than a typical copying of data between HashMaps. What is different is that the volatile field access can be much slower than direct field access because of the memory barriers that are crossed.
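One way to limit the cost of volatile reads is to read the field once into a local variable and do a whole batch of lookups against that local reference. The sketch below assumes a hypothetical BatchLookup class with String keys and Integer values; it also means the whole batch sees one consistent version of the map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchLookup {
    private volatile Map<String, Integer> map = Collections.emptyMap();

    // Publishes a read-only copy of the given data.
    void publish(Map<String, Integer> fresh) {
        map = Collections.unmodifiableMap(new HashMap<>(fresh));
    }

    // Sums the values for the given keys, counting missing keys as 0.
    int sum(List<String> keys) {
        Map<String, Integer> snapshot = map; // the only volatile read
        int total = 0;
        for (String key : keys) {
            total += snapshot.getOrDefault(key, 0); // plain reads on the local
        }
        return total;
    }
}
```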

Gray