2

I'm trying to implement a database backed java.util.Map, most of the interface like put and get was easily implemented however I am having trouble figuring out the best way to implement:

    @Override
    public Set<K> keySet() {
          // TODO Auto-generated method stub
           return null;
    }

    @Override
    public Collection<V> values() {
         // TODO Auto-generated method stub
         return null;
    }

    @Override
         public Set<Map.Entry<K, V>> entrySet() {
        // TODO Auto-generated method stub
        return null;
    } 

My concern would be that keys and values could count to millions records. So I don't think its memory and cpu efficient to fetch and store all "keys" or "values" when these methods are accessed.

What are the options to implement a memory efficient way to implementing these?

What is the strategy to implement an iterator for the entrySet?

ApproachingDarknessFish
  • 14,133
  • 7
  • 40
  • 79
  • 1
    You are not the only person to have ever have needed a memory efficient Map, I wouldn't re-invent the wheel and invent your own. First, try using one of the standard types like `HashMap`, then IF you see a performance issue look for an implementation in a library somewhere. See http://stackoverflow.com/questions/3972127/hashmap-alternatives-for-memory-efficient-data-storage – luketorjussen Apr 02 '13 at 21:20
  • So in other words you have it all figured out other than how to put the data in and get it out? (First you have to settle on a database organization, then work out a caching scheme. Hardest is knowing when you need to go search the DB for a find operation -- you need to have some sort of scheme for caching the DB key values.) – Hot Licks Apr 02 '13 at 21:39
  • You don't have to build the complete set in memory, you just have to provide iterators that walk through the database. – Louis Wasserman Apr 02 '13 at 21:44

3 Answers3

0

Honestly It looks like to do it the best way possible you will also have to implement Set and Collection in such a way that it uses an efficient method to retrieve those values, and does not try to pull the entire database into memory, and return an instance of that implemented Set or Collection interface.

Ryan
  • 2,755
  • 16
  • 30
0

I recommend using Oracle's BerkeleyDB Java Edition. The com.sleepycat.collections.StoredContainer.StoredMap class implements the java.util.Map interface and will also backup data to disk. I have used it to work with maps with about 8GB data.

StoredMap: http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/collections/StoredMap.html

BerkeleyDB Java Edition: http://www.oracle.com/technetwork/database/berkeleydb/overview/index-093405.html

mvarshney
  • 364
  • 1
  • 8
0

If the total data volume is big (Gigabytes), it might be worthwhile to move the data off-heap, to avoid long GC pauses. As a real-world example, see this post: Going off-heap to improve latency and reduce AWS bill.

leventov
  • 14,760
  • 11
  • 69
  • 98