
I have a HashMap in Java that I need to limit in size (on the order of 50,000 entries), deleting only the oldest items when it grows too large. The item's timestamp is stored in a field of the entry object:

Map<String, MyModel> snapshot = new HashMap<>();

and

public class MyModel { 
    private ZonedDateTime createdAt;
    // other fields...
}

I also insert them into the map in order by that timestamp.

What would be the most effective way to delete the oldest entries? Note that no time "threshold" is known in advance, only the desired final size of the Map.

onkami
  • Do you add items to the map in timestamp order? – T.J. Crowder Dec 16 '16 at 13:04
  • @T.J.Crowder yes I do – onkami Dec 16 '16 at 13:27
    Then I believe [Boris' answer](http://stackoverflow.com/a/41185016/157247) is the most effective way to do this, or at least the `LinkedHashMap` he points to whether or not you use the `removeEldestEntry` or just remove entries directly (it has a way to tell you what the oldest key is). – T.J. Crowder Dec 16 '16 at 13:29
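A minimal sketch of the "remove entries directly" variant mentioned in this comment: a plain `LinkedHashMap`'s iterator yields entries in insertion order, so trimming from the front of the iteration removes the oldest entries (names and the small limit are illustrative):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class TrimExample {
    public static void main(String[] args) {
        int maxSize = 2; // illustrative limit; the question's target is 50000
        Map<String, String> snapshot = new LinkedHashMap<>();
        snapshot.put("a", "1");
        snapshot.put("b", "2");
        snapshot.put("c", "3");

        // A LinkedHashMap's iterator walks entries in insertion order,
        // so removing from the front drops the oldest entries first.
        Iterator<Map.Entry<String, String>> it = snapshot.entrySet().iterator();
        while (snapshot.size() > maxSize && it.hasNext()) {
            it.next();
            it.remove();
        }
        System.out.println(snapshot); // {b=2, c=3}
    }
}
```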

4 Answers


A HashMap has no "oldest" entry; it has no "first", and no order at all.

A LinkedHashMap, on the other hand, is designed for exactly this: it maintains a doubly linked list through its entries to keep them in insertion order, and it also provides a removeEldestEntry method:

public static void main(final String args[]) throws Exception {
    final int maxSize = 4;
    final LinkedHashMap<String, String> cache = new LinkedHashMap<String, String>() {
        @Override
        protected boolean removeEldestEntry(final Map.Entry<String, String> eldest) {
            return size() > maxSize;
        }
    };

    cache.put("A", "A");
    System.out.println(cache);
    cache.put("B", "A");
    System.out.println(cache);
    cache.put("C", "A");
    System.out.println(cache);
    cache.put("D", "A");
    System.out.println(cache);
    cache.put("E", "A");
    System.out.println(cache);
    cache.put("F", "A");
    System.out.println(cache);
    cache.put("G", "A");
}

Output:

{A=A}
{A=A, B=A}
{A=A, B=A, C=A}
{A=A, B=A, C=A, D=A}
{B=A, C=A, D=A, E=A}
{C=A, D=A, E=A, F=A}

Large Health Warning

Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new LinkedHashMap(...));

LinkedHashMap JavaDoc
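Putting the two points together, a sketch of the removeEldestEntry cache wrapped at creation time as the JavaDoc recommends (the small limit is illustrative):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SyncCacheExample {
    public static void main(String[] args) {
        final int maxSize = 4; // illustrative; the question's target is 50000
        // Wrap at creation time so every access goes through the
        // synchronized view and no unsynchronized reference escapes.
        Map<String, String> cache = Collections.synchronizedMap(
            new LinkedHashMap<String, String>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > maxSize;
                }
            });
        for (String k : new String[] {"A", "B", "C", "D", "E"}) {
            cache.put(k, k);
        }
        System.out.println(cache); // {B=B, C=C, D=D, E=E}
    }
}
```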

Boris the Spider
    Well, it's definitely on the object, but I've asked the OP to clarify whether they also happen to be inserted in that order... – T.J. Crowder Dec 16 '16 at 13:05
    I was wondering about one of the `SortedMap`s as well. Seems no more (probably less) complicated than maintaining a separate list. I'd probably peek at `TreeMap`'s interals to see how efficient it is, but that seems like a good idea. – T.J. Crowder Dec 16 '16 at 13:13
    OP has confirmed that they **are** inserting in timestamp order! – T.J. Crowder Dec 16 '16 at 13:30

It may be easiest just to add the String keys to a list whenever you put something in the map. Then you could do:

while (map.size() > 50000) {
    map.remove(list.get(0));
    list.remove(0);
}

This works because you don't actually care about the time, just the order.

A queue would be better than a list here, since you only ever need to access and remove the first element.
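A sketch of that queue variant, using an ArrayDeque for O(1) removal from the head (the class and method names, and the tiny limit, are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class QueueTrimExample {
    static final int MAX_SIZE = 2; // illustrative; use 50000 in practice
    static final Map<String, String> map = new HashMap<>();
    static final Deque<String> order = new ArrayDeque<>();

    // Record insertion order alongside the map, then trim from the head
    // whenever the map grows past the limit.
    static void put(String key, String value) {
        if (map.put(key, value) == null) {
            order.addLast(key);
        }
        while (map.size() > MAX_SIZE) {
            map.remove(order.removeFirst());
        }
    }

    public static void main(String[] args) {
        put("a", "1");
        put("b", "2");
        put("c", "3"); // evicts "a", the oldest key
        System.out.println(map.keySet()); // contains b and c (HashMap order not guaranteed)
    }
}
```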

Steven Waterman

Simply spoken: a plain HashMap won't do. To find the oldest entries you would have to iterate over all the values, check that timestamp field, and only then decide which keys to remove.

In other words: a HashMap has only that one responsibility: mapping keys to values. It doesn't care about insertion order, insertion time, or frequency of accesses to keys. In that sense: you should look into using other kinds of implementations of the Map interface.

One alternative would be using a TreeMap with a custom comparator that automatically sorts based on those timestamps.

But keep in mind: there are only two hard things in computer science:

  1. naming
  2. cache invalidation
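If insertion order cannot be relied on, one way to realise that sorted-structure idea is a TreeMap keyed by the timestamp, whose natural ordering makes the first entry the oldest (a sketch; names and the small limit are illustrative, and note that duplicate timestamps would collide as keys):

```java
import java.time.ZonedDateTime;
import java.util.TreeMap;

public class TimestampIndexExample {
    public static void main(String[] args) {
        final int maxSize = 2; // illustrative limit
        // TreeMap keeps keys sorted, so the first entry always has the
        // oldest timestamp; here the model id stands in for the value.
        TreeMap<ZonedDateTime, String> byTime = new TreeMap<>();
        ZonedDateTime now = ZonedDateTime.now();
        byTime.put(now.minusHours(3), "oldest");
        byTime.put(now.minusHours(2), "middle");
        byTime.put(now.minusHours(1), "newest");

        while (byTime.size() > maxSize) {
            byTime.pollFirstEntry(); // removes the smallest (oldest) key
        }
        System.out.println(byTime.values()); // [middle, newest]
    }
}
```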
GhostCat
  • The values in his map contain the time. He doesn't need the map to take care of insertion times and such. – marstran Dec 16 '16 at 12:58

I have modified the LruCache class from the Android framework to do that.

Here is the full code.

import java.util.LinkedHashMap;
import java.util.Map;

public class RemoveOldHashMap<K, V> {
    private final LinkedHashMap<K, V> map;
    /** Size of this cache in units. Not necessarily the number of elements. */
    private int size;
    private int maxSize;
    private int putCount;
    private int createCount;
    private int evictionCount;
    private int hitCount;
    private int missCount;
    /**
     * @param maxSize for caches that do not override {@link #sizeOf}, this is
     *     the maximum number of entries in the cache. For all other caches,
     *     this is the maximum sum of the sizes of the entries in this cache.
     */
    public RemoveOldHashMap(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize <= 0");
        }
        this.maxSize = maxSize;
        this.map = new LinkedHashMap<K, V>(0, 0.75f, false); // false = insertion order (not access order)
    }
    /**
     * Returns the value for {@code key} if it exists in the cache or can be
     * created by {@code #create}. If a value was returned, it is moved to the
     * head of the queue. This returns null if a value is not cached and cannot
     * be created.
     */
    public synchronized final V get(K key) {
        if (key == null) {
            throw new NullPointerException("key == null");
        }

        V result = map.get(key);

        if (result != null) {
            hitCount++;
            return result;
        }
        missCount++;
        // TODO: release the lock while calling this potentially slow user code
        result = create(key);
        if (result != null) {
            createCount++;
            size += safeSizeOf(key, result);
            map.put(key, result);
            trimToSize(maxSize);
        }
        return result;
    }
    /**
     * Caches {@code value} for {@code key}. The value is moved to the head of
     * the queue.
     *
     * @return the previous value mapped by {@code key}. Although that entry is
     *     no longer cached, it has not been passed to {@link #entryEvicted}.
     */
    public synchronized final V put(K key, V value) {
        if (key == null || value == null) {
            throw new NullPointerException("key == null || value == null");
        }
        putCount++;
        size += safeSizeOf(key, value);
        V previous = map.put(key, value);
        if (previous != null) {
            size -= safeSizeOf(key, previous);
        }
        trimToSize(maxSize);
        return previous;
    }
    private void trimToSize(int maxSize) {
        while (size > maxSize && !map.isEmpty()) {
            Map.Entry<K, V> toEvict = map.entrySet().iterator().next();
            if (toEvict == null) {
                break; // map is empty; if size is not 0 then throw an error below
            }
            K key = toEvict.getKey();
            V value = toEvict.getValue();
            map.remove(key);
            size -= safeSizeOf(key, value);
            evictionCount++;
            // TODO: release the lock while calling this potentially slow user code
            entryEvicted(key, value);
        }
        if (size < 0 || (map.isEmpty() && size != 0)) {
            throw new IllegalStateException(getClass().getName()
                    + ".sizeOf() is reporting inconsistent results!");
        }
    }
    /**
     * Removes the entry for {@code key} if it exists.
     *
     * @return the previous value mapped by {@code key}. Although that entry is
     *     no longer cached, it has not been passed to {@link #entryEvicted}.
     */
    public synchronized final V remove(K key) {
        if (key == null) {
            throw new NullPointerException("key == null");
        }
        V previous = map.remove(key);
        if (previous != null) {
            size -= safeSizeOf(key, previous);
        }
        return previous;
    }
    /**
     * Called for entries that have reached the tail of the least recently used
     * queue and are to be removed. The default implementation does nothing.
     */
    protected void entryEvicted(K key, V value) {}
    /**
     * Called after a cache miss to compute a value for the corresponding key.
     * Returns the computed value or null if no value can be computed. The
     * default implementation returns null.
     */
    protected V create(K key) {
        return null;
    }
    private int safeSizeOf(K key, V value) {
        int result = sizeOf(key, value);
        if (result < 0) {
            throw new IllegalStateException("Negative size: " + key + "=" + value);
        }
        return result;
    }
    /**
     * Returns the size of the entry for {@code key} and {@code value} in
     * user-defined units.  The default implementation returns 1 so that size
     * is the number of entries and max size is the maximum number of entries.
     *
     * <p>An entry's size must not change while it is in the cache.
     */
    protected int sizeOf(K key, V value) {
        return 1;
    }
    /**
     * Clear the cache, calling {@link #entryEvicted} on each removed entry.
     */
    public synchronized final void evictAll() {
        trimToSize(-1); // -1 will evict 0-sized elements
    }
    /**
     * For caches that do not override {@link #sizeOf}, this returns the number
     * of entries in the cache. For all other caches, this returns the sum of
     * the sizes of the entries in this cache.
     */
    public synchronized final int size() {
        return size;
    }
    /**
     * For caches that do not override {@link #sizeOf}, this returns the maximum
     * number of entries in the cache. For all other caches, this returns the
     * maximum sum of the sizes of the entries in this cache.
     */
    public synchronized final int maxSize() {
        return maxSize;
    }
    /**
     * Returns the number of times {@link #get} returned a value.
     */
    public synchronized final int hitCount() {
        return hitCount;
    }
    /**
     * Returns the number of times {@link #get} returned null or required a new
     * value to be created.
     */
    public synchronized final int missCount() {
        return missCount;
    }
    /**
     * Returns the number of times {@link #create(Object)} returned a value.
     */
    public synchronized final int createCount() {
        return createCount;
    }
    /**
     * Returns the number of times {@link #put} was called.
     */
    public synchronized final int putCount() {
        return putCount;
    }
    /**
     * Returns the number of values that have been evicted.
     */
    public synchronized final int evictionCount() {
        return evictionCount;
    }
    /**
     * Returns a copy of the current contents of the cache, ordered from least
     * recently accessed to most recently accessed.
     */
    public synchronized final Map<K, V> snapshot() {
        return new LinkedHashMap<K, V>(map);
    }
    @Override public synchronized final String toString() {
        int accesses = hitCount + missCount;
        int hitPercent = accesses != 0 ? (100 * hitCount / accesses) : 0;
        return String.format("LruCache[maxSize=%d,hits=%d,misses=%d,hitRate=%d%%]",
                maxSize, hitCount, missCount, hitPercent);
    }
}

How to use: in this example, Integer keys are mapped to String values. The limit is 2 entries, but you should adjust it for your needs.

RemoveOldHashMap<Integer, String> hash = new RemoveOldHashMap<Integer, String>(2 /* here is max value that the internal counter reaches */ ) {
    // Override to tell how your object is measured
    @Override
    protected int sizeOf(Integer key, String value) {
        return 1; // the size of your object
    }
};

Reference: LruCache

D.Kastier