2

I'm not sure if HashMap or TreeMap store Map.Entry in itself. That is, it's likely to return Map.Entry instance created on the fly when entrySet().iterator().next() is called.

Personally, I think it may be better in this form:

class Entry {
    Object key;
    Object value;
}

interface InplaceIterator {
    boolean next();
}

Entry entryBuf = new Entry();
InplaceIterator it = map.entrySet().inplaceIterator(entryBuf);
while (it.next()) {
    // do with entryBuf...
}

Thus, the creation of Entry is avoided.

I don't know how Java Compiler works, will Java Compiler optimize the creation of Map.Entry away, by analyzing the dataflow and get the knowledge that Map.Entry can be safely reused?

Or, is someone there have already written another collection framework to enable inplace iteration?

Lenik
  • 13,946
  • 17
  • 75
  • 103

3 Answers3

13

What you describe (having an iterator-local Map.Entry object and reusing it for all next() return values) is one possible Map implementation, and I think some special-purpose maps are using this.

For example, the implementation of EnumMap.entrySet().iterator() (here the version from OpenJDK, 1.6.0_20) simply uses the iterator object itself as the Entry object returned by the next() method:

/**
 * Since we don't use Entry objects, we use the Iterator itself as entry.
 */
private class EntryIterator extends EnumMapIterator<Map.Entry<K,V>>
    implements Map.Entry<K,V>
{
    public Map.Entry<K,V> next() {
        if (!hasNext())
            throw new NoSuchElementException();
        lastReturnedIndex = index++;
        return this;
    }

    public K getKey() {
        checkLastReturnedIndexForEntryUse();
        return keyUniverse[lastReturnedIndex];
    }

    public V getValue() {
        checkLastReturnedIndexForEntryUse();
        return unmaskNull(vals[lastReturnedIndex]);
    }

    public V setValue(V value) {
        checkLastReturnedIndexForEntryUse();
        V oldValue = unmaskNull(vals[lastReturnedIndex]);
        vals[lastReturnedIndex] = maskNull(value);
        return oldValue;
    }

    // equals, hashCode, toString

    private void checkLastReturnedIndexForEntryUse() {
        if (lastReturnedIndex < 0)
            throw new IllegalStateException("Entry was removed");
    }
}

This is possible, since the Map.Entry specification states (emphasis by me):

A map entry (key-value pair). The Map.entrySet method returns a collection-view of the map, whose elements are of this class. The only way to obtain a reference to a map entry is from the iterator of this collection-view. These Map.Entry objects are valid only for the duration of the iteration; more formally, the behavior of a map entry is undefined if the backing map has been modified after the entry was returned by the iterator, except through the setValue operation on the map entry.

If you want all entries at once, you'll have to use map.entrySet().toArray(), which may create immutable copies of the entries.


Here some more observations about the default maps (all in OpenJDK 1.6.0_20 as found in Ubuntu's openjdk6-source package):

  • The general purpose maps HashMap and TreeMap (as well as the legacy Hashtable) are already using some kind of Entry objects as part of their internal structure (the table or tree), so they simple let these objects implement Map.Entry and return them. They are not created on the fly by the Iterator.

    The same is valid for WeakHashMap (where having an Entry object in a strong reference does not avoid its key to get garbage-collected, if I understand right - but as long as you don't call next() on the iterator, the iterator holds the key in the current entry).

  • IdentityHashMap is internally using a simple Object[], with alternating key and value, so no entry objects here, too, and thus also a reusing of the iterator as entry.

  • ConcurrentSkipListMap is using Node objects which do not implement anything, so its iterators return new AbstractMap.SimpleImmutableEntry<K,V>(n.key, v);. This implies you can't use their setValue() method, as explained in the class documentation:

    All Map.Entry pairs returned by methods in this class and its views represent snapshots of mappings at the time they were produced. They do not support the Entry.setValue method. (Note however that it is possible to change mappings in the associated map using put, putIfAbsent, or replace, depending on exactly which effect you need.)

  • ConcurrentHashMap internally uses a HashEntry class analogously to the HashMap, but this does not implement anything. Additionally, there is an internal class WriteThroughEntry (extending AbstractMap.SimpleEntry), whose setValue() method delegates to the put method of the map. The iterator returns new objects of this WriteThroughEntry class.

Paŭlo Ebermann
  • 73,284
  • 20
  • 146
  • 210
  • 1
    I'm not saying you're wrong, but having a reused mutable Entry could be *extremely* surprising to the coder in a way that is not immediately obvious. Imagine you're trying to find four different elements in a map - you couldn't without copying each returned Entry since they'd really be the same! – Steven Schlansker Mar 28 '11 at 07:50
  • 1
    Hm, an interesting quote you've pulled out. I'm skeptical since it seems to be somewhat ambiguously worded - is "the duration of the iteration" the duration of a *single* pass from one iterator.next() call to the next? Or is it the entire time that the iterator is live? (i.e. the time for which the implicit iterator is live). Also the "more formally" clause seems to describe entirely different semantics. How helpful ;-p – Steven Schlansker Mar 28 '11 at 07:57
  • 1
    @Steven: The same question I want to ask. I've tested on `EnumMap`, `HashMap`, `IdentityHashMap`, `TreeMap`, `Hashtable` using the same operator (==) on the returned entries, and the result is: true, false, true, false, false. – Lenik Mar 28 '11 at 08:06
  • @Steven: I'm not really sure how to interpret the quote, but the `EnumMap` source quoted above at least shows that someone at Sun understood it like I wrote. – Paŭlo Ebermann Mar 28 '11 at 08:13
  • @谢继雷: Thanks for your experiment ... I added a bit of implementation insight for the default Map implementations. – Paŭlo Ebermann Mar 28 '11 at 09:50
  • Looking at the EnumMap implementation, that is in fact what they do. That's confusing and IMO bad design, but it is what it is and I guess we just have to live with it. So don't take Map.Entries out of Iterators, or potentially be very sorry! +1 for both of you for finding that in the docs and then actually testing it :-) – Steven Schlansker Mar 28 '11 at 20:16
  • Looks like this bug/optimization (re-use iterator as a map entry) was in the middle of being fixed while this discussion was taking place: see the dates on the comments for this bug: https://bugs.openjdk.java.net/browse/JDK-6312706 – Mark VY Apr 03 '22 at 02:44
1

Usually, small, short lived objects are almost free. Consider f1 and f2

static Entry f1(int i){ return new Entry(i); }

static Entry entry = new Entry(0);
static Entry f2(int i){ entry.i=i; return entry; }

static class Entry
{
    Entry(int i){ this.i=i; }
    int i;
    int get(){ return i; }
}

This is a realistic test case of the problem you described - reusing the same object per iteration, vs. creating a new object per iteration. In both cases, some data is saved in the object, carried over to the call site to be read.

Let's profile it, retrieve a billion entries, and read data stored in each, in 3 different ways

    int r = 0;
    for(int i=0; i<1000000000; i++)
    {
    test0:  r += i;
    test1:  r += f1(i).get();
    test2:  r += f2(i).get();
    } 
    print(r);

The number I got is, test2 is as fast as test0; test1 is slower than test2 by only one cpu cycle per iteration. ( I guess the difference is several machine instructions, and CPU pipelines them in one cycle)

If you still don't believe it, implement fully your proposed "efficient" solution, compare it to the presumably "wasteful" implementation, and see the difference for yourself. You will be amazed.

irreputable
  • 44,725
  • 9
  • 65
  • 93
  • I tried it, I'm really confused that f1 is faster then f2 under ubuntu, sun-jdk-1.7. Well, I'm using eclipse, maybe f1 is optimized by eclipse java compiler? – Lenik Mar 29 '11 at 02:23
  • it's difficult to reason about VM optimization. see also http://stackoverflow.com/questions/4970015/java-calling-static-methods-vs-manual-inlining-performance-overhead/4972623#4972623 – irreputable Mar 29 '11 at 02:52
  • @irreputable: The cost of the *allocation* of small object is almost insignificant, just as you conclude. I think the only operation required for allocations is to increment a pointer with the object size to point to the next free memory slot. But allocation induce more *garbage collection* in the future, which might be more expensive. – Lii Sep 07 '15 at 08:53
0

Google Collection's ArrayListMultimap is fairly efficient and isn't resource intensive, http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ArrayListMultimap.html

Creating a Multimap

private Multimap<Integer, String> store = ArrayListMultimap.create();

Iterating the Multimap

for (Map.Entry<Integer, String> entry: store.entries()) {}

And if you'd rather avoid Map.Entry, then extract the keyset and go from there:

List<Integer> keys = new ArrayList<Integer>(store.keySet());
for(Long key : keys){
     ArrayList<String> stored_strings = new ArrayList<String>(store.get(key));
}
  • 3
    Iterating over the key set calling get() on every key is likely false economy -- even if you are creating new Entry instances with entrySet(), the overhead of lookup on get() is almost certainly far greater. You'll do nothing but shoot yourself in the foot. – Steven Schlansker Mar 28 '11 at 07:44
  • Also, switching to a Multimap doen't really answer the question. You'd rarely use a Map where a Multimap is more appropriate or vice versa - generally you care whether multiple values per key are allowed or not. – Steven Schlansker Mar 28 '11 at 07:48
  • @Steven: The get() is on the variable store, the Multimap, so the lookup time is O(1). The memory allocated for Map.Entry instances is actually more than the memory allocated for ArrayList instances. I think I'm misunderstanding you, though. – Black Box Operations Mar 28 '11 at 07:52
  • @Steven (comment 2): Sure, I don't necessarily disagree with you there, I was more or less offering an example using a Multimap to show 1) a different data structure and 2) a different way to iterate. – Black Box Operations Mar 28 '11 at 07:55
  • 2
    @Black Box Operations: You are correct that its complexity is O(1) however the things you are comparing are 1) returning a (potentially) new Map.Entry for every iteration, or 2) running a hash lookup (or worse) every iteration. At the very least a hash lookup involves a hashCode() computation and an equals() check; I would be very surprised if this is cheaper than creating the Map.Entry. And if you happen to be using a TreeMap suddenly you are on a different complexity and everything goes down the tubes. entrySet exists because it's more efficient and convenient, don't fear to use it :-) – Steven Schlansker Mar 28 '11 at 07:58
  • 2
    @Steven: Yep, just profiled a small test case, you're right. the hashCode() and the equals() really kill it. Good catch man. – Black Box Operations Mar 28 '11 at 08:00
  • I would +1 this for testing it out, but apparently I can't because I voted here in the last 12 hours?! Bummer. – Steven Schlansker Mar 28 '11 at 20:17