How hashmap provides constant-time performance?

Question

This might seem like a question, asked a million times before. But I have some doubt for a long time, and couldnt get a proper answer yet.

Lets assume I have a hashmap with a 1100 elements in it. I assume that there are 1000 buckets in the map.

So when I insert a new element, it first derive the hash of the key, say its 676, now it will check where 676 bucket is, and put the object as an EntryObject inside the bucket.

Now my question is how it gets to the 676 bucket? I assume these bucket hashs are indexed, I mean ordered. Like I have a book of 1000 pages, and I want to go to page 676, I cant open the page directly, I can reach to a page which is near to 676, based on asumption of the width of the book, and with a few more attempts, I can go to page 676. Whether the book has 100 pages, or 1000000 pages, doesnt make much difference like 1:10000 , But I have to have few trials, before reaching to the exact page.

My question is, how it happens in HashMap? Also if any of u give me some lead to understand the internal working, in depth, it will be very helpful.

Thanks

Your book analogy works - except that computers are much more accurate than people. Measure the thickness of the book, measure the thickness of a page, then simply multiply them together - you will always be on the right page — Boris the Spider, Jul 08 '18 at 07:49
Thanks a lot for your answer. I have another doubt, so when my Hashmap is growing over the load factor, it will rehash and allocate new space in memory for the array. So if my Hashmap is too big, say a million of buckets, it has to find a continuous empty space for this array, which may be pretty big. Is my understanding correct ? Also what is the thickness of the page? I mean how many bytes? is it fixed because its only the buckets, so it shouldnt be dependent of the type of the key ... — Koushik Paul, Jul 08 '18 at 08:25
Millon buckets? Meh - not particularly large. But yes, a contiguous chunk of memory would be needed to allocate the new backing array. The size of an item in the array would be the size of a reference - depending on your system this might be `64` bits or `32` (32 bit system, compressed pointers, etc) — Boris the Spider, Jul 08 '18 at 09:30
:@Andy Turner this might be a duplicate question, but it aint similar with the one,you tagged, His question revolves inside a bucket, I am interested in how buckets are created, how much space they occupy during rehash. — Koushik Paul, Jul 09 '18 at 13:09

score 6 · Answer 1 · answered Jul 08 '18 at 07:35

6

It is an array lookup. You don't flip through pages when you resolve someArray[index], you add the size of one element times the index to the address of the first entry and there you are.

answered Jul 08 '18 at 07:35

ewramner

5,810
2
17
33

Thanks for your answer , Do u have any link, where I can find in depth explanation, of how each of the collection classes work internally in Java? Regards Koushik – Koushik Paul Jul 08 '18 at 08:10
1

Sure, get it from the source: https://docs.oracle.com/javase/8/docs/technotes/guides/collections/index.html. Start with the overview. If you want depth the JDK ships with full source code, just open the classes in your favorite editor/IDE. – ewramner Jul 08 '18 at 16:06

score -2 · Answer 2 · answered Jul 08 '18 at 16:07

If you have your JRE/JDK configured properly in your IDE, you should be able to click into any collection to view the source code (In Eclipse, Command/Ctrl + Left Click on the Object's constructor).

I've uploaded the HashMap source code for you to take a look at the specific implementation (Edit - I was required to add in some of the code from PasteBin, but could only include ~500 lines. The full implementation is available via the link):

Regarding your question about contiguous memory space, the implementation states that the HashMap class is backed by a combination of Bins and TreeBin-TreeNodes data structure. Memory-wise, this data structure would end up being non-contiguous, with each node/bin containing a pointer to adjacent nodes/bins, and allows a resize to execute into various addresses across memory (my understanding at least).

    * Copyright (c) 1997, 2017, Oracle and/or its affiliates. All rights reserved.
     * ORACLE PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     *
     */

    package java.util;

    import java.io.IOException;
    import java.io.InvalidObjectException;
    import java.io.Serializable;
    import java.lang.reflect.ParameterizedType;
    import java.lang.reflect.Type;
    import java.util.function.BiConsumer;
    import java.util.function.BiFunction;
    import java.util.function.Consumer;
    import java.util.function.Function;
    import jdk.internal.misc.SharedSecrets;

    /**
     * Hash table based implementation of the {@code Map} interface.  This
     * implementation provides all of the optional map operations, and permits
     * {@code null} values and the {@code null} key.  (The {@code HashMap}
     * class is roughly equivalent to {@code Hashtable}, except that it is
     * unsynchronized and permits nulls.)  This class makes no guarantees as to
     * the order of the map; in particular, it does not guarantee that the order
     * will remain constant over time.
     *
     * <p>This implementation provides constant-time performance for the basic
     * operations ({@code get} and {@code put}), assuming the hash function
     * disperses the elements properly among the buckets.  Iteration over
     * collection views requires time proportional to the "capacity" of the
     * {@code HashMap} instance (the number of buckets) plus its size (the number
     * of key-value mappings).  Thus, it's very important not to set the initial
     * capacity too high (or the load factor too low) if iteration performance is
     * important.
     *
     * <p>An instance of {@code HashMap} has two parameters that affect its
     * performance: <i>initial capacity</i> and <i>load factor</i>.  The
     * <i>capacity</i> is the number of buckets in the hash table, and the initial
     * capacity is simply the capacity at the time the hash table is created.  The
     * <i>load factor</i> is a measure of how full the hash table is allowed to
     * get before its capacity is automatically increased.  When the number of
     * entries in the hash table exceeds the product of the load factor and the
     * current capacity, the hash table is <i>rehashed</i> (that is, internal data
     * structures are rebuilt) so that the hash table has approximately twice the
     * number of buckets.
     *
     * <p>As a general rule, the default load factor (.75) offers a good
     * tradeoff between time and space costs.  Higher values decrease the
     * space overhead but increase the lookup cost (reflected in most of
     * the operations of the {@code HashMap} class, including
     * {@code get} and {@code put}).  The expected number of entries in
     * the map and its load factor should be taken into account when
     * setting its initial capacity, so as to minimize the number of
     * rehash operations.  If the initial capacity is greater than the
     * maximum number of entries divided by the load factor, no rehash
     * operations will ever occur.
     *
     * <p>If many mappings are to be stored in a {@code HashMap}
     * instance, creating it with a sufficiently large capacity will allow
     * the mappings to be stored more efficiently than letting it perform
     * automatic rehashing as needed to grow the table.  Note that using
     * many keys with the same {@code hashCode()} is a sure way to slow
     * down performance of any hash table. To ameliorate impact, when keys
     * are {@link Comparable}, this class may use comparison order among
     * keys to help break ties.
     *
     * <p><strong>Note that this implementation is not synchronized.</strong>
     * If multiple threads access a hash map concurrently, and at least one of
     * the threads modifies the map structurally, it <i>must</i> be
     * synchronized externally.  (A structural modification is any operation
     * that adds or deletes one or more mappings; merely changing the value
     * associated with a key that an instance already contains is not a
     * structural modification.)  This is typically accomplished by
     * synchronizing on some object that naturally encapsulates the map.
     *
     * If no such object exists, the map should be "wrapped" using the
     * {@link Collections#synchronizedMap Collections.synchronizedMap}
     * method.  This is best done at creation time, to prevent accidental
     * unsynchronized access to the map:<pre>
     *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
     *
     * <p>The iterators returned by all of this class's "collection view methods"
     * are <i>fail-fast</i>: if the map is structurally modified at any time after
     * the iterator is created, in any way except through the iterator's own
     * {@code remove} method, the iterator will throw a
     * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
     * modification, the iterator fails quickly and cleanly, rather than risking
     * arbitrary, non-deterministic behavior at an undetermined time in the
     * future.
     *
     * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
     * as it is, generally speaking, impossible to make any hard guarantees in the
     * presence of unsynchronized concurrent modification.  Fail-fast iterators
     * throw {@code ConcurrentModificationException} on a best-effort basis.
     * Therefore, it would be wrong to write a program that depended on this
     * exception for its correctness: <i>the fail-fast behavior of iterators
     * should be used only to detect bugs.</i>
     *
     * <p>This class is a member of the
     * <a href="{@docRoot}/java/util/package-summary.html#CollectionsFramework">
     * Java Collections Framework</a>.
     *
     * @param <K> the type of keys maintained by this map
     * @param <V> the type of mapped values
     *
     * @author  Doug Lea
     * @author  Josh Bloch
     * @author  Arthur van Hoff
     * @author  Neal Gafter
     * @see     Object#hashCode()
     * @see     Collection
     * @see     Map
     * @see     TreeMap
     * @see     Hashtable
     * @since   1.2
     */
    public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

        private static final long serialVersionUID = 362498820763181265L;

        /*
         * Implementation notes.
         *
         * This map usually acts as a binned (bucketed) hash table, but
         * when bins get too large, they are transformed into bins of
         * TreeNodes, each structured similarly to those in
         * java.util.TreeMap. Most methods try to use normal bins, but
         * relay to TreeNode methods when applicable (simply by checking
         * instanceof a node).  Bins of TreeNodes may be traversed and
         * used like any others, but additionally support faster lookup
         * when overpopulated. However, since the vast majority of bins in
         * normal use are not overpopulated, checking for existence of
         * tree bins may be delayed in the course of table methods.
         *
         * Tree bins (i.e., bins whose elements are all TreeNodes) are
         * ordered primarily by hashCode, but in the case of ties, if two
         * elements are of the same "class C implements Comparable<C>",
         * type then their compareTo method is used for ordering. (We
         * conservatively check generic types via reflection to validate
         * this -- see method comparableClassFor).  The added complexity
         * of tree bins is worthwhile in providing worst-case O(log n)
         * operations when keys either have distinct hashes or are
         * orderable, Thus, performance degrades gracefully under
         * accidental or malicious usages in which hashCode() methods
         * return values that are poorly distributed, as well as those in
         * which many keys share a hashCode, so long as they are also
         * Comparable. (If neither of these apply, we may waste about a
         * factor of two in time and space compared to taking no
         * precautions. But the only known cases stem from poor user
         * programming practices that are already so slow that this makes
         * little difference.)
         *
         * Because TreeNodes are about twice the size of regular nodes, we
         * use them only when bins contain enough nodes to warrant use
         * (see TREEIFY_THRESHOLD). And when they become too small (due to
         * removal or resizing) they are converted back to plain bins.  In
         * usages with well-distributed user hashCodes, tree bins are
         * rarely used.  Ideally, under random hashCodes, the frequency of
         * nodes in bins follows a Poisson distribution
         * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
         * parameter of about 0.5 on average for the default resizing
         * threshold of 0.75, although with a large variance because of
         * resizing granularity. Ignoring variance, the expected
         * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
         * factorial(k)). The first values are:
         *
         * 0:    0.60653066
         * 1:    0.30326533
         * 2:    0.07581633
         * 3:    0.01263606
         * 4:    0.00157952
         * 5:    0.00015795
         * 6:    0.00001316
         * 7:    0.00000094
         * 8:    0.00000006
         * more: less than 1 in ten million
         *
         * The root of a tree bin is normally its first node.  However,
         * sometimes (currently only upon Iterator.remove), the root might
         * be elsewhere, but can be recovered following parent links
         * (method TreeNode.root()).
         *
         * All applicable internal methods accept a hash code as an
         * argument (as normally supplied from a public method), allowing
         * them to call each other without recomputing user hashCodes.
         * Most internal methods also accept a "tab" argument, that is
         * normally the current table, but may be a new or old one when
         * resizing or converting.
         *
         * When bin lists are treeified, split, or untreeified, we keep
         * them in the same relative access/traversal order (i.e., field
         * Node.next) to better preserve locality, and to slightly
         * simplify handling of splits and traversals that invoke
         * iterator.remove. When using comparators on insertion, to keep a
         * total ordering (or as close as is required here) across
         * rebalancings, we compare classes and identityHashCodes as
         * tie-breakers.
         *
         * The use and transitions among plain vs tree modes is
         * complicated by the existence of subclass LinkedHashMap. See
         * below for hook methods defined to be invoked upon insertion,
         * removal and access that allow LinkedHashMap internals to
         * otherwise remain independent of these mechanics. (This also
         * requires that a map instance be passed to some utility methods
         * that may create new nodes.)
         *
         * The concurrent-programming-like SSA-based coding style helps
         * avoid aliasing errors amid all of the twisty pointer operations.
         */

        /**
         * The default initial capacity - MUST be a power of two.
         */
        static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

        /**
         * The maximum capacity, used if a higher value is implicitly specified
         * by either of the constructors with arguments.
         * MUST be a power of two <= 1<<30.
         */
        static final int MAXIMUM_CAPACITY = 1 << 30;

        /**
         * The load factor used when none specified in constructor.
         */
        static final float DEFAULT_LOAD_FACTOR = 0.75f;

        /**
         * The bin count threshold for using a tree rather than list for a
         * bin.  Bins are converted to trees when adding an element to a
         * bin with at least this many nodes. The value must be greater
         * than 2 and should be at least 8 to mesh with assumptions in
         * tree removal about conversion back to plain bins upon
         * shrinkage.
         */
        static final int TREEIFY_THRESHOLD = 8;

        /**
         * The bin count threshold for untreeifying a (split) bin during a
         * resize operation. Should be less than TREEIFY_THRESHOLD, and at
         * most 6 to mesh with shrinkage detection under removal.
         */
        static final int UNTREEIFY_THRESHOLD = 6;

        /**
         * The smallest table capacity for which bins may be treeified.
         * (Otherwise the table is resized if too many nodes in a bin.)
         * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
         * between resizing and treeification thresholds.
         */
        static final int MIN_TREEIFY_CAPACITY = 64;

        /**
         * Basic hash bin node, used for most entries.  (See below for
         * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
         */
        static class Node<K,V> implements Map.Entry<K,V> {
            final int hash;
            final K key;
            V value;
            Node<K,V> next;

            Node(int hash, K key, V value, Node<K,V> next) {
                this.hash = hash;
                this.key = key;
                this.value = value;
                this.next = next;
            }

            public final K getKey()        { return key; }
            public final V getValue()      { return value; }
            public final String toString() { return key + "=" + value; }

            public final int hashCode() {
                return Objects.hashCode(key) ^ Objects.hashCode(value);
            }

            public final V setValue(V newValue) {
                V oldValue = value;
                value = newValue;
                return oldValue;
            }

            public final boolean equals(Object o) {
                if (o == this)
                    return true;
                if (o instanceof Map.Entry) {
                    Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                    if (Objects.equals(key, e.getKey()) &&
                        Objects.equals(value, e.getValue()))
                        return true;
                }
                return false;
            }
        }

        /* ---------------- Static utilities -------------- */

        /**
         * Computes key.hashCode() and spreads (XORs) higher bits of hash
         * to lower.  Because the table uses power-of-two masking, sets of
         * hashes that vary only in bits above the current mask will
         * always collide. (Among known examples are sets of Float keys
         * holding consecutive whole numbers in small tables.)  So we
         * apply a transform that spreads the impact of higher bits
         * downward. There is a tradeoff between speed, utility, and
         * quality of bit-spreading. Because many common sets of hashes
         * are already reasonably distributed (so don't benefit from
         * spreading), and because we use trees to handle large sets of
         * collisions in bins, we just XOR some shifted bits in the
         * cheapest possible way to reduce systematic lossage, as well as
         * to incorporate impact of the highest bits that would otherwise
         * never be used in index calculations because of table bounds.
         */
        static final int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        /**
         * Returns x's Class if it is of the form "class C implements
         * Comparable<C>", else null.
         */
        static Class<?> comparableClassFor(Object x) {
            if (x instanceof Comparable) {
                Class<?> c; Type[] ts, as; ParameterizedType p;
                if ((c = x.getClass()) == String.class) // bypass checks
                    return c;
                if ((ts = c.getGenericInterfaces()) != null) {
                    for (Type t : ts) {
                        if ((t instanceof ParameterizedType) &&
                            ((p = (ParameterizedType) t).getRawType() ==
                             Comparable.class) &&
                            (as = p.getActualTypeArguments()) != null &&
                            as.length == 1 && as[0] == c) // type arg is c
                            return c;
                    }
                }
            }
            return null;
        }

        /**
         * Returns k.compareTo(x) if x matches kc (k's screened comparable
         * class), else 0.
         */
        @SuppressWarnings({"rawtypes","unchecked"}) // for cast to Comparable
        static int compareComparables(Class<?> kc, Object k, Object x) {
            return (x == null || x.getClass() != kc ? 0 :
                    ((Comparable)k).compareTo(x));
        }

        /**
         * Returns a power of two size for the given target capacity.
         */
        static final int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        /* ---------------- Fields -------------- */

        /**
         * The table, initialized on first use, and resized as
         * necessary. When allocated, length is always a power of two.
         * (We also tolerate length zero in some operations to allow
         * bootstrapping mechanics that are currently not needed.)
         */
        transient Node<K,V>[] table;

        /**
         * Holds cached entrySet(). Note that AbstractMap fields are used
         * for keySet() and values().
         */
        transient Set<Map.Entry<K,V>> entrySet;

        /**
         * The number of key-value mappings contained in this map.
         */
        transient int size;

        /**
         * The number of times this HashMap has been structurally modified
         * Structural modifications are those that change the number of mappings in
         * the HashMap or otherwise modify its internal structure (e.g.,
         * rehash).  This field is used to make iterators on Collection-views of
         * the HashMap fail-fast.  (See ConcurrentModificationException).
         */
        transient int modCount;

        /**
         * The next size value at which to resize (capacity * load factor).
         *
         * @serial
         */
        // (The javadoc description is true upon serialization.
        // Additionally, if the table array has not been allocated, this
        // field holds the initial array capacity, or zero signifying
        // DEFAULT_INITIAL_CAPACITY.)
        int threshold;

        /**
         * The load factor for the hash table.
         *
         * @serial
         */
        final float loadFactor;

        /* ---------------- Public operations -------------- */

        /**
         * Constructs an empty {@code HashMap} with the specified initial
         * capacity and load factor.
         *
         * @param  initialCapacity the initial capacity
         * @param  loadFactor      the load factor
         * @throws IllegalArgumentException if the initial capacity is negative
         *         or the load factor is nonpositive
         */
        public HashMap(int initialCapacity, float loadFactor) {
            if (initialCapacity < 0)
                throw new IllegalArgumentException("Illegal initial capacity: " +
                                                   initialCapacity);
            if (initialCapacity > MAXIMUM_CAPACITY)
                initialCapacity = MAXIMUM_CAPACITY;
            if (loadFactor <= 0 || Float.isNaN(loadFactor))
                throw new IllegalArgumentException("Illegal load factor: " +
                                                   loadFactor);
            this.loadFactor = loadFactor;
            this.threshold = tableSizeFor(initialCapacity);
        }

        /**
         * Constructs an empty {@code HashMap} with the specified initial
         * capacity and the default load factor (0.75).
         *
         * @param  initialCapacity the initial capacity.
         * @throws IllegalArgumentException if the initial capacity is negative.
         */
        public HashMap(int initialCapacity) {
            this(initialCapacity, DEFAULT_LOAD_FACTOR);
        }

        /**
         * Constructs an empty {@code HashMap} with the default initial capacity
         * (16) and the default load factor (0.75).
         */
        public HashMap() {
            this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
        }

        /**
         * Constructs a new {@code HashMap} with the same mappings as the
         * specified {@code Map}.  The {@code HashMap} is created with
         * default load factor (0.75) and an initial capacity sufficient to
         * hold the mappings in the specified {@code Map}.
         *
         * @param   m the map whose mappings are to be placed in this map
         * @throws  NullPointerException if the specified map is null
         */
        public HashMap(Map<? extends K, ? extends V> m) {
            this.loadFactor = DEFAULT_LOAD_FACTOR;
            putMapEntries(m, false);
        }

Tree is for inside a bucket, I was looking for the allocation of the buckets itself. — Koushik Paul, Jul 09 '18 at 13:03
Gotcha, that makes sense. Sorry that I was on the wrong track with my response here! — John Stark, Jul 09 '18 at 14:23

How hashmap provides constant-time performance?

2 Answers2