Warning: this answer might be a bit confusing, because there are two separate things to consider:
Python has a built in hash table type, dict
. Its internal implementation is written in C (at least for CPython) and uses tricks you can't write directly in Python. For details on the (several) implementations used over the years, see How are Python's Built In Dictionaries Implemented.
Python as a language mostly doesn't have arrays.1 Some of the answers to the linked question are expressed in terms of arrays, and Python's built in list type can be used like an array, so that would serve as a model. That's what I will do here.
So let's start by making an empty pseudo-array: []
. We'll bind this to some "array-like" name:
a = []
and then proceed with your questions.
1The array
module provides C-style arrays. See the array documentation for details.
My takeaway from that is that if I have a key-value pair, then the hash essentially converts the index into a key, and then that key is stored in an array.
It's the other way around: the hash converts the key—which is "too big"—to a smaller, i.e., hash, value that the computer can handle more easily and directly. That converts the key to an index. You got this right in the next part of the question though:
So if the index is for 'key1'
is 0, then a[0]
contains the actual value 'value1'
.
Generally, yes. However, if the hash table is meant to work for both hits and misses, and some other key (say '1frotz'
) might also convert to index 0, we must store two items in a[0]
, or keep a parallel array of the actual keys, or something along these lines, to make sure that a[0]
is holding 'key1'
and not some other key-value pair. That is, in Python we might do this:
i = hash(key) % tablesize # assuming a fixed table size
assert i < len(a) # this is going to fail since len(a) == 0!
kv_pair = a[i]
assert kv_pair[0] == key
... use kv_pair[1], which holds the value ...
Of course, we also need to be able to put items into the hash table. Generally, when we do that, we'll expand the table if the key-value pair won't fit, so instead of the above assert
s, we do:
def find_value(key):
if len(a) > 0:
i = hash(key) % len(a)
kv_pair = a[i]
if kv_pair is not None and kv_pair[0] == key:
return kv_pair[1]
return None
def insert_kv_pair(key, value):
if len(a) > 0:
i = hash(key) % len(a)
kv_pair = a[i]
if kv_pair is None: # not in the table yet
a[i] = [key, value] # so put it in
return
if kv_pair[0] == key: # already in the table
kv_pair[1] = value # so replace the value
return
... the complicated parts go here ...
When we hit the "complicated parts" either the array itself is too small, or the slot we want to use is already in use for some other key.
Here's where hash tables get fancy. Some use a secondary hash function, doing something called re-hashing and probing other table slots (in which case we want to start out with a nonzero table size). Some expand the table in place. Again, to see what Python actually does, see that other question and its answers.
However, I am not sure if this is actually the case, or if like variables in python the value in the array doesn't actually represent 'value1', but instead some pointer which points to the location in memory where 'value1' is stored. ...
Because Python allows dynamic types in dictionaries, the value slot for any hashed key definitely stores a pointer to the actual value, rather than a copy of the value. Values of different types have different underlying sizes. You can view the size of a type using sys.getsizeof
(but see How do I determine the size of an object in Python? as well):
>>> import sys
>>> sys.getsizeof(int)
400
>>> sys.getsizeof(1)
28
>>> sys.getsizeof('str')
52
>>> sys.getsizeof('string')
55
Since the sizes vary all over the map, Python just stores a pointer in the dictionary's "value" slot for the given key.
... does that mean the values in stored in a hash table are actually scattered throughout memory rather than stored sequentially?
Yes. Only the hash value and the pair of key/value pointers are stored sequentially in the internal Python implementation.