0

I am learning python and one of the points that was made is that variable names are stored separately in memory to the objects they refer to; that is that they have some pointer to a different area in memory where the object they refer to is actually stored.

I am now reading about what hash tables are and the (basics) of how they are implemented. The answers to this question were really helpful: How does a hash table work?

My takeaway from that is that if I have a key-value pair, then the hash essentially converts the index into a key, and then that key is stored in an array. So if the index is for 'key1' is 0, then a[0] contains the actual value 'value1'.

However, I am not sure if this is actually the case, or if like variables in python the value in the array doesn't actually represent 'value1', but instead some pointer which points to the location in memory where 'value1' is stored. So does it go 'key1' --> array index --> a[array index] --> value or does it go 'key1' --> array index --> a[array index] --> pointer address --> 'value1' stored in memory location determined by pointer address?

As a followup question, if it is the latter, then does that mean the values in stored in a hash table are actually scattered throughout memory rather than stored sequentially? Thanks very much and sorry if the question is obvious.

masiewpao
  • 309
  • 3
  • 12
  • 1
    I took a look at the C sources of Python: [source](https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Objects/dictobject.c#L749). If you take a look at line 749 it is actually a pointer. However, I don't know if that answers your question. The method you see there is the basic lookup function for Python dictionaries. You can also see, how they determine which value you are looking for with a specific key. – kedenk Nov 03 '18 at 19:29
  • @kedenk Thank you, I don't know any C but this is exactly the answer I was looking for. I just wasn't sure if the array contained values or pointers to values! – masiewpao Nov 03 '18 at 19:33
  • 1
    Glad to help you. I will create an answer for you. Afterwards, we can delete our comments :) – kedenk Nov 03 '18 at 19:46

2 Answers2

1

Warning: this answer might be a bit confusing, because there are two separate things to consider:

  • Python has a built in hash table type, dict. Its internal implementation is written in C (at least for CPython) and uses tricks you can't write directly in Python. For details on the (several) implementations used over the years, see How are Python's Built In Dictionaries Implemented.

  • Python as a language mostly doesn't have arrays.1 Some of the answers to the linked question are expressed in terms of arrays, and Python's built in list type can be used like an array, so that would serve as a model. That's what I will do here.

So let's start by making an empty pseudo-array: []. We'll bind this to some "array-like" name:

a = []

and then proceed with your questions.


1The array module provides C-style arrays. See the array documentation for details.


My takeaway from that is that if I have a key-value pair, then the hash essentially converts the index into a key, and then that key is stored in an array.

It's the other way around: the hash converts the key—which is "too big"—to a smaller, i.e., hash, value that the computer can handle more easily and directly. That converts the key to an index. You got this right in the next part of the question though:

So if the index is for 'key1' is 0, then a[0] contains the actual value 'value1'.

Generally, yes. However, if the hash table is meant to work for both hits and misses, and some other key (say '1frotz') might also convert to index 0, we must store two items in a[0], or keep a parallel array of the actual keys, or something along these lines, to make sure that a[0] is holding 'key1' and not some other key-value pair. That is, in Python we might do this:

i = hash(key) % tablesize   # assuming a fixed table size
assert i < len(a)           # this is going to fail since len(a) == 0!
kv_pair = a[i]
assert kv_pair[0] == key
... use kv_pair[1], which holds the value ...

Of course, we also need to be able to put items into the hash table. Generally, when we do that, we'll expand the table if the key-value pair won't fit, so instead of the above asserts, we do:

def find_value(key):
    if len(a) > 0:
        i = hash(key) % len(a)
        kv_pair = a[i]
        if kv_pair is not None and kv_pair[0] == key:
            return kv_pair[1]
    return None

def insert_kv_pair(key, value):
    if len(a) > 0:
        i = hash(key) % len(a)
        kv_pair = a[i]
        if kv_pair is None:       # not in the table yet
            a[i] = [key, value]   # so put it in
            return
        if kv_pair[0] == key:     # already in the table
            kv_pair[1] = value    # so replace the value
            return
    ... the complicated parts go here ...

When we hit the "complicated parts" either the array itself is too small, or the slot we want to use is already in use for some other key.

Here's where hash tables get fancy. Some use a secondary hash function, doing something called re-hashing and probing other table slots (in which case we want to start out with a nonzero table size). Some expand the table in place. Again, to see what Python actually does, see that other question and its answers.

However, I am not sure if this is actually the case, or if like variables in python the value in the array doesn't actually represent 'value1', but instead some pointer which points to the location in memory where 'value1' is stored. ...

Because Python allows dynamic types in dictionaries, the value slot for any hashed key definitely stores a pointer to the actual value, rather than a copy of the value. Values of different types have different underlying sizes. You can view the size of a type using sys.getsizeof (but see How do I determine the size of an object in Python? as well):

>>> import sys
>>> sys.getsizeof(int)
400
>>> sys.getsizeof(1)
28
>>> sys.getsizeof('str')
52
>>> sys.getsizeof('string')
55

Since the sizes vary all over the map, Python just stores a pointer in the dictionary's "value" slot for the given key.

... does that mean the values in stored in a hash table are actually scattered throughout memory rather than stored sequentially?

Yes. Only the hash value and the pair of key/value pointers are stored sequentially in the internal Python implementation.

torek
  • 448,244
  • 59
  • 642
  • 775
  • Ah I mistyped in that quote, I did mean to say the hash converts the key to an index rather than vice versa. But I will leave it as is so others can make clear the distinction in your answer thank you!! – masiewpao Nov 03 '18 at 22:13
1

If you take a look at the source code of the basic lookup function of Python dictionaries you see, they assign a pointer to the actual value.

In the method, you can also see all the steps of the lookup with a given key. So I think your assumption

'key1' --> array index --> a[array index] --> pointer address --> 'value1'

is the correct one.

kedenk
  • 659
  • 6
  • 9