
I am learning Python and came across some code (*) that uses a hash and a UUID to store data.
Below is a simplified excerpt showing the part in question.

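Basically, when data is passed to `DataBase`, it hashes the data (more precisely its keys, since `tuple(data)` iterates over the dict's keys), creates a UUID,
then stores them in two dicts: `._data` as {uuid: data} and `._index` as {hash value: uuid}.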

What are the benefits of using a hash and a UUID from a DBMS perspective, as opposed to simply storing the data in Python built-in types such as `list` and `dict`?

from typing import Dict
import uuid

class DataBase:
    def __init__(self) -> None:
        # Primary store: UUID -> record
        self._data: Dict[uuid.UUID, Dict] = {}
        # Secondary index: hash of the record's key names -> UUID
        self._index: Dict[int, uuid.UUID] = {}

    def insert(self,
               data: Dict[str, int]) -> None:

        # tuple(data) holds the dict's keys only, so the values are not hashed
        keyhash = hash(tuple(data))
        _id = uuid.uuid4()
        self._data[_id] = data
        self._index[keyhash] = _id

db = DataBase()
db.insert(data={'priceA': 5000})
db.insert(data={'priceB': 4000})

...
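
To make the role of the two dictionaries concrete, here is a minimal sketch of how a lookup could go through the index. It is a guess at the intended usage, not the original client's code: `find` is a hypothetical method name, and the rest mirrors the excerpt above.

```python
import uuid
from typing import Dict, Optional


class DataBase:
    def __init__(self) -> None:
        self._data: Dict[uuid.UUID, Dict[str, int]] = {}
        self._index: Dict[int, uuid.UUID] = {}

    def insert(self, data: Dict[str, int]) -> None:
        keyhash = hash(tuple(data))  # hash of the key names only
        _id = uuid.uuid4()
        self._data[_id] = data
        self._index[keyhash] = _id

    def find(self, *keys: str) -> Optional[Dict[str, int]]:
        """Hypothetical lookup: key names -> hash -> UUID -> record."""
        _id = self._index.get(hash(keys))
        return None if _id is None else self._data[_id]


db = DataBase()
db.insert({"priceA": 5000})
db.insert({"priceB": 4000})
print(db.find("priceB"))  # {'priceB': 4000}

# Inserting the same key name again repoints the index entry,
# but the older record is still sitting in _data under its own UUID.
db.insert({"priceA": 5100})
print(db.find("priceA"))  # {'priceA': 5100}
print(len(db._data))      # 3
```

Note that the last line prints 3, not 2: with this design an "update" by key name leaves the previous record orphaned unless something else removes it.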

(*) The original code is an API client that continuously receives data asynchronously over multiple WebSocket streams, and reads, updates, inserts, and deletes records in the database class it defines.
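
Since the original client takes in data from several streams at once, one property of `uuid.uuid4()` that may be relevant is that it needs no shared counter or lock to produce distinct keys. Whether that was the author's motivation is not known. The sketch below uses threads as a stand-in for concurrent producers; `make_ids` is a hypothetical helper.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import List


def make_ids(n: int) -> List[uuid.UUID]:
    # Each worker draws its own random UUIDs; nothing is shared between workers
    return [uuid.uuid4() for _ in range(n)]


with ThreadPoolExecutor(max_workers=4) as pool:
    batches = list(pool.map(make_ids, [10_000] * 4))

all_ids = [i for batch in batches for i in batch]
# Total generated vs. number of distinct values
print(len(all_ids), len(set(all_ids)))
```

The uniqueness here is statistical (a version 4 UUID carries 122 random bits), not an absolute guarantee, but a duplicate among a few tens of thousands of IDs is so unlikely that the two printed numbers should match.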

koyamashinji
  • Wow, that's a very interesting question. Maybe it's about keeping the index smaller? If the data is large, I could imagine a model where the index is in memory but maybe not `_data`. Mind you, sequential indices would have worked just as well for that. – joanis Jun 25 '21 at 15:31
  • Using UUID gives you good statistical guarantees of uniqueness even if they are generated in different processes or servers, so it's a way to generate unique keys without synchronisation. Maybe that was the motivation? – joanis Jun 25 '21 at 15:32
  • This question is now closed as 'opinion-based'. While I understand that asking for benefits can be somewhat opinion-based, I reckon the answers can be factual as well. I have seen many past questions asking for the pros and cons of *something* that were not closed. I really wish I could have kept going with this question. – koyamashinji Jun 26 '21 at 02:12
  • This question just got reopened. I hope someone has more insight. I just had a thought about a weakness of this code: keyhash has no uniqueness guarantees. If it's well designed, it should yield few collisions, but unlike with UUID, one should never rely on its uniqueness. So, to be safe, `_index` really should map from `int` to a collection of UUIDs. – joanis Jun 30 '21 at 13:17
  • To improve this database design, you could replace `hash()` with something that has better (statistical, not absolute) uniqueness guarantees (e.g., see [Safest way to generate a unique hash?](https://stackoverflow.com/q/47601592/3216427)), but frankly, by this point, using both a strong hash and UUIDs is getting rather silly. – joanis Jun 30 '21 at 13:22
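
The suggestion above that `_index` should map each hash to a collection of UUIDs could look roughly like this. It is a sketch under assumptions, not the original design: `find` is a hypothetical method, and the `hash_fn` parameter exists only so the demo can force every key to collide.

```python
import uuid
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set, Tuple


class DataBase:
    def __init__(self, hash_fn: Callable[[Tuple[str, ...]], int] = hash) -> None:
        self._hash = hash_fn  # injectable only so a collision can be forced below
        self._data: Dict[uuid.UUID, Dict[str, int]] = {}
        # Each hash value maps to a set of UUIDs instead of a single UUID
        self._index: DefaultDict[int, Set[uuid.UUID]] = defaultdict(set)

    def insert(self, data: Dict[str, int]) -> uuid.UUID:
        _id = uuid.uuid4()
        self._data[_id] = data
        self._index[self._hash(tuple(data))].add(_id)
        return _id

    def find(self, *keys: str) -> List[Dict[str, int]]:
        # .get avoids creating an empty bucket in the defaultdict on a miss
        candidates = self._index.get(self._hash(keys), set())
        # Compare the real key names: equal hashes do not imply equal keys
        return [self._data[i] for i in candidates if tuple(self._data[i]) == keys]


db = DataBase(hash_fn=lambda keys: 0)  # worst case: every key tuple collides
db.insert({"priceA": 5000})
db.insert({"priceB": 4000})
print(db.find("priceA"))  # [{'priceA': 5000}]
print(db.find("priceB"))  # [{'priceB': 4000}]
print(db.find("priceC"))  # []
```

With a single UUID per hash, the second insert would have overwritten the first index entry and `find("priceA")` would have returned the wrong record. The trade-off is that repeated inserts with the same key names now accumulate in the bucket, so `find` returns a list.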
