
I am trying to write a key-value store in Python using HDF5 as a backend. Ideally, I would like the store to behave like a Python dictionary. My current code looks like this:

from collections.abc import MutableMapping
from typing import List

import h5py
import numpy as np
import pandas as pd
from multimethod import multimethod


class FancyDic(MutableMapping):

    def __init__(self, fname):
        self.fname = fname
        # 'w' creates the file and truncates it if it already exists.
        self.__store__ = h5py.File(fname, 'w')

    def get_dic(self, df):
        # Turn a DataFrame into {column: values}, skipping "coord" columns.
        data_dict = dict()
        for col in df.columns:
            if "coord" not in col:
                data_dict[col] = df[col].values
        return data_dict

    # update() is dispatched on the type of `value`.
    @multimethod
    def update(self: object, path: str, key: str, value: float):
        self.__store__[path + key] = value

    @update.register
    def _(self: object, path: str, key: str, value: np.ndarray):
        self.__store__[path + key] = value

    @update.register
    def _(self: object, path: str, key: str, value: list):
        self.__store__[path + key] = np.asarray(value)

    @update.register
    def _(self: object, path: str, key: str, value: str):
        # Strings are stored as attributes rather than datasets.
        self.__store__.attrs[path + key] = value

    @update.register
    def _(self: object, path: str, key: str, value: List[str]):
        # Lists of strings become fixed-width byte strings.
        self.__store__[path + key] = np.array(value).astype('|S100')

    @update.register
    def _(self: object, path: str, key: str, value: dict):
        # Recurse into the dict; every key is converted to a string.
        for key, item in value.items():
            key = str(key)
            if isinstance(item, pd.DataFrame):
                data_dic = self.get_dic(item)
                self.update(path + "/" + key, "/", data_dic)
            else:
                self.update(path + "/" + key, "/", item)

    def __openfile__(self, fname):
        self.__store__ = h5py.File(fname, 'r+')

    def closefile(self):
        self.__store__.close()

    # The next five methods are requirements of the ABC.
    def __setitem__(self, key, value):
        # obj[path, key] = value arrives here with (path, key) as one tuple.
        path, name = key
        self.update(path, str(name), value)

    def __getitem__(self, key):
        if isinstance(key, int):
            key = str(key)
        return self.__store__[key]

    def __delitem__(self, key):
        # Operate on the HDF5 file, not on the instance's __dict__.
        del self.__store__[str(key)]

    def __iter__(self):
        return iter(self.__store__)

    def __len__(self):
        return len(self.__store__)

    # The final method isn't required, but nice for demo purposes:
    def __str__(self):
        '''returns simple dict representation of the mapping'''
        return str(dict(self.__store__.items()))

When I try to access an element, it works: for instance, I can do FancyDicObj[key]. As expected, this returns an HDF5 group object, which I can only index with string keys. I would like to be able to access the elements by integer keys as well. Is there a way to add a function so that subscript access by integer key is still possible in this key-value store? For instance, FancyDicObj[key][1] instead of FancyDicObj[key]["1"], even though I store the numeric keys as their string representation.

arrhhh
  • If you want to use an integer key with a dictionary, you need to create it with an integer. Why do you prefer an integer instead of a string? What are you trying to do? Why not use `FancyDicObj[key][str(1)]` to get the string for the integer value? – kcw78 Sep 16 '21 at 16:17
  • Because it's a little ugly, and it is not compatible with Python dictionary syntax, where I can just use an integer key without casting to string. My question is: can I wrap the FancyDicObj in another class and just use integer keys the way I would with Python dictionaries? – arrhhh Sep 17 '21 at 07:06
  • Right now, at the first level of access, I can use integer keys because the class casts them to strings without me telling it to. At the lower nested levels of the HDF5 file this is not done, because accessing the FancyDicObj returns the group or dataset, which doesn't have integer key access. – arrhhh Sep 17 '21 at 07:09

1 Answer


It's ugly because you are trying to push a square peg into a round hole. HDF5 is a container of structured data (where the user defines the structure; aka the schema). Groups are used to organize data, and Datasets can hold typical Python scalars (ints, floats, strings) and NumPy arrays of similar objects. However, HDF5 does NOT have a dictionary object. So, if you want to store a Python dictionary, you have to map the data into HDF5 objects: groups, datasets, and attributes.
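As a rough illustration of that mapping, here is a minimal sketch, not the asker's implementation. The helper name `save_dict`, the file name, and the sample data are made up for the example; it assumes nested dicts become groups, short strings become attributes, and everything else becomes a dataset. The file uses h5py's in-memory `core` driver so nothing is written to disk.

```python
import h5py
import numpy as np


def save_dict(group, data):
    """Recursively write a nested dict into an open h5py group (hypothetical helper)."""
    for key, value in data.items():
        name = str(key)  # HDF5 names are strings, so integer keys are converted
        if isinstance(value, dict):
            save_dict(group.create_group(name), value)  # dict -> subgroup
        elif isinstance(value, str):
            group.attrs[name] = value  # short string -> attribute
        else:
            group[name] = np.asarray(value)  # number, list or array -> dataset


# In-memory file: the core driver with backing_store=False never touches disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    save_dict(f, {"run": {1: [1.0, 2.0, 3.0], 2: 4.5, "label": "test"}})
    names = sorted(f["run"].keys())        # the integer keys come back as strings
    first = f["run"]["1"][()].tolist()     # read the whole dataset
    scalar = float(f["run"]["2"][()])
    label = f["run"].attrs["label"]
```

Note that the integer keys 1 and 2 are stored under the names "1" and "2"; the file itself has no record that they were ever integers.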

Have you considered the pickle module to do this? Here are 2 SO topics on this:

If you decide to continue with HDF5:
In your class FancyDic(), you mapped the dictionary key to a group name. Names are strings. There is no way to work around this limitation.
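The names stored in the file stay strings either way. What a Python-side wrapper can do is perform the `str()` conversion on the caller's behalf at every nesting level. The sketch below is only an illustration of that idea: `IntKeyView` is a hypothetical class, and a nested dict stands in for the HDF5 file so the example runs without any file. It relies on the assumption that nested containers are instances of `collections.abc.Mapping`; h5py groups are expected to satisfy this, but if they do not in your version, check for `h5py.Group` instead.

```python
from collections.abc import Mapping


class IntKeyView:
    """Read-only view that converts integer keys to strings at every level."""

    def __init__(self, node):
        self._node = node  # any mapping with string keys, e.g. an open HDF5 group

    def __getitem__(self, key):
        if isinstance(key, int):
            key = str(key)  # the stored name is still the string "1"
        item = self._node[key]
        # Wrap nested mappings so the conversion also applies one level down;
        # leaves (datasets, plain values) are returned unchanged.
        return IntKeyView(item) if isinstance(item, Mapping) else item


# A nested dict stands in for the HDF5 file here.
store = {"run": {"1": [1.0, 2.0], "2": {"3": "deep"}}}
view = IntKeyView(store)
a = view["run"][1]       # same item as view["run"]["1"]
b = view["run"][2][3]    # conversion applies at each nested level
```

In a class like `FancyDic`, `__getitem__` could return such a view instead of the raw group. The trade-off is that callers receive the wrapper rather than the h5py object, so any group methods they need would have to be forwarded explicitly.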

Attributes in HDF5 provide an alternate mechanism. They are designed to save small amounts of data and use name=value pairs (similar to a dictionary). However, for performance reasons, they should be small (<64 KB). Also, the name (key) must be a string, so it has the same limitation as group and dataset names.
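A small sketch of the attribute route, assuming the keys are integers that you convert yourself in both directions. The group name `settings`, the file name, and the values are invented for the example, and the file again lives in memory only.

```python
import h5py

with h5py.File("attrs_demo.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("settings")
    for key, value in {1: 0.25, 2: 0.5}.items():
        grp.attrs[str(key)] = value  # attribute names must be strings
    # Convert the names back to int when reading.
    restored = {int(name): float(value) for name, value in grp.attrs.items()}
```

The round trip only works because the reader knows the names were integers originally; that knowledge lives in your code, not in the file.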

kcw78
  • Of course that's true, and I know how to save a dictionary into a pickle file. I just wanted to know if there was a way of doing what I want in HDF5. I know it does not have a dictionary object, but I am trying to write a key-value store that uses HDF5 as a backend and allows access by integer keys. I know about attributes and use them for storing small data. My question is more about programming: is there a way to wrap the HDF5 file to allow integer keys? – arrhhh Sep 18 '21 at 18:21