3

I have a python dictionary which is added with info from time to time.

Conditions:

  1. There should be 3 or less keys in the dictionary at all time. If there is a need to add another key-value pair, the earliest added key in the dictionary at that moment should be deleted and this new one should be added.
  2. Once a key-value is added, after 1 day, it should be deleted.

Example:

result = {}

def add_key(key, val):
    test = result.get(key) # Test to see if key already exists
    if test is None:
        if len(result.keys()) < 3:
            result[key] = val # This key and pair need to be deleted from the dictionary after
                            # 24 hours
        else:
            # Need code to remove the earliest added key.

add_key('10', 'object 2')

## Need another code logic to delete the key-value pair after 24 hours

How should I go about this issue?

nifeco
  • 211
  • 1
  • 8
  • 2
    Could look at using a `collections.OrderedDict`, it has a `popitem(last=False)` method that can be used to remove the earliest added key https://docs.python.org/3/library/collections.html#collections.OrderedDict.popitem – Iain Shelvington Jan 05 '22 at 13:36
  • 3
    You're going to have to maintain some metadata about each value in the dictionary in order to determine their age. I suggest you encapsulate everything in a dictionary subclass that does this. Off-topic: Don't name variables the same as built-ins, like `dict`. – martineau Jan 05 '22 at 13:38
  • @IainShelvington. Thank you! I will look into it! – nifeco Jan 05 '22 at 13:45
  • @martineau Thankyou. I will look into this. Yes, I shouldn't have done that, I will be more careful in the future. – nifeco Jan 05 '22 at 13:47

5 Answers5

5

Here's how to do what I recommended in my comment under you question — namely to do it by implementing a dictionary subclass that does what's needed via metadata it maintains internally about each of the values it contains.

This can be done fairly easily via the abstract base classes for containers in the collections.abc module because it minimizes the number of methods that need to be implemented in the subclass. In this case abc.MutableMapping was used at the base class.

Note that I've add a new method named prune() that show how all entries older than a specified amount could be removed. In addition to that, I've left some extraneous calls to print() in the code to make what it's doing clearer when testing it — which you'll probably want to remove.

from collections import abc
from datetime import datetime, timedelta
from operator import itemgetter
from pprint import pprint
from time import sleep


class LimitedDict(abc.MutableMapping):
    LIMIT = 3

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

        if len(self) > self.LIMIT:
            raise RuntimeError(f'{type(self).__name__} initialized with more '
                               f'than the limit of {self.LIMIT} items.')

        # Initialize value timestamps.
        now = datetime.now()
        self._timestamps = {key: now for key in self._data.keys()}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self:  # Adding a key?
            if len(self) >= self.LIMIT:   # Will doing so exceed limit?
                # Find key of oldest item and delete it (and its timestamp).
                oldest = min(self._timestamps.items(), key=itemgetter(1))[0]
                print(f'deleting oldest item {oldest=} to make room for {key=}')
                del self[oldest]

        # Add (or update) item and timestamp.
        self._data[key], self._timestamps[key] = value, datetime.now()

    def __delitem__(self, key):
        """Remove item and associated timestamp."""
        del self._data[key]
        del self._timestamps[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def prune(self, **kwargs) -> None:
        """ Remove all items older than the specified maximum age.

        Accepts same keyword arguments as datetime.timedelta - currently:
        days, seconds, microseconds, milliseconds, minutes, hours, and weeks.
        """
        max_age = timedelta(**kwargs)
        now = datetime.now()
        self._data = {key: value for key, value in self._data.items()
                        if (now - self._timestamps[key]) <= max_age}
        self._timestamps = {key: self._timestamps[key] for key in self._data.keys()}



if __name__ == '__main__':

    ld = LimitedDict(a=1, b=2)
    pprint(ld._data)
    pprint(ld._timestamps)
    sleep(1)
    ld['c'] = 3  # Add 3rd item.
    print()
    pprint(ld._data)
    pprint(ld._timestamps)
    sleep(1)
    ld['d'] = 4  # Add an item that should cause limit to be exceeded.
    print()
    pprint(ld._data)
    pprint(ld._timestamps)
    ld.prune(seconds=2) # Remove all items more than 2 seconds old.
    print()
    pprint(ld._data)
    pprint(ld._timestamps)
martineau
  • 119,623
  • 25
  • 170
  • 301
  • This is a beautiful answer! + 1! Thanks! instead of using datetime, can we simply use an ever increasing counter? – Aditya Jan 05 '22 at 17:19
  • 1
    If all you want to do is remove the oldest — which is the only thing currently being done by the code — then a simple counter would suffice. I used `datetime`s to allow being able to determine how old an entry was which would provide a means of cleaning out all entries older than certain amount of time, such as 1 day, if desired (via some custom method specific to the subclass). – martineau Jan 05 '22 at 17:53
  • Thanks for getting back! Seems like LRU :) – Aditya Jan 05 '22 at 17:54
  • 1
    Indeed, if all you want to do it keep the most recent N values, it might be possible to use the [`functools.lru_cache`](https://docs.python.org/3/library/functools.html#functools.lru_cache) decorator somehow to do what you need. – martineau Jan 05 '22 at 18:04
  • 1
    Note I added a `prune()` method demonstratomg how the datetime timestamps could be used to delete items older than a specified amount. – martineau Jan 06 '22 at 11:51
  • It's pretty easy to convert this to a limited capacity LRU cache as well; you have to update `__getitem__` to update the timestamps for the accessed key but this will be quite slower i feel (due to that min over all `timestamps.items()`repeatedly), better to use `OrderDict/Dict` as is or hash_map+DLL (linked list style); – Aditya Jan 06 '22 at 12:30
  • I'm merely answering the question you posted along with your follow-on one about using `datetime`s. – martineau Jan 06 '22 at 12:35
  • Thanks for the answer, it's very nice! No more follow-up's! – Aditya Jan 06 '22 at 12:41
  • This is a great answer. One caveat: it might not work as expected on Python for Windows. In my experiments, `datetime.now()` sometimes has a resolution of only about 0.015s (i.e., your code can run for 15000 microseconds and have `now()` return the same answer every time!) This means that the LRU effect is not guaranteed with elements added in close succession. For a solution with the 24-hour thing and guaranteed LRU effect, I would combine both `datetime.now()` and a sequence counter. – joanis Jan 06 '22 at 14:43
  • @joanis: Thanks. Good points. I used `datetime` to make it relatively easy to specify a duration like 1 day, as the OP mentioned in their question. For more accurate small time intervals something with higher resolution like the [`time.perf_counter_ns`](https://docs.python.org/3/library/time.html#time.perf_counter_ns) would better suited. – martineau Jan 06 '22 at 15:08
  • @martineau I would actually not try to solve the precision of the timer for this particular problem. Yes, `time.perf_counter_ns` will be a lot more precise, but a sequence counter will always yield an exact LRU behaviour, whereas a timer-based solution will always run the risk of collision. But for timing code, yes, `time.perf_counter_ns` is absolutely the right solution, as I recently discovered. – joanis Jan 06 '22 at 15:30
2

Here's one idea:

  • Subclass dict with your own class type (see this question).
  • In each function that deals with key, transform the key into a tuple of (datetime.now(), key).
  • At the beginning of each function, first do the 24-hour check and remove any expired items.
  • At the end of __setitem__ truncate the container down to 3 elements if necessary.
0x5453
  • 12,753
  • 1
  • 32
  • 61
1

I know this is only a partial answer to your question, but here is an approach for the first part of the question:

Define a variable holding the last added key on the same scope as the dictionary. Than you can remove said key in your else statement. Don't forget to update the variable each time you add a key.

Dharman
  • 30,962
  • 25
  • 85
  • 135
msts1906
  • 89
  • 1
  • 11
1

Inspired by the SO question Python Datatype for a fixed-length FIFO


One way of doing this is by using collections.

I created this short demo using collections.deque

We start by initializing the variable x to a list containing 5 empty dictionaries, and a max length of 5. Then creating the method add_v which uses the method appendleft to have it behave like a FIFO stack with a size limit of 5.

import collections

x = collections.deque([{}]*5, 5)


def add_key(key, value):
    x.appendleft({key: value})

Note: x will be a list containing dictionaries and not a dictionary containing entries.

Asger Weirsøe
  • 350
  • 1
  • 11
1

This code is my best effort at solving the first problem. It adds an indicator to the end of the they key and always deletes the earliest key when a new key is added:

global myDict
myDict = {}
globals()["currVal"] = 3
def add_key(key, val):
    if key not in [k[0:-1] for k in myDict.keys()]: # Test to see if key already exists
        if len(myDict.keys()) < 3:
            myDict[str(key) + str(len(myDict.keys())+1)] = val # This key and pair need to be deleted from the dictionary after        
        else:
            deleteKeys = []
            for k in myDict:
                if k.endswith(str(globals()["currVal"]-2)):
                    deleteKeys.append(k) 
            myDict[str(key) + str(globals()["currVal"]+1)] = val
            for ks in deleteKeys:
                del myDict[ks]
            globals()["currVal"] += 1
    return myDict

add_key('100', 'object 1')
print(myDict)
add_key('101', 'object 2')
print(myDict)
add_key('102', 'object 3')
print(myDict)
add_key('103', 'object 4')
print(myDict)
add_key('104', 'object 5')
print(myDict)
add_key('105', 'object 6')
print(myDict)

Output:

{'1001': 'object 1'}
{'1001': 'object 1', '1012': 'object 2'}
{'1001': 'object 1', '1012': 'object 2', '1023': 'object 3'}
{'1012': 'object 2', '1023': 'object 3', '1034': 'object 4'}
{'1023': 'object 3', '1034': 'object 4', '1045': 'object 5'}
{'1034': 'object 4', '1045': 'object 5', '1056': 'object 6'}
Eli Harold
  • 2,280
  • 1
  • 3
  • 22