
First post here and I'm a beginner - hope I'm making myself useful...

I'm trying to find and understand the ADT/concept that does the job I'm after. I'm guessing it's already out there.

I have an array/list/tree (container to be decided) of objects, each of which has a count of how long it has gone unused over iterations of a process. As iterations proceed, the count for each object increases by 1. The idea is that sooner or later I'm going to need the memory that any unused objects are using, so I'll delete them to make space for an object not in RAM (which will have an initial count of '0'). But if it turns out that I use an object that is still in memory, its count is reset to '0', and I pat myself on the back for not having had to access the disk for its contents.

A cache?

The main process loop would have something similar to the following in it:

if (object needs to be added && (totalNumberOfObjects > someConstant))
    object with highest count deleted from RAM and the (heap??)
    newObject added with a count of '0'

if (an object already in RAM is accessed by the process)
    accessedObject count is set to '0'

for (All objects in RAM) 
    count++

I could bash about for a (long and buggy) time and build my own mess, but I thought it'd be interesting to learn the most efficient way from the word go.

Something like a heap?

Lamar Latrell
  • So your data structure should be optimized for search? – Chubsdad Dec 07 '12 at 04:48
  • Possibly. I'm trying to figure out a way to make the data structure both represent the data in its essence (quad-trees and/or octrees) and at the same time represent the 'priority' of deletion from RAM. I'm starting to think that maybe I should use whatever the cache aspect utilises as the base data type. If it needs to be sorted then so be it. Are you referring to hash tables? (another thing I'm new to) – Lamar Latrell Dec 07 '12 at 13:33

1 Answer


You could use a heap for this, but I think it would be overkill. It sounds like you're not going to have a lot of different values for the counts, and you'll have a lot of objects with each count. If that's true, then you only need to thread the objects onto a list of objects with the same count. These lists are themselves arranged in a dequeue (or 'deque' as C++ insists on calling it).

The key here is that you need to increment the count of all objects, and presumably you want that to be O(1) if possible, rather than O(N). And it is possible: each list's header also contains the difference between its count and the next smaller list's count. The header of the list with the smallest count holds a delta from 0, which is that smallest count itself. To increment the count of all objects, you only have to increase this single number by one.
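
For concreteness, here's a tiny self-contained illustration of that delta encoding (the data and names are invented, not taken from the question): each list header stores only the difference from the next smaller count, an object's absolute count is the running sum of the deltas, and ageing every object is a single addition on the front header.

    #include <iostream>
    #include <list>
    #include <string>
    #include <utility>

    int main() {
        // Front = smallest count. Each entry is (delta from the previous
        // header, objects that currently share that count).
        std::list<std::pair<int, std::list<std::string>>> buckets = {
            {0, {"a", "b"}},   // absolute count 0
            {3, {"c"}},        // absolute count 0 + 3 = 3
            {2, {"d", "e"}},   // absolute count 3 + 2 = 5
        };

        // "Increment the count of all objects" in O(1): only the front
        // header's delta from zero changes.
        buckets.front().first += 1;

        // Recover absolute counts by summing deltas front to back.
        int running = 0;
        for (auto &bucket : buckets) {
            running += bucket.first;
            for (auto &name : bucket.second)
                std::cout << name << " has count " << running << '\n';
        }
    }

After the single increment, the objects print counts 1, 1, 4, 6 and 6: every object aged by one, but only one integer was touched.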

To set an object's count to 0, you remove the object from its list (which means you always need to refer to objects by their list iterator, or you need to implement your own intrusive linked list), and either (1) add it to the bottom list, if that list has a count of 0, or (2) create a new bottom list with a count of 0 containing only that object.

The procedure for creating a new object is the same, except that you don't have to unlink it from its current list.

To evict an object from memory, you choose the object at the head of the top list (which is the list with the largest count). If that list becomes empty, you pop it off the dequeue. If you need more memory, you can repeat this operation.

So all operations, including "increment all counts", are O(1). Unfortunately, the storage overhead is two pointers per object, plus two pointers and an integer per unique count (at worst, this is the same as the number of objects, but presumably in practice it's much less). Since it's hard to imagine any other algorithm which uses less than one pointer plus a count for each object, this is probably not even a space-time tradeoff; the additional space requirements are minimal.
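
To make the whole scheme concrete, here is a minimal C++ sketch of the structure described above. It's an illustration under several assumptions rather than a drop-in implementation: the names (AgingCache, Bucket, touch, evictOldest) are invented, objects are keyed by a plain int id, an unordered_map of iterators stands in for the stored-iterator/intrusive-list bookkeeping, and the outer container is a std::list (stable iterators under middle erasure) rather than a literal deque.

    #include <cassert>
    #include <iterator>
    #include <list>
    #include <unordered_map>

    class AgingCache {
        struct Bucket {
            long long delta;      // this list's count minus the next smaller list's count
            std::list<int> objs;  // objects currently sharing that count
        };
        // Front = smallest count (recently used), back = largest count (eviction candidates).
        std::list<Bucket> buckets;
        struct Pos { std::list<Bucket>::iterator bucket; std::list<int>::iterator obj; };
        std::unordered_map<int, Pos> where;   // object id -> its position

        void placeAtZero(int id) {
            // Reuse the bottom list if its count is already 0, otherwise
            // create a new bottom list with a delta of 0.
            if (buckets.empty() || buckets.front().delta != 0)
                buckets.push_front(Bucket{0, {}});
            buckets.front().objs.push_front(id);
            where[id] = {buckets.begin(), buckets.front().objs.begin()};
        }

        void dropIfEmpty(std::list<Bucket>::iterator it) {
            if (!it->objs.empty()) return;
            auto next = std::next(it);
            if (next != buckets.end())
                next->delta += it->delta;  // keep later lists' absolute counts unchanged
            buckets.erase(it);
        }

    public:
        // Increment every object's count in O(1).
        void ageAll() {
            if (!buckets.empty())
                buckets.front().delta += 1;
        }

        // An object in RAM was accessed: unlink it and re-insert with count 0.
        void touch(int id) {
            Pos p = where.at(id);
            p.bucket->objs.erase(p.obj);
            dropIfEmpty(p.bucket);
            placeAtZero(id);
        }

        // A new object was loaded into RAM: same as touch, minus the unlink.
        void insert(int id) {
            placeAtZero(id);
        }

        // Evict the object at the head of the top (largest-count) list;
        // the caller frees the associated memory.
        int evictOldest() {
            assert(!buckets.empty());
            int victim = buckets.back().objs.front();
            buckets.back().objs.pop_front();
            where.erase(victim);
            dropIfEmpty(std::prev(buckets.end()));
            return victim;
        }
    };

The main loop from the question then becomes: evictOldest (if over budget) followed by insert when a new object is loaded, touch on a hit, and one ageAll call per iteration in place of the count++ pass over all objects.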

rici
  • I'd upvote you but I can't yet! What do you think of the method outlined here: http://stackoverflow.com/questions/2504178/lru-cache-design (took me some time to find it) – Lamar Latrell Dec 09 '12 at 04:24
  • That method is the stripped-down version of my algorithm; the main difference is that it doesn't let you know how old the objects are (but maybe you don't care about that), and it's more precise. I've used my algorithm in a context where I was prepared to grow (or shrink) the cache based on demand, and where I wanted to automatically drop objects if they were too old, even if there was room to keep them. I prefer intrusive lists to hash tables, but tastes (and requirements) vary. – rici Dec 09 '12 at 04:40
  • I don't need to know how old they are in absolute terms, only relative. It would be nice to be able to grow and shrink the cache eventually, but I haven't yet read up on how to query an OS for memory information, or indeed let it 'tell' my cache its size. Bigger issues to tackle first. 'Intrusive lists' >> ok, googling! – Lamar Latrell Dec 09 '12 at 04:51
  • Phew, lots to learn ... if you see this post here - you'll see how much of a beginner I am: http://stackoverflow.com/questions/13784732/first-time-with-linked-lists-how-do-i-name-objects-for-insertion – Lamar Latrell Dec 09 '12 at 04:53
  • Good luck! I'm not much of a java programmer, so I doubt I can be of much help. http://stackoverflow.com/questions/3361145/intrusive-lists has some info on intrusive lists, and some links with more info. My caches were connection caches, not memory caches, and I increased them if a lot of new connections were created in a short period of time, and then let them decay; that avoids connection thrashing. It's a slightly different problem domain, though. – rici Dec 09 '12 at 05:01