1

The data set consists of increasing values. Data is always appended, and thus is inherently sorted. There is no need for random insertions in the middle or beginning of the set, however, random deletions will occur.

Operations consist of:

  1. Finding the the nth value,
  2. Finding the position N for a given value
  3. Appending a sorted value at the end
  4. Random removal of a given value or index

Value iteration/locality is not at all important

for similar questions people proposed using a map of values-index as a solution, however in this case the indexes themselves must be sorted, e.g. arr[N] < arr[N+x]. The array+map solution does not work here.

The sequence itself needs to be stored on disk and I have a key-value database (lmdb) at my disposal. Having something which is serialized/deserialized in its entirety is also something to be avoided, as it is usually opened and closed fairly quickly, and deserializing the set in its entirety would potentially evict precious pages from elsewhere.

The set will contain ~100k items (and several thousand of these sets may be open simultaneously).

The closest in-memory structure is probably a skiplist, but I've yet to find an implementation that supports finding the nth element, and also unsure how it would be loaded/stored on disk

Mark Nunberg
  • 3,551
  • 15
  • 18
  • Does this answer your question? https://stackoverflow.com/questions/10699265/how-can-i-efficiently-select-a-standard-library-container-in-c11/10701102 – 463035818_is_not_an_ai Jan 21 '22 at 13:24
  • No.. leaving aside the serialization issues, there is no stl container other than list which has good performance profile for random removal, and of course a list doesn't do well with indexing – Mark Nunberg Jan 21 '22 at 13:28
  • [std::deque](https://en.cppreference.com/w/cpp/container/deque) maybe? – Alan Birtles Jan 21 '22 at 13:43
  • if you are looking for a non-std container then you are asking for a library which is offtopic – 463035818_is_not_an_ai Jan 21 '22 at 13:47
  • @AlanBirtles Unfortunately deque is O(n) for random removal – Mark Nunberg Jan 21 '22 at 14:45
  • What is the data type of a value and what is its range? Then we could think of a combination of STL containers. Or build a custom one. (And why not use a database like Oracle for this low number of items?). And most important: What do you want to achieve? – A M Jan 21 '22 at 15:42
  • That largely depends on your specific requirements - how large is each a "value"? Is each value entry the same size or do they have different sizes? How often will items be inserted at the end / deleted at a random location - will insertion stop at a certain point in time? Is the 100k items a hard limit? What memory constraints do you have, and what disk usage constraints? Are you looking for a purely in-memory data structure or rather a format to save the data on disk for efficient access? There are a lot of factors that need to be considered in this case. – Turtlefight Jan 24 '22 at 04:34

0 Answers0