
I would like to use a ChronicleMap as a memory-mapped key-value database (String to byte[]). It should be able to hold on the order of 100 million entries. Reads/gets will happen much more frequently than writes/puts, with an expected write rate of less than 10 entries/sec. While the keys will be similar in length, the length of the values could vary widely: anything from a few bytes up to tens of MB. Still, the majority of values will be between 500 and 1000 bytes long.

Having read a bit about ChronicleMap, I am amazed by its features and am wondering why I can't find articles describing its use as a general key-value database. To me there seem to be many advantages to using ChronicleMap for such a purpose. What am I missing here?

What are the drawbacks of using ChronicleMap for the given boundary conditions?

xpages-noob

1 Answer


I voted to close this question because any "drawbacks" would be relative.

As a data structure, Chronicle Map is not sorted, so it doesn't fit when you need to iterate over the key-value pairs in order sorted by key.

A limitation of the current implementation is that you need to specify the number of entries that are going to be stored in the map in advance. If the actual number isn't close to the specified number, you are going to overuse memory and disk (though not very severely on Linux systems). And if the actual number of entries exceeds the specified number by approximately 20% or more, operation performance starts to degrade, and the performance hit grows linearly as the number of entries grows further. See https://github.com/OpenHFT/Chronicle-Map/issues/105
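
For illustration, here is a minimal sketch of sizing such a map up front; the file name, entry count and average key/value sizes are placeholders that would have to be adapted to your data:

```java
import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class KvStoreSketch {
    public static void main(String[] args) throws IOException {
        // The sizing hints below are illustrative placeholders; Chronicle Map
        // derives the size of the persisted file from exactly these numbers.
        ChronicleMap<String, byte[]> map = ChronicleMap
                .of(String.class, byte[].class)
                .averageKeySize(32)        // keys are similar in length
                .averageValueSize(750)     // majority of values: 500-1000 bytes
                .entries(100_000_000)      // must be estimated in advance
                .createPersistedTo(new File("kv-store.dat"));

        map.put("some-key", new byte[]{1, 2, 3});
        byte[] value = map.get("some-key");
        map.close();
    }
}
```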

leventov
  • Thanks for your answer. (1) The map is just needed for getting a value by key. I won't iterate over it, and I don't need it to be sorted. (2) I thought that on Linux with the ext4 file system you can theoretically provide any size up to the available disk space for a memory-mapped file. That file would be small initially and grow as it gets filled with data. I had planned to initialize the map with a large number of entries (500m) and set the averageValueSize such that the calculated maximum file size would not exceed the disk space. Please correct me if that's not how it's done. – xpages-noob Jan 02 '18 at 11:36
  • The waste is from internal fragmentation (in the [hash lookup area](https://github.com/OpenHFT/Chronicle-Map/blob/master/spec/3-memory-layout.md#hash-lookup)), not external fragmentation, so it's unavoidable even on Linux and with any filesystem used. – leventov Jan 02 '18 at 12:26
  • With your variance of value size, I recommend specifying [`actualChunkSize`](http://static.javadoc.io/net.openhft/chronicle-map/3.14.3/net/openhft/chronicle/hash/ChronicleHashBuilder.html#actualChunkSize-int-) directly, along with the other "low-level" configs, i.e. `actualChunksPerSegmentTier`, `actualSegments` and `entriesPerSegment` (see the first sketch after this thread). – leventov Jan 02 '18 at 12:30
  • Thanks for that helpful info and advice. If I know that 95% of all entries will have a length <1500 bytes, and the other 5% would likely be a lot longer (byte arrays of file data), would I be better off splitting my data into 2 maps with different chunk sizes, for example 256 bytes for the first case and 4096 bytes for the latter? Or is the performance drop due to too many chunks (javadoc: "Particularly avoid entries to take more than 64 chunks.") negligible if those entries are not read frequently? – xpages-noob Jan 02 '18 at 13:34
  • @xpages-noob It depends on what you mean by "a lot longer". If you mean 10x longer, this shouldn't be a problem; if you mean 1000x larger, use another map (see the second sketch after this thread). – Peter Lawrey Jan 02 '18 at 16:18
  • @xpages-noob The number 64 comes from the fact that we use a bitset for space allocation, and it switches to a slightly slower algorithm because more than 64 bits don't fit a primitive long value, but not terribly so. If you need entries of 100 or 1000 chunks, that is actually not a big deal with Chronicle Map; a properly configured number of chunks per tier / number of entries should handle it. I would recommend specifying fewer segments (or just 1) if you don't need high update concurrency. – leventov Jan 02 '18 at 16:23
  • Thank you for the explanations and recommendations. Due to my limited experience with ChronicleMap (<24h), I currently have only a vague idea of how all the mentioned low-level configuration parameters influence performance in practice. Still, I'm happy that there are so many tweaking options, and I'm confident that I'll find a proper configuration for my "storage needs". Thanks again for your help. – xpages-noob Jan 02 '18 at 17:04
  • PS: I usually wait at least 1 day before accepting an answer. Just in case someone else has something to say :) – xpages-noob Jan 02 '18 at 17:06
  • @leventov One more question: Is the feature that you linked in your answer ("Remove entries() limitation") going to be implemented in the near future? – xpages-noob Jan 03 '18 at 08:25
  • That's unlikely – leventov Jan 03 '18 at 09:26
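
A hedged sketch of the low-level configuration discussed in the comments above. The chunk size and segment count are illustrative placeholders, not tuned recommendations; they would have to be derived from the actual key/value statistics:

```java
import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class TunedKvStoreSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder numbers: derive real values from your own data.
        ChronicleMap<String, byte[]> map = ChronicleMap
                .of(String.class, byte[].class)
                .averageKeySize(32)       // keys are similar in length
                .averageValueSize(750)    // majority of values: 500-1000 bytes
                .entries(100_000_000)
                .actualChunkSize(256)     // a typical 500-1000 byte value spans a few chunks
                .actualSegments(16)       // low write concurrency -> few segments suffice
                .createPersistedTo(new File("tuned-kv-store.dat"));

        map.close();
    }
}
```

The remaining "low-level" knobs mentioned above (`actualChunksPerSegmentTier`, `entriesPerSegment`) can be set on the same builder; if left out, they are derived from the other sizing hints.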
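
And a sketch of the two-map split discussed above, with one map tuned for the many small values and another for the rare, much larger blobs; the entry counts, average sizes and file names are made-up examples:

```java
import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class SplitKvStoreSketch {
    public static void main(String[] args) throws IOException {
        // ~95% of values are small: use a small chunk size.
        ChronicleMap<String, byte[]> smallValues = ChronicleMap
                .of(String.class, byte[].class)
                .averageKeySize(32)
                .averageValueSize(750)
                .actualChunkSize(256)
                .entries(95_000_000)
                .createPersistedTo(new File("small-values.dat"));

        // The rare, much larger values (byte arrays of file data) go into a
        // separate map with a larger chunk size. The average size here is a guess.
        ChronicleMap<String, byte[]> largeValues = ChronicleMap
                .of(String.class, byte[].class)
                .averageKeySize(32)
                .averageValueSize(64 * 1024)
                .actualChunkSize(4096)
                .entries(5_000_000)
                .createPersistedTo(new File("large-values.dat"));

        smallValues.close();
        largeValues.close();
    }
}
```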