
I am new to chronicle-map. I am trying to model an off-heap map using chronicle-map where the key is a primitive short and the value is a primitive long array. The max size of the long array value is known for a given map. However, I will have multiple maps of this kind, each of which may have a different max size for the long array value. My question relates to the serialisation/deserialisation of the key and value.

From reading the documentation, I understand that for the key I can use the value type ShortValue and reuse an instance of the implementation of that interface. Regarding the value, I have found the page talking about DataAccess and SizedReader, which gives an example for byte[], but I'm unsure how to adapt this to a long[]. One additional requirement I have is that I need to get and set values at arbitrary indices in the long array without paying the cost of a full serialisation/deserialisation of the entire value each time.

So my question is: how can I model the value type when constructing the map, and what serialisation/deserialisation code do I need for a long[] if the max size is known per map and I need to read and write random indices without serialising/deserialising the entire value payload each time? Ideally the long[] would be encoded/decoded directly to/from off heap, without an intermediate on-heap conversion to a byte[], and the chronicle-map code would not allocate at runtime. Thank you.

junkie

2 Answers


First, I recommend using some kind of LongList interface abstraction instead of long[]; it will make it easier to deal with size variability, provide alternative flyweight implementations, etc.
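For illustration, here is a minimal sketch of what such an abstraction might look like (both the interface and the LongArrayList name are hypothetical, not types shipped by Chronicle Map); the later snippets assume these types:

// Hypothetical minimal abstraction over a list of longs.
interface LongList {
    int size();
    long get(int index);
    void set(int index, long value);
    void add(long value);
    void clear();
}

// A trivial heap-backed implementation, used by the marshalling sketches below.
final class LongArrayList implements LongList {
    private long[] elements;
    private int size;

    LongArrayList(int capacity) { elements = new long[Math.max(capacity, 8)]; }

    @Override public int size() { return size; }
    @Override public long get(int index) { return elements[index]; }
    @Override public void set(int index, long value) { elements[index] = value; }
    @Override public void add(long value) {
        if (size == elements.length)
            elements = java.util.Arrays.copyOf(elements, elements.length * 2);
        elements[size++] = value;
    }
    @Override public void clear() { size = 0; }
}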

If you want to read/write just single elements in large lists, you should use the advanced contexts API:

/** This method is entirely garbage-free, deserialization-free, and thread-safe. */
void putOneValue(ChronicleMap<ShortValue, LongList> map, ShortValue key, int index,
        long element) {
    if (index < 0) throw new IndexOutOfBoundsException("index: " + index);
    try (ExternalMapQueryContext<ShortValue, LongList, ?> c = map.queryContext(key)) {
        c.writeLock().lock(); // (1)
        MapEntry<ShortValue, LongList> entry = c.entry();
        if (entry != null) {
            Data<LongList> value = entry.value();
            BytesStore valueBytes = (BytesStore) value.bytes(); // (2)
            long valueBytesOffset = value.offset();
            long valueBytesSize = value.size();
            int valueListSize = (int) (valueBytesSize / Long.BYTES); // (3)
            if (index >= valueListSize)
                throw new IndexOutOfBoundsException(
                    "index: " + index + ", size: " + valueListSize);
            valueBytes.writeLong(valueBytesOffset + ((long) index) * Long.BYTES,
                element);
            ((ChecksumEntry) entry).updateChecksum(); // (4)
        } else {
            // there is no entry for the given key
            throw new IllegalStateException("no entry for the key " + key);
        }
    }
}

Notes:

  1. You must acquire writeLock() from the beginning, because otherwise readLock() is going to be acquired automatically when you call the context.entry() method, and you won't be able to upgrade the read lock to a write lock later. Please read the HashQueryContext javadoc carefully.
  2. Data.bytes() formally returns RandomDataInput, but you can be sure (it's specified in the Data.bytes() javadoc) that it's actually an instance of BytesStore (that's a combination of RandomDataInput and RandomDataOutput).
  3. Assuming proper SizedReader and SizedWriter (or DataAccess) are provided. Note that the "bytes/element joint size" technique is used, the same as in the example given in the SizedReader and SizedWriter doc section, PointListSizeMarshaller. You could base your LongListMarshaller on that example class; a sketch follows these notes.
  4. This cast is specified; see the ChecksumEntry javadoc and the section about checksums in the doc. If you have a purely in-memory (not persisted) Chronicle Map, or have turned checksums off, this call could be omitted.
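
For concreteness, here is a hedged sketch of such a LongListMarshaller, assuming the hypothetical LongList/LongArrayList types sketched earlier. It follows the same "bytes/element joint size" idea (byte size = element count * Long.BYTES) as PointListSizeMarshaller, but it is not that class; check the interfaces' signatures against your Chronicle Map version:

import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.hash.serialization.SizedReader;
import net.openhft.chronicle.hash.serialization.SizedWriter;

// Sketch only: a persisted map may additionally require the marshaller to be
// serializable (e.g. via Marshallable/ReadResolvable); omitted here.
final class LongListMarshaller
        implements SizedReader<LongList>, SizedWriter<LongList> {

    @Override
    public LongList read(Bytes in, long size, LongList using) {
        int elements = (int) (size / Long.BYTES); // joint size: bytes -> elements
        if (using == null)
            using = new LongArrayList(elements);
        using.clear();
        for (int i = 0; i < elements; i++)
            using.add(in.readLong());
        return using;
    }

    @Override
    public long size(LongList toWrite) {
        return ((long) toWrite.size()) * Long.BYTES; // joint size: elements -> bytes
    }

    @Override
    public void write(Bytes out, long size, LongList toWrite) {
        for (int i = 0; i < toWrite.size(); i++)
            out.writeLong(toWrite.get(i)); // sequential write of each element
    }
}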

The implementation of a single-element read is similar; a sketch is given below.
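A hedged sketch of such a read, under the same assumptions as putOneValue() above (this code is not from the original answer):

/** Reads a single element; garbage-free and deserialization-free under the
 * same assumptions as putOneValue() above. */
long getOneValue(ChronicleMap<ShortValue, LongList> map, ShortValue key, int index) {
    if (index < 0) throw new IndexOutOfBoundsException("index: " + index);
    try (ExternalMapQueryContext<ShortValue, LongList, ?> c = map.queryContext(key)) {
        // No explicit lock: readLock() is acquired automatically by c.entry(),
        // as described in note 1 above.
        MapEntry<ShortValue, LongList> entry = c.entry();
        if (entry == null)
            throw new IllegalStateException("no entry for the key " + key);
        Data<LongList> value = entry.value();
        BytesStore valueBytes = (BytesStore) value.bytes();
        long valueBytesOffset = value.offset();
        int valueListSize = (int) (value.size() / Long.BYTES);
        if (index >= valueListSize)
            throw new IndexOutOfBoundsException(
                "index: " + index + ", size: " + valueListSize);
        return valueBytes.readLong(valueBytesOffset + ((long) index) * Long.BYTES);
    }
}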

leventov
  • Many thanks @leventov for your excellent and detailed answer. I got everything working as explained. – junkie Feb 07 '18 at 21:24
  • Some questions: 1) I've implemented a SizedReader+Writer. Do I need DataAccess, or is SizedWriter fast enough for primitive arrays? I looked at ByteArrayDataAccess, but it's not clear how to port it to long arrays, given that the internal HeapBytesStore is so specific to byte[]/ByteBuffers. 2) Does the read/write locking mediate across multiple processes reading and writing on the same machine, or just within a single process? 3) When storing objects with a variable size not known in advance as values, will that cause fragmentation off heap and in the persisted file? – junkie Feb 07 '18 at 21:28

Answering extra questions:

I've implemented a SizedReader+Writer. Do I need DataAccess, or is SizedWriter fast enough for primitive arrays? I looked at ByteArrayDataAccess, but it's not clear how to port it to long arrays, given that the internal HeapBytesStore is so specific to byte[]/ByteBuffers.

Using DataAccess instead of SizedWriter saves one copy of the value data on Map.put(key, value). However, if in your use case putOneValue() (as in the example above) is the dominating type of query, it won't make much difference. If Map.put(key, value) (and replace(), etc., i.e. any "full value write" operations) are important, it is still possible to implement DataAccess for LongList. It will look like this:

class LongListDataAccess implements DataAccess<LongList>, Data<LongList>,
        StatefulCopyable<LongListDataAccess> {
    transient BytesStore cachedBytes;
    transient boolean cachedBytesInitialized;
    transient LongList list;

    @Override public Data<LongList> getData(LongList list) {
        this.list = list;
        this.cachedBytesInitialized = false;
        return this;
    }

    @Override public long size() {
        return ((long) list.size()) * Long.BYTES;
    }

    @Override public void writeTo(RandomDataOutput target, long targetOffset) {
        for (int i = 0; i < list.size(); i++) {
            target.writeLong(targetOffset + ((long) i) * Long.BYTES, list.get(i));
        }
    }

    ...
}

For efficiency, the methods size() and writeTo() are key, but it's important to implement all the other methods (which I didn't write here) correctly too. Read the DataAccess, Data and StatefulCopyable javadocs very carefully, and also study Understanding StatefulCopyable, DataAccess and SizedReader and the custom serialization checklist in the tutorial with great attention.
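For concreteness, the omitted methods could look roughly like the following, again a hedged sketch under the same assumptions (LongList and LongArrayList are the hypothetical types sketched earlier, and cachedBytes is assumed to hold an elastic direct Bytes, which is a BytesStore subtype). The cached fields let bytes() serialise the list lazily, at most once per getData() call:

    @Override public RandomDataInput bytes() {
        if (!cachedBytesInitialized) {
            if (cachedBytes == null)
                cachedBytes = Bytes.allocateElasticDirect(); // grows as needed
            writeTo(cachedBytes, 0);
            cachedBytesInitialized = true;
        }
        return cachedBytes;
    }

    @Override public long offset() {
        return 0; // the cached buffer starts at offset 0
    }

    @Override public LongList get() {
        return list;
    }

    @Override public LongList getUsing(LongList using) {
        if (using == null)
            using = new LongArrayList(list.size());
        using.clear();
        for (int i = 0; i < list.size(); i++)
            using.add(list.get(i));
        return using;
    }

    // StatefulCopyable: return a fresh instance with no per-query state.
    @Override public LongListDataAccess copy() {
        return new LongListDataAccess();
    }

    // DataAccess: drop the reference to the queried object after use.
    @Override public void uninit() {
        list = null;
    }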


Does the read/write locking mediate across multiple processes reading and writing on the same machine, or just within a single process?

It's safe across processes; note that the interface is called InterProcessReadWriteUpdateLock.


When storing objects with a variable size not known in advance as values, will that cause fragmentation off heap and in the persisted file?

Storing a value for a key once and not changing the size of the value (and not removing keys) after that won't cause external fragmentation. Changing the size of the value or removing keys could cause external fragmentation. The ChronicleMapBuilder.actualChunkSize() configuration allows you to trade between external and internal fragmentation: the bigger the chunk, the less external fragmentation, but the more internal fragmentation. If your values are significantly bigger than the page size (4 KB), you could set an absurdly big chunk size and still have internal fragmentation bounded by the page size, because Chronicle Map is able to exploit the lazy page allocation feature in Linux.
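
Tying the pieces together, the map construction might look roughly like this (a hedged sketch, not from the answer: LongList, LongArrayList and LongListMarshaller are the hypothetical types sketched above, maxListSize is the per-map maximum number of long elements from the question, and the name, entry count and chunk size are arbitrary placeholders):

import java.io.File;
import java.io.IOException;
import net.openhft.chronicle.core.values.ShortValue; // package may differ by version
import net.openhft.chronicle.map.ChronicleMap;

ChronicleMap<ShortValue, LongList> createMap(int maxListSize, File file)
        throws IOException {
    return ChronicleMap
            .of(ShortValue.class, LongList.class)
            .name("short-to-long-list")      // arbitrary name for this sketch
            .entries(1_000)                  // expected number of entries
            // max size is known per map, so size values for the worst case:
            .averageValueSize((double) maxListSize * Long.BYTES)
            .valueMarshallers(new LongListMarshaller(), new LongListMarshaller())
            .actualChunkSize(64)             // the fragmentation trade-off knob
            .createPersistedTo(file);        // or .create() for a purely in-memory map
}

With a map built this way, full-value get()/put() go through LongListMarshaller, while the context-based putOneValue()/getOneValue() snippets from the first answer touch only the single off-heap slot they need.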

leventov
  • Sorry for the delay. Thank you very much once again @leventov for answering all questions in so much detail. I will look further into the docs in the places you've suggested to get a better understanding. – junkie Feb 10 '18 at 15:40