
This post would likely be a good candidate for frequently asked questions at OpenHFT.

I am playing with ChronicleMap, considering it for an idea, but I have lots of questions. I am sure most junior programmers who are looking into this product have similar considerations.

Would you explain how memory is managed in this API?

ChronicleMap advertises some remarkable amounts (TBs) of off-heap memory available for processing its data, and I would like to get a clear picture of that.

Let's get down to a programmer with a laptop with a 500 GB HD and 4 GB of RAM. In this case pure math says the total 'swapped' memory resource available is 504 GB. Let's give the OS and other programs half, and we are left with 250 GB of HD and 2 GB of RAM. Can you elaborate on the actual memory ChronicleMap can allocate, in numbers, relative to the available resources?

Next related questions are relative to the implementation of ChronicleMap.

My understanding is that each ChronicleMap allocates a chunk of memory it works with, and optimal performance/memory usage is achieved when we can accurately predict the amount of data passing through. However, this is a dynamic world.

Let's set up an (exaggerated but possible) example:

Suppose a map of K (key) 'cities' and their V (value) 'descriptions' (of the cities), allowing users generous limits on the description length.

The first user enters K = "Amsterdam", V = "City of bicycles", and this entry is used to declare the map; it sets the precedent for the pair, like this:

ChronicleMap<CharSequence, CharSequence> cityDescriptions = ChronicleMap
    .of(CharSequence.class, CharSequence.class)
    .averageKey("Amsterdam")
    .averageValue("City of bicycles")
    .entries(5_000)
    .createOrRecoverPersistedTo(citiesAndDescriptions);

Now, the next user gets carried away and writes an essay about Prague. He passes K = "Prague", V = "City of 100 towers located in the heart of Europe ... blah, blah ... a million words ..."

Now the programmer had expected a maximum of 5_000 entries, but it gets out of his hands and there are many thousands of entries.

Does ChronicleMap allocate memory automatically for such cases? If yes, is there some better approach to declaring ChronicleMaps for this dynamic solution? If no, would you recommend an approach (best with a code example) for how to best handle such scenarios?

How does this work with persistence to file?

Can ChronicleMaps deplete my RAM and/or disk space? What is the best practice to avoid that?

In other words, please explain how memory is managed in case of under-estimation and over-estimation of the value (and/or key) lengths and number of entries.

Which of these are applicable in ChronicleMap?

  1. If I allocate a big chunk (.entries(1_000_000), .averageValueSize(1_000_000)) and the actual usage is entries = 100 and average value size = 100 bytes.

What happens?

1.1. - all works fine, but there will be a large wasted chunk that goes unused?

1.2. - all works fine, and the unused memory is available to:

1.2.1 - ChronicleMap

1.2.2 - given thread using ChronicleMap

1.2.3 - given process

1.2.4 - given JVM

1.2.5 - the OS

1.3. - please explain if something else happens to the unused memory

1.4. - what does the oversized declaration do to my persistence file?

  2. Opposite of case 1 - I allocate a small chunk (.entries(10), .averageValueSize(10)) and the actual usage is millions of entries with an average value size of thousands of bytes. What happens?

1 Answer


Let's get down to a programmer with a laptop with a 500 GB HD and 4 GB of RAM. In this case pure math says the total 'swapped' memory resource available is 504 GB. Let's give the OS and other programs half, and we are left with 250 GB of HD and 2 GB of RAM. Can you elaborate on the actual memory ChronicleMap can allocate, in numbers, relative to the available resources?

Under such conditions Chronicle Map will be very slow, with on average 2 random disk reads and 2 random disk writes (4 random disk operations in total) per operation with Chronicle Map. Traditional disk-based database engines, like RocksDB or LevelDB, should work better when the database size is much bigger than the available memory.


Now the programmer had expected a maximum of 5_000 entries, but it gets out of his hands and there are many thousands of entries.

Does ChronicleMap allocate memory automatically for such cases? If yes, is there some better approach to declaring ChronicleMaps for this dynamic solution? If no, would you recommend an approach (best with a code example) for how to best handle such scenarios?

Chronicle Map will keep allocating memory as long as the actual number of entries inserted, divided by the number configured through ChronicleMapBuilder.entries(), is not higher than the configured ChronicleMapBuilder.maxBloatFactor(). E.g. if you create a map as

ChronicleMap<CharSequence, CharSequence> cityDescriptions = ChronicleMap
    .of(CharSequence.class, CharSequence.class)
    .averageKey("Amsterdam")
    .averageValue("City of bicycles")
    .entries(5_000)
    .maxBloatFactor(5.0)
    .createOrRecoverPersistedTo(citiesAndDescriptions);

It will start throwing IllegalStateException on attempts to insert new entries once the size reaches ~ 25 000.
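
To illustrate, here is a minimal, self-contained sketch (assuming Chronicle Map 3.x; the cities.dat file name is made up for this example) of what hitting that ceiling looks like with the configuration above:

import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class MaxBloatFactorDemo {
    public static void main(String[] args) throws IOException {
        File citiesAndDescriptions = new File("cities.dat"); // hypothetical file

        ChronicleMap<CharSequence, CharSequence> cityDescriptions = ChronicleMap
            .of(CharSequence.class, CharSequence.class)
            .averageKey("Amsterdam")
            .averageValue("City of bicycles")
            .entries(5_000)
            .maxBloatFactor(5.0)
            .createOrRecoverPersistedTo(citiesAndDescriptions);

        try {
            // Deliberately insert far more entries than configured
            for (int i = 0; i < 30_000; i++) {
                cityDescriptions.put("city-" + i, "description of city " + i);
            }
        } catch (IllegalStateException e) {
            // Expected once the map has grown to roughly
            // entries() * maxBloatFactor() = 5_000 * 5.0 ~ 25 000 entries
            System.out.println("Map is full: " + e.getMessage());
        } finally {
            cityDescriptions.close();
        }
    }
}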

However, Chronicle Map works progressively slower when the actual size grows far beyond the configured size, so the maximum possible maxBloatFactor() is artificially limited to 1000.

The solution right now is to configure the future size of the Chronicle Map via entries() (and averageKey(), and averageValue()) at least approximately correctly.

The requirement to configure a plausible Chronicle Map size in advance is acknowledged to be a usability problem. There is a way to fix this, and it's on the project roadmap.
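
Until that lands, if a map does outgrow its configured size, one workaround (not a Chronicle-specific API, just the plain Map interface) is to rebuild into a new, larger map and copy the entries across. A rough sketch; the file names and the 50_000 estimate are made up for illustration:

import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class RebuildWithLargerEstimate {
    public static void main(String[] args) throws IOException {
        File oldFile = new File("cities.dat");        // hypothetical existing file
        File newFile = new File("cities-bigger.dat"); // hypothetical replacement

        try (ChronicleMap<CharSequence, CharSequence> oldMap = ChronicleMap
                 .of(CharSequence.class, CharSequence.class)
                 .averageKey("Amsterdam")
                 .averageValue("City of bicycles")
                 .entries(5_000)
                 .createOrRecoverPersistedTo(oldFile);
             ChronicleMap<CharSequence, CharSequence> newMap = ChronicleMap
                 .of(CharSequence.class, CharSequence.class)
                 .averageKey("Amsterdam")
                 .averageValue("City of bicycles")
                 .entries(50_000) // new, more generous estimate
                 .createOrRecoverPersistedTo(newFile)) {

            // ChronicleMap implements ConcurrentMap, so the standard Map API works
            newMap.putAll(oldMap);
        }
        // After verifying the copy, the old file can be archived or deleted
    }
}

This pays the cost of rewriting the data once, but afterwards the new map is sized approximately correctly again.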


In other words, please explain how memory is managed in case of under-estimation and over-estimation of the value (and/or key) lengths and number of entries.

Key/value size underestimation: space is wasted in the hash lookup area, ~ 8 bytes * underestimation factor, per entry. So it could be pretty bad if the actual average entry size (key + value) is small. E.g. if it is 50 bytes and you have configured it as 20 bytes, you will waste ~ 8 * 50 / 20 = 20 bytes per entry, or 40%. The bigger the average entry size, the smaller the relative waste.

Key/value size overestimation: if you configure just the key and value average sizes, but not actualChunkSize() directly, the actual chunk size is automatically chosen between 1/8th and 1/4th of the average entry size (key + value). The actual chunk size is the allocation unit in Chronicle Map. So if you configured the average entry size as ~ 1000 bytes, the actual chunk size will be chosen between 125 and 250 bytes. If the actual average entry size is just 100 bytes, you will lose a lot of space. If the overestimation is small, the expected space losses are limited to about 20% of the data size.

So if you are afraid you may overestimate the average key/value size, configure actualChunkSize() explicitly.
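
For instance, here is a minimal sketch of pinning the allocation unit explicitly; the 64-byte chunk and the file name are illustrative values for this example, not recommendations:

import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class ExplicitChunkSize {
    public static void main(String[] args) throws IOException {
        File citiesAndDescriptions = new File("cities.dat"); // hypothetical file

        ChronicleMap<CharSequence, CharSequence> cityDescriptions = ChronicleMap
            .of(CharSequence.class, CharSequence.class)
            .averageKey("Amsterdam")
            .averageValue("City of bicycles") // average sizes that might be overestimated
            .entries(5_000)
            .actualChunkSize(64)              // fix the allocation unit at 64 bytes
            .createOrRecoverPersistedTo(citiesAndDescriptions);

        cityDescriptions.put("Amsterdam", "City of bicycles");
        cityDescriptions.close();
    }
}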

Number of entries underestimation: discussed above. There is no particular space waste, but Chronicle Map works slower the worse the underestimation is.

Number of entries overestimation: memory is wasted in the hash lookup area, ~ 8 bytes * overestimation factor, per entry. See the key/value size underestimation section above for how good or bad this could be, depending on the actual average entry data size. For example, in your case 1 (entries(1_000_000) configured but only 100 actual entries), the hash lookup area occupies roughly 8 bytes * 1 000 000 = 8 MB, regardless of how few entries are actually stored.
