
I'm trying to configure the RocksDB instance I'm using as the state backend for my Flink job. The state RocksDB needs to hold is not too big (around 5 GB), but it has to deal with a lot of missing keys: about 80% of the get requests will not find the key in the database. I wonder whether there is a specific configuration that helps with memory consumption. I have tried using bloom filters with 3 bits per key and increasing the block size to 16 KB, but it doesn't seem to help, and the job fails with out-of-memory exceptions. I'll be glad to hear more suggestions.
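As a rough illustration of why the block size matters for memory: with RocksDB's default binary-search index, each SST file's index block holds about one entry per data block, so fewer, larger blocks mean a smaller index. The sketch below only counts data blocks for the state size mentioned above, assuming 5 GiB of uncompressed state and comparing RocksDB's 4 KB default block size with the 16 KB tried here. The helper name `num_data_blocks` is hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3


def num_data_blocks(state_bytes: int, block_size: int) -> int:
    """Number of data blocks needed to hold state_bytes (ceiling division)."""
    return -(-state_bytes // block_size)


if __name__ == "__main__":
    state = 5 * GIB  # assumed: ~5 GiB of uncompressed state per task manager
    for block_size in (4 * KIB, 16 * KIB):
        blocks = num_data_blocks(state, block_size)
        # Roughly one index entry per data block with the default index type.
        print(f"{block_size // KIB} KiB blocks -> {blocks:,} data blocks")
```

Four times larger blocks give roughly a quarter of the index entries, at the cost of reading more bytes per point lookup. The size of the bloom filters is a separate matter: it depends on the number of keys and bits per key, not on the block size.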

JoeHills
  • What are your memory settings in Flink? 5GB sounds rather small tbh. Do you even need RocksDB, or would an in-memory heap work as well on your machine? – Arvid Heise Aug 25 '22 at 12:45
  • The state itself is small, though that figure doesn't include the window state and timers. In addition, there can be very high peaks in the data and therefore in the state. Also, by 5G I meant that each task manager should hold 5 GB of state; the actual total state is larger and distributed across all the task managers. – JoeHills Aug 25 '22 at 13:59
  • What version of Flink are you using? – David Anderson Aug 26 '22 at 01:14
  • I'm using 1.14.4 – JoeHills Aug 27 '22 at 08:02

1 Answer


I wonder whether there is a specific configuration to help with the memory consumption.

If you are able to obtain a heap profile (for example with the gperftools heap profiler, https://gperftools.github.io/gperftools/heapprofile.html), it will help you figure out which part of RocksDB consumes the most memory.

Given the memory budget (i.e., expectation) you plan for RocksDB, you might start with some general memory controls, as follows:

I am not clear on how missing keys could affect your memory consumption in a specific way, though.
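In a Flink 1.14 job, general memory controls like these are normally applied through Flink's managed memory: with `state.backend.rocksdb.memory.managed` enabled (the default), RocksDB instances in the same slot share a block cache and write-buffer manager sized from managed memory, and index and filter blocks are kept in that cache. The sketch below merely renders a set of `flink-conf.yaml` entries. The values are assumed starting points to tune, not recommendations, the helper `flink_conf_lines` is hypothetical, and `state.backend.rocksdb.memory.partitioned-index-filters` should be checked against your version's documentation.

```python
def flink_conf_lines(options: dict) -> list:
    """Hypothetical helper: render Flink options as flink-conf.yaml lines."""
    return [f"{key}: {value}" for key, value in options.items()]


# Assumed starting values for a bounded-memory RocksDB setup; tune per job.
ROCKSDB_MEMORY_OPTIONS = {
    "state.backend": "rocksdb",
    # Size RocksDB's block cache and write buffers from Flink managed memory.
    "state.backend.rocksdb.memory.managed": "true",
    # Share of total Flink memory used as managed memory (0.4 is the default).
    "taskmanager.memory.managed.fraction": "0.4",
    # Share of the RocksDB budget given to memtables (0.5 is the default).
    "state.backend.rocksdb.memory.write-buffer-ratio": "0.5",
    # Cache share reserved for index/filter blocks, raised from the 0.1 default
    # on the assumption that filters matter for a miss-heavy workload.
    "state.backend.rocksdb.memory.high-prio-pool-ratio": "0.2",
    # Assumed available since Flink 1.13; verify in your version's docs.
    "state.backend.rocksdb.memory.partitioned-index-filters": "true",
    "state.backend.rocksdb.block.blocksize": "16kb",
}

if __name__ == "__main__":
    print("\n".join(flink_conf_lines(ROCKSDB_MEMORY_OPTIONS)))
```

Because index and filter blocks are charged to the shared cache in this setup, they compete with data blocks for the same budget and can be evicted under pressure instead of growing outside it.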

hx235
  • My intuition was that, because I know there can be a lot of key misses, I can "save" block accesses by using bloom filters and therefore minimize both cache and CPU consumption. (When RocksDB tries to get a key that isn't in the set, the bloom filter returns false and the corresponding block isn't accessed.) – JoeHills Aug 27 '22 at 08:07
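To put numbers on that intuition, the classic Bloom filter approximation gives a false-positive rate of about (1 - e^(-k/b))^k for b bits per key and k probes, with k ≈ b·ln 2 as the best choice. This is a sketch of the textbook formula, not RocksDB's own code; RocksDB's filter implementations are cache-local variants, so real rates differ somewhat. The helper names are hypothetical.

```python
import math


def optimal_num_probes(bits_per_key: float) -> int:
    """k = b * ln 2 minimises the false-positive rate; use at least one probe."""
    return max(1, round(bits_per_key * math.log(2)))


def bloom_fp_rate(bits_per_key: float, num_probes: int) -> float:
    """Classic approximation of the false-positive rate: (1 - e^(-k/b))^k."""
    return (1.0 - math.exp(-num_probes / bits_per_key)) ** num_probes


def filter_bytes(num_keys: int, bits_per_key: float) -> float:
    """Filter memory in bytes: one bit budget of bits_per_key for every key."""
    return num_keys * bits_per_key / 8


if __name__ == "__main__":
    for b in (3, 10):
        k = optimal_num_probes(b)
        print(f"{b} bits/key: k={k}, false positives ~ {bloom_fp_rate(b, k):.2%}")
    # Hypothetical key count, only to show how filter memory scales.
    print(f"100M keys at 10 bits/key: {filter_bytes(100_000_000, 10) / 1e6:.0f} MB")
```

With 3 bits per key, roughly a quarter of the lookups for absent keys would still pass the filter and touch a data block, while 10 bits per key (the setting RocksDB's documentation associates with about a 1% rate) brings that under 1%. Filter memory grows linearly with both bits per key and key count, so the trade-off is a larger filter against fewer wasted block reads.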