
When I load a collection into memory for the first time, all of it is in memory (I can see this in the task manager), but over time only part of the original size is still used by the arangod process. Besides, when I execute a query that retrieves data from that collection, disk usage grows for a short period and the amount of RAM used grows as well.

I'd like to avoid this. How can I do that? I see that collections have the property isVolatile:

isVolatile: If true then the collection data will be kept in memory only and ArangoDB will not write or sync the data to disk.

This is almost what I want, but:

Unloading the collection will cause the collection data to be discarded. Stopping or re-starting the server will also cause full loss of data in the collection

Can I somehow keep the whole collection in memory but without losing data after unloading?

elfinorr

1 Answer


The only way to guarantee that your collections are in RAM is to use the MMFiles engine. With RocksDB there is no such guarantee. Two full collection scans should also cause RocksDB collections to be loaded into RAM, but once you deplete your memory, some of that data is unloaded again.

A drop in the reported memory figures is not an indication that collection data has been unloaded. Here's the Wikipedia article on memory-mapped files (MMFs): https://en.wikipedia.org/wiki/Memory-mapped_file. As long as your collection is loaded, which happens as soon as you access its data or explicitly call the load method, it resides in RAM.
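To make the memory-mapped file idea concrete, here is a minimal, generic sketch using Python's standard `mmap` module. It is not ArangoDB code, and the helper name `mmap_roundtrip` is made up for illustration. It shows that writes go to mapped pages in memory and that the file on disk is the backing store, so the operating system is free to manage which pages are physically resident while the mapping stays valid.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(payload: bytes) -> bytes:
    """Write payload through a memory mapping, then read it back from the file."""
    fd, path = tempfile.mkstemp()
    try:
        # An empty file cannot be mapped, so give it its final size first.
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as mapped:
            mapped[:] = payload  # modifies the mapped pages in memory
            mapped.flush()       # asks the OS to write dirty pages to the file
        os.close(fd)
        fd = None
        # Read through the normal file API to confirm the data reached the file.
        with open(path, "rb") as fh:
            return fh.read()
    finally:
        if fd is not None:
            os.close(fd)
        os.remove(path)
```

Calling `mmap_roundtrip(b"some bytes")` returns the same bytes, read back from the file rather than from the mapping.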

Regarding your question about data loss: there are two strategies for syncing data to disk, selected by setting wait-for-sync to true or false. This parameter can be set at startup, where it affects all databases and all collections, or per collection when you initially create it. As the name says, it determines the point at which a write is considered committed and reported as such to the client. For higher performance at the cost of safety, the value can be set to false. In that mode one may lose a couple of seconds of data should power to the machine or disks suddenly fail.
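As a rough sketch of the per-collection variant: ArangoDB's HTTP API creates a collection via `POST /_api/collection`, and the JSON body accepts a `waitForSync` attribute. The snippet below only builds such a request with the Python standard library and does not send it. The server address `http://localhost:8529`, the collection name `events`, the helper name, and the absence of authentication are all assumptions made for illustration.

```python
import json
import urllib.request


def build_create_collection_request(
    base_url: str, name: str, wait_for_sync: bool
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a collection."""
    body = json.dumps({"name": name, "waitForSync": wait_for_sync}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_api/collection",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a local server and a collection called "events".
req = build_create_collection_request(
    "http://localhost:8529", "events", wait_for_sync=False
)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. With `wait_for_sync=False` the server may acknowledge writes before they are synced to disk, which is the faster but less safe mode described above.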

TL;DR: use MMFiles and your loaded collections live in RAM as long as you have memory left. Beyond that point you end up in swap space, with horrendous consequences for performance.

Kaveh Vahedipour
  • Thank you for your answer. I'm sorry, but are you sure about that? I'm currently using `mmfiles`, but as I explained above, it looks like ArangoDB doesn't keep **the whole collection in RAM all the time**. Besides, there is an explanation showing that pages can be swapped out of memory [here](https://stackoverflow.com/questions/24380071/memory-usage-of-arangodb?rq=1). – elfinorr Aug 04 '18 at 22:30
  • For example, I started the machine 12 hours ago and worked with the DB for 2-3 hours. I saw that the memory usage of `arangod` was about 12GB. Then I didn't work with the machine for 9 hours, and now I see that `arangod` uses only 7GB. – elfinorr Aug 04 '18 at 22:58
  • @elfinorr I updated my answer. I am an arangodb senior developer, so the answer comes close to the truth – Kaveh Vahedipour Aug 05 '18 at 11:46
  • Well, it is pretty obvious that I didn't understand what I was working with. Once again, thank you! – elfinorr Aug 05 '18 at 14:01