
With regard to the Key/Value model of ArangoDB, does anyone know the maximum size per Value? I have spent hours searching the Internet for this information but to no avail; you would think that this is classified information. Thanks in advance.

ben.jamin

1 Answer


The answer depends on several things, such as the storage engine and whether you mean the theoretical or the practical limit.

In the case of MMFiles, the maximum document size is determined by the startup option wal.logfile-size if wal.allow-oversize-entries is turned off. If it is turned on, there is no immediate limit.
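
For illustration only, a minimal sketch of how those MMFiles options might be set at server startup (the byte values below are placeholders, not recommendations):

```
# Hypothetical startup: raise the WAL logfile size and allow oversize entries
# (sizes are in bytes and chosen arbitrarily for illustration)
arangod --server.storage-engine mmfiles \
        --wal.logfile-size 67108864 \
        --wal.allow-oversize-entries true
```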

In the case of RocksDB, the size might be limited by some of the server startup options, such as rocksdb.intermediate-commit-size, rocksdb.write-buffer-size, rocksdb.total-write-buffer-size or rocksdb.max-transaction-size.
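
Again only as a sketch, those RocksDB limits could be raised at startup like this (all numbers are placeholder byte values, not tuning advice):

```
# Hypothetical startup: raise RocksDB transaction and write-buffer limits
arangod --server.storage-engine rocksdb \
        --rocksdb.max-transaction-size 1073741824 \
        --rocksdb.intermediate-commit-size 536870912 \
        --rocksdb.write-buffer-size 134217728 \
        --rocksdb.total-write-buffer-size 1073741824
```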

When using arangoimport to import a 1GB JSON document, you will run into the default batch-size limit. You can increase it, but it appears to max out at 805306368 bytes (0.75GB). The HTTP API seems to have the same limitation (/_api/cursor with bindVars).
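
For example, assuming a database mydb, a collection mycollection and a file big.json (all made-up names), the batch size can be raised on the command line like this:

```
# Hypothetical import with an increased batch size (value in bytes,
# roughly the 0.75GB ceiling observed above)
arangoimport --server.database mydb \
             --collection mycollection \
             --file big.json \
             --type json \
             --batch-size 805306368
```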

What you should keep in mind: mutating a document is potentially a slow operation because of the append-only nature of the storage layer. In other words, a new copy of the document with a new revision number is persisted, and the old revision will be compacted away some time later (I'm not familiar with all the technical details, but I think this is fair to say). For a 500MB document it seems to take a few seconds to update or copy it using RocksDB on a rather strong system. It's much better to have many small documents.
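
Since the question is about key/value usage: here is a minimal sketch of treating a document as the value and its `_key` as the key, via the HTTP document API with curl (endpoint, collection name and key are assumptions; authentication omitted):

```
# Hypothetical: store a "value" under an explicit _key ...
curl -X POST http://localhost:8529/_api/document/kvstore \
     -d '{"_key": "somekey", "value": "arbitrary payload"}'

# ... and fetch the whole document (the value) back by key
curl http://localhost:8529/_api/document/kvstore/somekey
```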

CodeManX
  • Thanks. I know that ArangoDB uses RocksDB, and that the MMFiles storage engine is deprecated starting with version 3.6.0. I also know that the maximum size per Value in the ArangoDB Document model is 75 kB. ALL I want to know is the maximum size per Value in the ArangoDB Key/Value model. – ben.jamin May 23 '20 at 23:33
  • There is no distinct key/value model; it is subsumed by the document model. Every document has a unique `_key`, by which it can be accessed. The key is required and immutable, but can be user-defined on document creation. ArangoDB can thus be used as a key-value store, where the entire document represents the value, which can be fetched by key. In most relational systems, on the other hand, a primary key is optional and can be defined over multiple fields. Anyway, the "value" can be as large as the largest supported document because everything is JSON document based. – CodeManX May 25 '20 at 12:39
  • The practical limit for document size appears to be 768MB because of the maximum batch size on the transport layer. However, with arangosh I successfully saved a 1023MB document with a single string attribute, so the storage layer would support more than the batch size. 1GB is V8's string size limit, so I can't test more than that. Not sure if the document API is also capped around 768MB but it seems so using Curl (immediate empty server reply). – CodeManX May 25 '20 at 12:43
  • I know that you are trying to help, but I have read a lot about ArangoDB, RocksDB and MMFiles; much of your response here is not unknown to me, albeit not relevant to my question. I could correct some of the inaccuracies in your response, but want to avoid off-topic distractions from my simple question, namely: what is the maximum size per Value in the ArangoDB Key/Value model? And yes, there is indeed a distinct key/value model in ArangoDB; what you describe is its underlying implementation. – ben.jamin May 25 '20 at 16:49
  • No, there is no distinct key/value model. As @CodeManX correctly explained it is subsumed by the document model. Every document has an implicit `_key` attribute (which obviously is the _key_ in the key/value model) and the document itself is the _value_. That's all there is to it. Arango internally uses VelocyPack to serialize the data, and this format can store 64bit values as size indicator for the various types. However, AFAIK the max. _document_ size is 2GB. That said, large documents _really_ hurt performance! – mpoeter May 26 '20 at 09:35
  • ArangoDB is a multi-model database with capabilities for: [1] graph model, [2] document model, and [3] KEY-VALUE model. And please ENOUGH of the distractions from the question. You are not helping; and this is not a debate about ArangoDB data model and performance. – ben.jamin May 26 '20 at 10:25
  • >> AFAIK the max. document size is 2GB. I need facts, e.g., ArangoDB documentation, to verify. – ben.jamin May 26 '20 at 10:31
  • Yes, it is multi-model in the sense that you can use it for those different purposes, but it is still important that you can only store _JSON documents_ - i.e., the _value_ in your key/value model is the JSON document itself. In this [github issue](https://github.com/arangodb/arangodb/issues/10754) jsteemann (one of the Arango developers) states: "the RocksDB library also has some limitations for string lengths (for keys and values), and I guess the limit there is actually lower (2 GB??)." Important: here the "value" is again your _complete JSON document_ (although encoded as VelocyPack). – mpoeter May 27 '20 at 10:46
  • I have seen the 'github issue' earlier, but thanks. The exact answer to my question seems hard to find, so rather than taking the risk, I shall switch to Cassandra. Given ArangoDB's multi-model nature, and since I am already using the ArangoDB Document model, I wanted to use the Key/Value model to store blobs of (say) 2GB per value -- of course, by first splitting them into (say) 1-100MB chunks depending on latency and performance. I think that Cassandra can fulfil my use case much better, at least until I know more about the ArangoDB Key/Value model. – ben.jamin May 27 '20 at 13:42
  • Yes, if you want to store (large) binary data Arango is certainly not a good fit. For a use case like that I would suggest an object storage like [MinIO](https://min.io/) or Amazon S3. – mpoeter May 28 '20 at 12:00