17

What are the problems, if any, of storing binary data in Riak?

Does it effect the maintainability and performance of the clustering?

What would the performance differences be between using Riak for this rather than a distributed file system?

mikeal

5 Answers

12

Adding to @Oscar-Godson's excellent answer, you're likely to experience problems with values much larger than 50MB. Bitcask is best suited for values of up to a few KB. If you're storing large values, you may want to consider an alternative storage backend, such as Innostore.

I don't have experience with storing binary values, but we have a medium-sized cluster in production (5 nodes, on the order of 100M values, tens of TBs) and we're seeing frequent errors related to inserting and retrieving values that are hundreds of KB in size. Performance in this case is inconsistent - sometimes it works, sometimes it doesn't - so if you're going to test, test at scale.

We're also seeing problems with large values when running map-reduce queries - they simply time out. However, that may be less relevant to binary values (as @Matt-Ranney mentioned).

Also see @Stephen-C's answer here
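If you do test at scale, even a crude script helps surface the kind of inconsistency I'm describing. Here's a minimal sketch using Riak's HTTP interface via Python's requests library; the host, bucket name, and the classic /riak/<bucket>/<key> URL form are assumptions you'd adjust for your own setup:

```python
# Write and read back values of several sizes, timing each round trip.
import os
import time
import requests

BASE = "http://127.0.0.1:8098/riak/binary-test"  # placeholder host and bucket

for size in (10 * 1024, 100 * 1024, 1024 * 1024, 10 * 1024 * 1024):
    payload = os.urandom(size)
    key = "blob-%d" % size
    start = time.time()
    put = requests.put("%s/%s" % (BASE, key), data=payload,
                       headers={"Content-Type": "application/octet-stream"})
    get = requests.get("%s/%s" % (BASE, key))
    elapsed = time.time() - start
    print("%9d bytes  put=%s get=%s  %.3fs"
          % (size, put.status_code, get.status_code, elapsed))
```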

Elad
  • Hi Ben - you shouldn't have any performance issues with objects less than 1MiB in size. If you'd like to provide information about your cluster on the riak-users mailing list, one of us Bashos can help diagnose. Be sure you've tuned your system according to the recommendations in the docs. – Luke Bakken Apr 19 '16 at 15:57
7

The only problem I can think of is storing binary data larger than 50MB, which they advise against. The whole point of Riak is just that:

Another reason one might pick Riak is for flexibility in modeling your data. Riak will store any data you tell it to in a content-agnostic way — it does not enforce tables, columns, or referential integrity. This means you can store binary files right alongside more programmer-transparent formats like JSON or XML.

Source: Schema Design in Riak - Introduction
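To make the content-agnostic point concrete, here's a rough sketch of storing a binary file right next to a JSON document over Riak's HTTP API, using Python's requests library. The host, bucket name, file names, and the /riak/<bucket>/<key> URL form are placeholders, not anything mandated by Riak:

```python
import json
import requests

base = "http://127.0.0.1:8098/riak/documents"  # placeholder host and bucket

# Store a binary file as-is; the Content-Type header tells Riak (and later
# readers) what the value is.
with open("report.pdf", "rb") as f:            # hypothetical local file
    requests.put(base + "/report.pdf", data=f.read(),
                 headers={"Content-Type": "application/pdf"})

# Store structured metadata right alongside it.
meta = {"title": "Quarterly report", "pages": 42}
requests.put(base + "/report.pdf.meta", data=json.dumps(meta),
             headers={"Content-Type": "application/json"})

# Fetch the binary back; the stored Content-Type comes back on the response.
resp = requests.get(base + "/report.pdf")
print(resp.headers.get("Content-Type"), len(resp.content), "bytes")
```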

Oscar Godson
  • Note that the ~50MB issue is caused by hardcoded size limits on Erlang's network distribution buffers, not by anything in Riak. – seancribbs May 23 '11 at 22:28
  • Is there a reason why it's the default for Erlang? Just "best practice", or does it cause performance issues? – Oscar Godson May 23 '11 at 22:33
4

With Riak, the recommended maximum is 2MB per object. Above that, it's recommended to use either Riak CS, which has been tested with objects up to 5TB (stored in Riak as 1MB objects), or to break your large object up into 2MB chunks yourself, linking them with a common key and a suffix.
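Here's a rough sketch of the chunk-and-suffix approach (not Riak CS) over the HTTP API with Python's requests library; the bucket name, manifest format, and key-suffix convention are just illustrative choices:

```python
import json
import requests

CHUNK = 2 * 1024 * 1024                    # 2MB, per the recommendation above
BASE = "http://127.0.0.1:8098/riak/blobs"  # placeholder host and bucket

def store_large(key, data):
    # Write each 2MB piece under <key>.<n>, then a JSON manifest under <key>.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    for n, piece in enumerate(chunks):
        requests.put("%s/%s.%d" % (BASE, key, n), data=piece,
                     headers={"Content-Type": "application/octet-stream"})
    manifest = {"chunks": len(chunks), "size": len(data)}
    requests.put("%s/%s" % (BASE, key), data=json.dumps(manifest),
                 headers={"Content-Type": "application/json"})

def fetch_large(key):
    # Read the manifest, then reassemble the pieces in order.
    manifest = requests.get("%s/%s" % (BASE, key)).json()
    return b"".join(
        requests.get("%s/%s.%d" % (BASE, key, n)).content
        for n in range(manifest["chunks"]))
```

Note that this sketch keeps everything in memory; for really large objects you'd stream chunks from disk instead.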

3

I personally haven't noticed any issues storing data such as images and documents (both DOC and PDF) in Riak. I don't have performance numbers, but may be able to gather some if I remember to.

Of note: with Riak you can use Luwak, which provides an API for storing large files. It has been pretty useful.
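For illustration, here's a minimal sketch of pushing a file through Luwak's HTTP endpoint with Python's requests library, assuming Luwak is enabled on your node and it's listening on the default port 8098 (the file names are placeholders):

```python
import requests

# Upload a large file; requests streams the file object rather than
# loading it all into memory.
with open("video.mp4", "rb") as f:             # hypothetical local file
    requests.put("http://127.0.0.1:8098/luwak/video.mp4", data=f,
                 headers={"Content-Type": "video/mp4"})

# Stream it back out in 64KB pieces.
resp = requests.get("http://127.0.0.1:8098/luwak/video.mp4", stream=True)
with open("video-copy.mp4", "wb") as out:
    for chunk in resp.iter_content(chunk_size=65536):
        out.write(chunk)
```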

Nick Campbell
1

One problem may be that it is difficult, if not impossible, to use JavaScript map/reduce across your binary data. You'll probably need Erlang for that.

Matt Ranney
    in CouchDB, there is a separate API for storing binary data (attachments) specifically to handle this case. only the metadata about attachments make it to map/reduce. – mikeal May 23 '11 at 21:27
  • Check out Riak links. You can have one object that's the metadata suitable for m/r, and then add a link to the optional binary object. – Matt Ranney May 23 '11 at 22:00
  • How can the links keep the other linked object out of the map/reduce? – mikeal May 23 '11 at 22:22
  • You can filter which links you're including based on tagging; see [http://wiki.basho.com/Links-and-Link-Walking.html](http://wiki.basho.com/Links-and-Link-Walking.html) – Nick Campbell May 24 '11 at 13:54