
I want to share a very large object, on the order of megabytes or even several gigabytes, between a set of machines. The object will be written once but may be read many times. A naive approach is to use centralized storage such as Redis. However, that becomes a single point of failure, and too many requests can effectively turn into a DoS attack on Redis. A distributed solution is therefore much more promising, but the main concern is replicating the structure to all machines. If replication is done with a master/slave technique, it may put a huge traffic load on the master because the object is large. A better solution would be a P2P strategy for replicating the object, in order to decrease the network load on the master.

Does anybody know a solution to this problem? Some candidates might be:
- Redis
- Memcached
- Voldemort
- Hazelcast

My major concerns are a Java interface, support for sharing big objects, high availability, and low network traffic for replication.

Thanks in advance.

Saeed Shahrivari

1 Answer


Caching large objects in NoSQL stores is generally not a good idea, because it is expensive in terms of memory and network bandwidth. I don't think NoSQL solutions shine when it comes to storing large objects. Redis, memcached, and most other key/value stores are clearly not designed for this.

If you want to store large objects in NoSQL products, you need to cut them into small pieces and store the pieces as independent objects. This is the approach taken by 10gen for GridFS (which is part of the standard MongoDB distribution):

See http://docs.mongodb.org/manual/applications/gridfs/
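To make the chunking idea concrete, here is a minimal sketch in Java of how a large byte array could be split into fixed-size pieces and stored under derived keys in any key/value store. The `KeyValueStore` interface, the key naming scheme, and the 256 KB chunk size are assumptions for illustration; this is the general idea behind GridFS, not its actual API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical minimal key/value interface; substitute the client of whatever
// store (Redis, memcached, Voldemort, ...) is actually used.
interface KeyValueStore {
    void put(String key, byte[] value);
    byte[] get(String key);
}

public class ChunkedObjectStore {
    private static final int CHUNK_SIZE = 256 * 1024; // assumed chunk size (256 KB)

    private final KeyValueStore store;

    public ChunkedObjectStore(KeyValueStore store) {
        this.store = store;
    }

    // Write once: a small metadata entry plus N independent chunk entries.
    public void write(String id, byte[] data) {
        int chunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        store.put(id + ":meta", Integer.toString(chunks).getBytes());
        for (int i = 0; i < chunks; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, data.length);
            store.put(id + ":" + i, Arrays.copyOfRange(data, from, to));
        }
    }

    // Read the chunks back in order and reassemble the original object.
    public byte[] read(String id) {
        int chunks = Integer.parseInt(new String(store.get(id + ":meta")));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks; i++) {
            byte[] chunk = store.get(id + ":" + i);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Because each chunk is a separate, normally sized value, the pieces can be fetched in parallel and stay within socket buffer limits, which is exactly why the chunking helps.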

To store large objects, I would rather look at distributed filesystems such as:

These systems are scalable, highly available, and provide both file and object interfaces (you probably need an object interface). You can also refer to the following SO question to choose a distributed filesystem.

Best distributed filesystem for commodity linux storage farm

It is up to you to implement a cache on top of these scalable storage solutions.
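As a rough illustration of that last point, the sketch below layers a small in-memory read-through cache over a hypothetical `fetch(String id)` call to the distributed store. The `RemoteStore` interface, the LRU policy, and the cache size are assumptions, not features of any particular filesystem; since the object is written once, locally cached copies never go stale.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface standing in for reads from whichever distributed
// store is chosen; not a real client API.
interface RemoteStore {
    byte[] fetch(String id);
}

// Small LRU read-through cache: repeated reads of the same object are served
// locally and avoid the network entirely.
public class ReadThroughCache {
    private final RemoteStore remote;
    private final Map<String, byte[]> cache;

    public ReadThroughCache(RemoteStore remote, final int maxEntries) {
        this.remote = remote;
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries; // evict the least-recently-used entry
            }
        };
    }

    public synchronized byte[] get(String id) {
        byte[] value = cache.get(id);
        if (value == null) {
            value = remote.fetch(id); // cache miss: read from the distributed store
            cache.put(id, value);
        }
        return value;
    }
}
```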

Didier Spezia
  • +1 for a very nice answer. Can you please tell me how large a "large object" really is? If my object size is approx. 200 KB, will that also be considered too large to be stored in a NoSQL (document-based) database? – anubhava Mar 21 '13 at 10:35
  • 200 KB is probably fine if you make sure the socket buffers are large enough. There is a first performance gap when the object does not fit into an Ethernet packet (1.5 KB), and a second one when it does not fit into the socket buffers (generally up to 256 KB). – Didier Spezia Mar 21 '13 at 11:59
  • Thanks for your timely response. I'm not a networking expert, but I will check with my netops team about these, i.e. the Ethernet packet size and `socket buffer size`. – anubhava Mar 21 '13 at 13:17