Riak CS implements the S3 API on top of an underlying Riak cluster, a distributed, decentralized data store. Both Riak and Riak CS must be deployed on every node. A further component, Stanchion, must be deployed on exactly one node in the cluster to keep user IDs and bucket names unique across the cluster.
- Why is Riak CS deployed on all nodes of the cluster and not just on a single one?
- Do the Riak CS nodes communicate with each other? I cannot see any hint of this in `riak-cs.conf`. The configuration file specifies only the address of the local host and the address of the Stanchion instance. As far as I understand it, each Riak CS instance interacts only with the listener of its underlying Riak data store and with the cluster's Stanchion instance.
- Is the idea of deploying Riak CS on all nodes just to provide multiple entry points to the S3 service, improving scalability and availability? For small clusters, wouldn't it be better to run a single Riak CS instance as the only S3 API entry point? Or is the idea that one should put a load balancer in front of the (many?) Riak CS instances to distribute incoming requests?
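
To make the last question concrete, this is roughly what I picture a load balancer doing in front of the Riak CS instances. It is only a minimal Python sketch under my own assumptions: the host names, the port 8080, and the `RoundRobinBalancer` class are hypothetical and not part of Riak CS.

```python
from itertools import cycle

# Hypothetical Riak CS endpoints, one per cluster node
# (host names and port are assumptions for illustration).
ENDPOINTS = [
    "http://riak-node1.example:8080",
    "http://riak-node2.example:8080",
    "http://riak-node3.example:8080",
]


class RoundRobinBalancer:
    """Hands out endpoints in turn, skipping those marked as down."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._down = set()
        self._ring = cycle(self._endpoints)

    def mark_down(self, endpoint):
        # Record that an endpoint failed, e.g. after a connection error.
        self._down.add(endpoint)

    def mark_up(self, endpoint):
        # Put a recovered endpoint back into rotation.
        self._down.discard(endpoint)

    def next_endpoint(self):
        # Try each endpoint at most once per call.
        for _ in range(len(self._endpoints)):
            candidate = next(self._ring)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no Riak CS endpoint available")


balancer = RoundRobinBalancer(ENDPOINTS)
picks = [balancer.next_endpoint() for _ in range(4)]
```

With a single Riak CS instance the list would have one entry, and marking it down would leave no S3 entry point at all, which is the availability trade-off I am asking about.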