I see many references to MongoDB as a client for the YCSB benchmark, used to test NoSQL database server scalability/elasticity.
https://github.com/brianfrankcooper/YCSB
However, it is clear that the benchmark would require some kind of sharding setup, because the tests are designed to run on 6 to 10 server machines to show the scaling and elasticity.
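For context on why sharding matters: spreading YCSB's data over six storage servers requires partitioning the collection, for example with a hashed shard key, which spreads documents roughly evenly across shards. The Java sketch below is a simplified model of hash partitioning, not MongoDB's actual chunk-and-balancer mechanism. The class and method names are hypothetical, and the `user` + number key format only approximates the keys YCSB generates.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashShardingSketch {
    /**
     * Simplified model of hashed partitioning: hash the key with MD5 and map it to
     * one of `shards` buckets. This is NOT MongoDB's exact algorithm; MongoDB assigns
     * hashed values to chunk ranges that a balancer moves between shards.
     */
    static int shardFor(String key, int shards) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF); // pack the first 8 digest bytes into a long
            }
            return (int) Math.floorMod(h, (long) shards); // non-negative bucket index
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Counts how many of `n` YCSB-style keys ("user" + number) land on each shard. */
    static int[] distribution(int n, int shards) {
        int[] counts = new int[shards];
        for (int i = 0; i < n; i++) {
            counts[shardFor("user" + i, shards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Six buckets, mirroring the six storage servers used in the YCSB paper.
        int[] counts = distribution(60_000, 6);
        for (int s = 0; s < counts.length; s++) {
            System.out.println("shard " + s + ": " + counts[s] + " keys");
        }
    }
}
```

With a well-mixed hash, each bucket should receive roughly one sixth of the keys. A real MongoDB sharded cluster would also need config servers and one or more `mongos` routers in front of the shards.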
I cannot find any description online of what that configuration looks like with MongoDB, nor anyone who published results together with the configuration they used.
Has this really been done successfully? How do the results compare to the original YCSB clients such as Cassandra, HBase, etc.?
I am especially confused because the code of the MongoDB client says "there is one DB instance per client thread" (see snippet):
```java
public class MongoDbClient extends DB {
    private static final Logger logger = LoggerFactory.getLogger(MongoDbClient.class);

    private Mongo mongo;
    private WriteConcern writeConcern;
    private String database;

    /**
     * Initialize any state for this DB. Called once per DB instance; there is
     * one DB instance per client thread.
     */
    public void init() throws DBException {
        // initialize MongoDb driver
        Properties props = getProperties();
        ......
```
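The comment means YCSB creates a separate `DB` object for each client thread; that alone does not force a separate server connection per thread. In the quoted version, `mongo` is a non-static field, so each instance presumably holds its own driver object (the rest of `init()` is elided above). A common alternative pattern, sketched here with hypothetical class names and a fake pool standing in for a real driver, is to share one lazily created, reference-counted client across all per-thread instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedClientSketch {

    /** Hypothetical stand-in for an expensive driver object (e.g. a connection pool). */
    static final class FakeConnectionPool {
        static final AtomicInteger CREATED = new AtomicInteger();
        volatile boolean closed;
        FakeConnectionPool() { CREATED.incrementAndGet(); }
        void close() { closed = true; }
    }

    /** Hypothetical per-thread binding: one instance per client thread, one shared pool. */
    static final class PerThreadDb {
        private static FakeConnectionPool shared; // guarded by PerThreadDb.class
        private static int refCount;              // guarded by PerThreadDb.class
        FakeConnectionPool pool;                  // this instance's handle to the shared pool

        void init() {
            synchronized (PerThreadDb.class) {
                if (shared == null) {
                    shared = new FakeConnectionPool(); // only the first init() creates it
                }
                refCount++;
                pool = shared;
            }
        }

        void cleanup() {
            synchronized (PerThreadDb.class) {
                if (--refCount == 0) {
                    shared.close(); // the last cleanup() closes the shared pool
                    shared = null;
                }
            }
        }
    }

    /** Starts `threads` client threads and returns the distinct pools they used. */
    static Set<FakeConnectionPool> runClients(int threads) throws InterruptedException {
        Set<FakeConnectionPool> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch allInitialized = new CountDownLatch(threads);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                PerThreadDb db = new PerThreadDb(); // one DB instance per client thread
                db.init();
                seen.add(db.pool);
                allInitialized.countDown();
                try {
                    allInitialized.await(); // keep every instance alive at the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // ... a real client thread would run its workload operations here ...
                db.cleanup();
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<FakeConnectionPool> pools = runClients(500);
        System.out.println("DB instances: 500, distinct pools: " + pools.size());
        System.out.println("pool closed at the end: " + pools.iterator().next().closed);
    }
}
```

With this design, 500 threads mean 500 lightweight `DB` objects but only one underlying pool, which is closed when the last thread calls `cleanup()`.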
However, Brian Cooper's YCSB results paper states that they ran their workloads with up to 500 threads:
6.1 Experimental Setup
For most experiments, we used six server-class machines (dual 64-bit quad core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6 disk RAID-10 array and gigabit ethernet) to run each system. We also ran PNUTS on a 47 server cluster to successfully demonstrate that YCSB can be used to benchmark larger systems. PNUTS required two additional machines to serve as a configuration server and router, and HBase required an additional machine called the “master server.” These servers were lightly loaded, and the results we report here depend primarily on the capacity of the six storage servers. The YCSB Client ran on a separate 8 core machine. The Client was run with up to 500 threads, depending on the desired offered throughput. We observed in our tests that the client machine was not a bottleneck; in particular, the CPU was almost idle as most time was spent waiting for the database system to respond.
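The thread count in the quoted setup follows from the desired offered throughput. Assuming a closed-loop client in which each thread has at most one request outstanding, Little's law (concurrency = throughput × latency) bounds the throughput a given number of threads can offer. The sketch below applies that relation; the latency figures are hypothetical, not values from the paper.

```java
public class OfferedLoadSketch {
    /** Little's law with concurrency = threads: max throughput X = threads / latency. */
    static double maxOpsPerSecond(int threads, double latencyMillis) {
        return threads * 1000.0 / latencyMillis;
    }

    /** Threads needed to sustain a target throughput at a given per-operation latency. */
    static int threadsNeeded(double targetOpsPerSecond, double latencyMillis) {
        return (int) Math.ceil(targetOpsPerSecond * latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical latencies, chosen only to illustrate the arithmetic.
        System.out.println(maxOpsPerSecond(500, 10.0)); // 50000.0
        System.out.println(threadsNeeded(20000, 5.0));  // 100
    }
}
```

This is consistent with the observation that the client CPU was nearly idle: each thread spends most of its time blocked waiting for a response, so adding threads raises the offered load without much client-side work.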
Does anyone know of a sharding configuration for this benchmark? And are there any real results against the competition that are backed up by a published shard configuration, or a detailed explanation of why sharding would not be necessary?
Thanks, -Robert