I see many references to a MongoDB client for the YCSB benchmark, which tests the scalability and elasticity of NoSQL database servers.

https://github.com/brianfrankcooper/YCSB

However, it is clear that the benchmark would require some kind of sharding setup, because the tests are designed to run on 6 to 10 server machines to show the scaling and elasticity.

I cannot find any reference on the internet for what that configuration looks like with MongoDB, nor anyone who published results together with the configuration they used.

Has this really been done successfully? How do the results compare with those of the original YCSB clients such as Cassandra and HBase?

I am especially confused because the code of the MongoDB client says "there is one DB instance per client thread" (see the snippet):

public class MongoDbClient extends DB {

    private static final Logger logger = LoggerFactory.getLogger(MongoDbClient.class);

    private Mongo mongo;
    private WriteConcern writeConcern;
    private String database;

    /**
     * Initialize any state for this DB. Called once per DB instance; there is
     * one DB instance per client thread.
     */
    public void init() throws DBException {
        // initialize MongoDb driver
        Properties props = getProperties();
        ......

However, Brian Cooper's YCSB results paper states that they ran their workloads with up to 500 threads:

6.1 Experimental Setup

For most experiments, we used six server-class machines (dual 64-bit quad core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6 disk RAID-10 array and gigabit ethernet) to run each system. We also ran PNUTS on a 47 server cluster to successfully demonstrate that YCSB can be used to benchmark larger systems. PNUTS required two additional machines to serve as a configuration server and router, and HBase required an additional machine called the “master server.” These servers were lightly loaded, and the results we report here depend primarily on the capacity of the six storage servers. The YCSB Client ran on a separate 8 core machine. The Client was run with up to 500 threads, depending on the desired offered throughput. We observed in our tests that the client machine was not a bottleneck; in particular, the CPU was almost idle as most time was spent waiting for the database system to respond.

Does anyone know where to find a sharding configuration for this benchmark? Are there any real results against the competition that are backed up by a shard configuration, or by a detailed explanation of why sharding would not be necessary?

Thanks, -Robert

  • Common point of confusion: the Mongo class represents a connection pool rather than an individual connection. That would explain some of the questions about one instance per client thread. – Brendan W. McAdams May 08 '11 at 20:29
  • Thanks for the info Brendan. Can you explain how that pertains to this case where YCSB is supposed to be connecting with 6 databases on 6 different machines? This question is still unanswered. – Robert May 09 '11 at 16:09
  • Unfortunately, I'm not familiar with YCSB. It is odd that they would set up a benchmark but provide no sample config or docs. It is entirely possible, however, that this setup is based on Replica Sets rather than Sharding. – Brendan W. McAdams May 11 '11 at 08:34
  • It is unfortunate, but the fact that nobody has been able to point to a proper configuration for nearly a week leads me to believe that the MongoDB client implementation for YCSB does not fully deliver the benchmark. It is quite easy to run YCSB workloads on a single machine for almost any database; the real goal of YCSB is a "scalability" test. It seems MongoDB is not able to do this well enough to show up alongside Cassandra, HBase and the others. If someone has a reference to a proper distributed "scalability" configuration for this, please correct me. – Robert May 13 '11 at 08:39
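
To make the connection-pool point from the first comment concrete, here is a minimal sketch of many client threads sharing one pooled client. It only illustrates the idea and is not the MongoDB driver's implementation: `PooledClientSketch`, `request`, and `run` are hypothetical names, the "connections" are plain integers, and the 1 ms sleep stands in for a round trip to the server.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pooled client; illustrates the idea only, not the MongoDB driver.
class PooledClientSketch {
    private final BlockingQueue<Integer> idle;               // ids of idle "connections"
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peakInUse = new AtomicInteger();

    PooledClientSketch(int poolSize) {
        idle = new ArrayBlockingQueue<>(poolSize);
        for (int id = 0; id < poolSize; id++) {
            idle.add(id);
        }
    }

    // Borrow a connection, pretend to do work, then return it to the pool.
    void request() {
        try {
            Integer conn = idle.take();                      // blocks while every connection is busy
            try {
                int now = inUse.incrementAndGet();
                peakInUse.accumulateAndGet(now, Math::max);  // remember the highest concurrent use
                Thread.sleep(1);                             // stands in for a server round trip
            } finally {
                inUse.decrementAndGet();
                idle.put(conn);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs `threads` workers against one shared client; returns the peak number of busy connections.
    static int run(int threads, int poolSize) {
        PooledClientSketch client = new PooledClientSketch(poolSize);
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.submit(client::request);
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return client.peakInUse.get();
    }

    public static void main(String[] args) {
        System.out.println("peak connections in use: " + run(50, 4));
    }
}
```

With 50 threads and a pool of 4, the reported peak never exceeds 4, so the thread count by itself says nothing about how many connections or servers are involved.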

1 Answer

We did not include MongoDB as part of our initial YCSB study. The Mongo client was contributed later by another developer, but I haven't run the full benchmark against Mongo so I don't know whether the client really does everything it needs to. If it doesn't, go ahead and submit a patch and I'll try to include it!

Also, the "one DB instance per client thread" comment means one instance of the DB client class in the JVM, not necessarily one MongoDB server.
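
As a rough illustration of that distinction, the sketch below starts several client threads, gives each its own client object, and points them all at the same endpoint. It is not the YCSB or MongoDB API: `SketchDb`, `PerThreadInstanceSketch`, and the `mongodb://example-host:27017` URL are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a YCSB-style DB binding; not the real YCSB or MongoDB API.
class SketchDb {
    static final AtomicInteger instances = new AtomicInteger();
    static final Set<String> endpoints = ConcurrentHashMap.newKeySet();

    private final String url;

    SketchDb(String url) {
        this.url = url;
    }

    // Called once per instance, mirroring the "one DB instance per client thread" comment.
    void init() {
        instances.incrementAndGet();
        endpoints.add(url);
    }
}

class PerThreadInstanceSketch {
    // Starts `threads` workers, each with its own SketchDb, all aimed at the same URL.
    // Returns {number of client instances, number of distinct endpoints}.
    static int[] run(int threads, String url) {
        SketchDb.instances.set(0);
        SketchDb.endpoints.clear();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> new SketchDb(url).init());
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { SketchDb.instances.get(), SketchDb.endpoints.size() };
    }

    public static void main(String[] args) {
        int[] counts = run(500, "mongodb://example-host:27017"); // placeholder URL
        System.out.println(counts[0] + " client instances, " + counts[1] + " distinct endpoint(s)");
    }
}
```

Running it with 500 threads reports 500 client instances but a single distinct endpoint. How many servers sit behind that endpoint is a matter of deployment, not of the client thread count.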

  • Hi Brian. Thanks for verifying. It seems that, with regard to Mongo, nobody has actually run the suite of tests as described in your paper (YCSB is great work, btw). I have our folks looking into doing it. I was surprised to find out just how difficult it is to set up a distributed MongoDB. If we can get a sharded environment that works, we will publish. – Robert Jun 01 '11 at 23:42
  • Hi Brian, so does YCSB support MongoDB with sharding? We are benchmarking Mongo with YCSB. We have deployed a Mongo cluster with 3 shards, but when we tried to load data using YCSB, the console showed the data as loaded, yet when we checked the DB it was empty. Can you please clarify? Thanks in advance. – Nachiket Kate Feb 25 '15 at 13:37