MongoDB cursor.toArray() is too slow

Question

I am using cursor.toArray() to return the result of collection.find(query) as a list, and the response time of my API is in the hundreds of milliseconds. Very little data is fetched into the cursor (a couple of hundred records), and the collection is indexed on the field I am querying. I have also set the batch size with cursor.batchSize(1000).

My query is db.collection.find({ "ZIP" : { "$in" : [ "12345" ] } }), and the collection is indexed on 'ZIP'. The same query run in the mongo shell takes only about 4 to 5 ms.

The Mongo driver I am using is:

<!-- https://mvnrepository.com/artifact/org.mongojack/mongojack -->
<dependency>
    <groupId>org.mongojack</groupId>
    <artifactId>mongojack</artifactId>
    <version>2.8.2</version>
</dependency>

The code

@Path("/")
@Produces(MediaType.APPLICATION_JSON)
@Api(value = "listing-mongo")
public class MlsMongoResource {

    private JacksonDBCollection<Mlsdatadao, String> collection;
    Clock clock;

    public MlsMongoResource(JacksonDBCollection<Mlsdatadao, String> collection) {
        this.collection = collection;
        this.clock =  Clock.systemUTC();
    }

    @GET
    @Path("/listings-mongo")
    @Produces(value = MediaType.APPLICATION_JSON)
    @Timed
    public List<Mlsdatadao> getListings(@BeanParam MlsListingParameters mlsBeanParam) {

        BasicDBList basicDbList = new BasicDBList();
        mlsBeanParam.validateBean();

        setLocations(basicDbList,mlsBeanParam.zipcodes);

        BasicDBObject query = new BasicDBObject("$and", basicDbList);

        DBCursor<Mlsdatadao> cursor = null;

        long start = 0;
        try{
             start = System.currentTimeMillis();
            cursor = collection.find(query);
            cursor.batchSize(1000);

        } catch (Exception e){
            System.out.println("IN collection.find() " +  e.getCause());
        }


        System.out.println("QUERY LIST IS " + basicDbList);

        if(cursor == null) {
            System.out.println("Cursor is null");
        }
        List<Mlsdatadao> result = cursor.toArray();
        cursor.close();
        System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start));
        return result;

    }



    private void setLocations(BasicDBList basicDbList, List<String> zipcodes) {

        if (CollectionUtils.isNotEmpty(zipcodes)) {
            basicDbList.add(setZipcodes(zipcodes));
        }

    }

    private BasicDBObject setZipcodes(List<String> zipcodes) {
        return new BasicDBObject("ZIP" ,  new BasicDBObject("$in", zipcodes) );
    }
}

Application:

public class MlsMongoApplication extends Application<MlsMongoConfiguration> {

    public static void main(String[] args) throws Exception {
        new MlsMongoApplication().run(args);
    }

    @Override
    public String getName() {
        return "mls-dropwizard-mongo";
    }

    @Override
    public void initialize(Bootstrap<MlsMongoConfiguration> bootstrap) {

        bootstrap.addBundle(new SwaggerBundle<MlsMongoConfiguration>() {
            @Override
            protected SwaggerBundleConfiguration getSwaggerBundleConfiguration(MlsMongoConfiguration configuration) {
                return configuration.swaggerBundleConfiguration;
            }
        });

    }

    @Override
    public void run(MlsMongoConfiguration configuration, Environment environment) throws Exception {


        MongoClientOptions.Builder clientOptions = new MongoClientOptions.Builder();
        clientOptions.minConnectionsPerHost(1000);//min
        clientOptions.maxWaitTime(1000);
        clientOptions.connectionsPerHost(1000);

        //Create Mongo instance
        //Mongo mongo = new Mongo(configuration.mongohost, configuration.mongoport);

        MongoClient mongoClient = new MongoClient(new ServerAddress(configuration.mongohost, configuration.mongoport), clientOptions.build());

        //Add Managed for managing the Mongo instance
        //MongoManaged mongoManaged = new MongoManaged(mongo);
        MongoManaged mongoManaged = new MongoManaged(mongoClient);
        environment.lifecycle().manage(mongoManaged);

        //Add Health check for Mongo instance. This will be used from the Health check admin page
        environment.healthChecks().register("MongoHealthCheck", new MongoHealthCheck(mongoClient));
        //Create DB instance and wrap it in a Jackson DB collection
        DB db = mongoClient.getDB(configuration.mongodb);
        JacksonDBCollection<Mlsdatadao, String> jacksonDBCollection = JacksonDBCollection.wrap(db.getCollection("mlsdata"), Mlsdatadao.class, String.class);
        environment.jersey().register(new MlsMongoResource(jacksonDBCollection));
    }
}

Is there any way to avoid cursor.toArray()? Any other performance tuning hints would be really helpful.

Thanks.

`batchSize` does not do what you think it does, and really won't have an effect here. A couple of hundred records is not really a stretch so the bottleneck here is more likely the "query" being issued, or if there is "no query" and you simply ask for everything in the collection then it's most likely an infrastructure problem ( slow network / misconfigured / server capability ). If it "is" a query being issued, then you really should be showing what you are trying to do and then people can help you "optimize" it. Problems need information for others to solve. — Neil Lunn, May 09 '18 at 01:38
As I mentioned, it is a basic select on a column, which is indexed. `db.collection.find({ "ZIP" : { "$in" : [ "12345" ] } })` and my database is indexed on 'ZIP'. I can see the same query running on the shell within 5 ms. Can you elaborate on what `batchSize` does exactly? — sudarshan kakumanu, May 09 '18 at 01:47
The query in the shell will only get the first 20 docs, by default. So your two cases are not equivalent. — JohnnyHK, May 09 '18 at 01:51
Well, 5 ms is the number I got from the database profiler. Does the 20-doc default apply in this case too? — sudarshan kakumanu, May 09 '18 at 01:54
`db.collection.find({ "ZIP": "12345" }).toArray()` and see what it does since that's the same thing. As stated already a "couple of hundred" should not be an issue. If you still think there's an issue then show the "actual Java code" so we can see what you are really doing. — Neil Lunn, May 09 '18 at 01:54
I tried `db.collection.find({ "ZIP": "12345" }).toArray()` on the mongo shell, and performance was great, I have posted the java code, please have a look. — sudarshan kakumanu, May 09 '18 at 03:21
That's not the same query. I can see clearly by the parameter being sent into the `setLocations()` method that there appear to be lists of values for different fields. That's vastly different to matching on a single value. Take a look at the actual query being issued and run through some explain results. Also this "smells of" issuing lists of hundreds of different values into the query itself, and that's always bad for performance. — Neil Lunn, May 09 '18 at 03:53
I have removed that part deliberately because it is proprietary. At any given time only one of the parameters will have values (for now, I am only querying by zip codes), so assume that the other params are not used. Also, is it a good idea to try a different library? Or do you think cursor.toArray() is a good way? — sudarshan kakumanu, May 09 '18 at 05:03
I would suggest you compare what's in the MongoDB logs for those two queries (turn it on through `db.setLogLevel(1)`). Issuing your query through a MongoDB shell should show an identical query in the log as the one from your program and return after the same time. In your code, you're also measuring the `.toArray()` call (array allocation + BSON parsing + instantiation of all the `Mlsdatadao` instances), the try-catch, the `System.out.println`, the `if`, the network and more... — dnickless, May 09 '18 at 05:28
@dnickless As you suggested I ran queries both from mongo shell and rest end point. `"MongoDB Shell" command: find { find: "mlsdata", filter: { ZIP: { $in: [ "32803" ] } } } planSummary: IXSCAN { ZIP: 1 } cursorid:19809914739 keysExamined:101 docsExamined:101 numYields:2 nreturned:101 reslen:644989 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_command 4ms` — sudarshan kakumanu, May 09 '18 at 17:09
@dnickless REST endpoint log: `{ find: "mlsdata", filter: { $and: [ { ZIP: { $in: [ "32803" ] } } ] }, batchSize: 1000 } planSummary: IXSCAN { ZIP: 1 } keysExamined:133 docsExamined:133 cursorExhausted:1 numYields:1 nreturned:133 reslen:846498 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 2ms` It doesn't seem to be slow. Is there a possibility that the result is cached? Nevertheless, it only takes 7 ms for a different zip with 203 docs examined. — sudarshan kakumanu, May 09 '18 at 17:24
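
Following dnickless's point that the Java measurement lumps several steps together, one way to see where the client-side time goes is to time each phase on its own. This is a minimal sketch using only the Java standard library. `PhaseTimer` is a hypothetical helper (not part of MongoJack or the MongoDB driver), and the lambdas in `main` are stand-ins for fetching raw documents and mapping them to objects, not real driver calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: records how long each named phase of a request takes.
class PhaseTimer {
    // Insertion-ordered so the report lists phases in the order they first ran.
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the phase, accumulates its elapsed time, and returns its result.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    public Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }

    // One line per phase, in milliseconds.
    public String report() {
        StringBuilder sb = new StringBuilder();
        nanosByPhase.forEach((phase, nanos) ->
                sb.append(String.format("%s: %.3f ms%n", phase, nanos / 1_000_000.0)));
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-in for pulling raw documents from the database.
        List<String> raw = timer.time("fetch", () -> Arrays.asList("a", "b", "c"));
        // Stand-in for mapping each raw document to a POJO.
        List<String> mapped = timer.time("map", () -> {
            List<String> out = new ArrayList<>();
            for (String s : raw) {
                out.add(s.toUpperCase());
            }
            return out;
        });
        System.out.println(mapped);
        System.out.print(timer.report());
    }
}
```

In the resource method, the same idea would mean wrapping `cursor.toArray()` in its own `time(...)` call rather than timing the whole block. Because a driver cursor typically contacts the server lazily, most of the round trip may land in that phase; comparing the figure with the server-side time in the MongoDB log gives a rough estimate of the client-side overhead.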

score 1 · Accepted Answer · answered May 14 '18 at 19:05


Things look good after switching my MongoDB driver from mongojack to the native mongo-java-driver 3.7 and using com.mongodb.client.FindIterable instead of DBCursor. It looks like the Mongojack library was spending a lot of time mapping BSON objects to POJOs.


sudarshan kakumanu
