
I have a MongoDB query which returns 10,000+ records. To the business service I want to return the records in Gson (Google JSON) format. Following is the code snippet.

        String mongoClientURI = "mongodb://" + dbUser + ":" + pwd + "@" + host + ":" + port + "/" + databaseName;


        MongoClient client = new MongoClient(new MongoClientURI(mongoClientURI));
        MongoDatabase db = client.getDatabase(databaseName);

        // Find query returns more than 10K records
        FindIterable<Document> dbResult = db.getCollection("mycollection").find();  

        // This line takes too much time & CPU      
        List<Document> result = getDocumentArray(dbResult);

        // This line takes too much time & CPU
        JsonArray finalResult = getJSONArray(result);

    public static List<Document> getDocumentArray(FindIterable<Document> input) {
        List<Document> output = new ArrayList<Document>();
        for (Document doc : input) {
            output.add(doc);
        }
        return output;
    }

    public static JsonArray getJSONArray(Iterable<Document> docs) {
        JsonArray result = new JsonArray();
        if (docs == null) {
            return result;
        }

        for (Document doc : docs) {
            JsonObject jsonObject;
            JsonParser parser = new JsonParser();
            try {
                jsonObject = parser.parse(doc.toJson()).getAsJsonObject();
                result.add(jsonObject);
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
        }
        return result;
    }

The problem is that it takes a lot of time and CPU cycles to fetch and convert the data from MongoDB into a Gson array. Can anybody please tell me an effective way to fetch a large dataset from MongoDB in Gson format?

Generally I get around 10K to 20K records (50K max), and I need to return all of them. My UI component requires all the records in one go to render the data. People generally use a data grid with paging, so they don't load too many records at once, but in my case I have a map component which takes all the records and creates clusters of objects on the fly.

Any help would be much appreciated.

Atul Sureka

  • If the selection of data is actually that big then you would be writing to a stream instead of any type of in-memory object representation. Your query of course is simply asking to dump everything in the database collection. Use streams and don't load in memory. – Neil Lunn Jun 20 '17 at 07:02
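A minimal sketch of the streaming idea from the comment above, using only the standard library (`writeJsonArray` and `docsAsJson` are made-up names): instead of accumulating a `List<Document>` and then a `JsonArray`, write each document's `toJson()` string straight into the response `Writer` while iterating the cursor, so only one document is in memory at a time.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

public class StreamJson {
    // Write an iterable of per-document JSON strings (e.g. each Document.toJson())
    // as a single JSON array, without building the whole array in memory.
    static void writeJsonArray(Iterable<String> docsAsJson, Writer out) throws IOException {
        out.write('[');
        boolean first = true;
        for (String json : docsAsJson) {
            if (!first) out.write(',');
            out.write(json);
            first = false;
        }
        out.write(']');
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // In the real code the iterable would wrap the FindIterable cursor.
        writeJsonArray(List.of("{\"a\":1}", "{\"a\":2}"), out);
        System.out.println(out);
    }
}
```

With a servlet or similar, `out` would be the response writer, so the 10K+ documents never exist as one in-memory list at all.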

2 Answers


An effective way would be not to return all the documents in one go, and to use the skip() and limit() functions instead to implement paging. If your collection has, say, a million documents, you will end up with a huge List in memory, and you might not even need to display all the documents on the UI in one go.

Here's how you can implement paging:

1st Call : db.getCollection("mycollection").find().limit(10);
2nd Call : db.getCollection("mycollection").find().skip(10).limit(10);
3rd Call : db.getCollection("mycollection").find().skip(20).limit(10);

skip drives the page number whereas limit drives items per page. You can use sort as well, to define the ordering (here's the documentation).
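The three calls above follow one formula: for a 1-based page number, the offset passed to skip is (page - 1) * pageSize. A small sketch (skipFor, page, and pageSize are hypothetical names, and the driver call is only shown as a comment):

```java
public class Paging {
    // 1-based page number -> number of documents to skip.
    static int skipFor(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // With the MongoDB Java driver the paged query would look roughly like:
    //   db.getCollection("mycollection").find()
    //     .skip(skipFor(page, pageSize))
    //     .limit(pageSize);

    public static void main(String[] args) {
        // Matches the 3rd call above: page 3, 10 per page -> skip 20.
        System.out.println(skipFor(3, 10));
    }
}
```

Note that skip-based paging gets slower for deep pages, since the server still walks past the skipped documents; for very large offsets a range query on an indexed, sorted field is usually preferred.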

Another approach would be to use a queueing mechanism or streaming to scroll through all the documents in the cursor and push them to the queue/client.

Update

This is what you can do to make the existing implementation faster:

  • Convert the Iterable to a parallelStream to process the documents faster (have a look at this SO answer on how to convert an Iterable to a stream)
  • Don't create JsonParser instance for each document. Create an instance once and use it for all the documents.
  • Increase Xmx value for your application
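The first bullet can be sketched with plain java.util.stream (mapParallel is a hypothetical helper; in the question's code the mapping function would be doc -> parser.parse(doc.toJson()).getAsJsonObject(), with one shared parser per the second bullet):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class ParallelMap {
    // Turn any Iterable (e.g. FindIterable<Document>) into a parallel stream
    // and map each element; collect(toList()) preserves encounter order.
    static <T, R> List<R> mapParallel(Iterable<T> input, Function<T, R> fn) {
        return StreamSupport.stream(input.spliterator(), true)
                .map(fn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for doc -> parser.parse(doc.toJson()).getAsJsonObject()
        System.out.println(mapParallel(List.of("a", "b", "c"), String::toUpperCase));
    }
}
```

One caveat: the Mongo cursor is still consumed sequentially underneath, so parallelism only pays off if the per-document JSON parsing dominates the cost rather than the network fetch.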
Darshan Mehta
  • Thanks for your response. I understand paging, but I need all records in one go - I don't have a data table; instead I need to show the data on a map. All the data needs to be passed to the map component, which creates the clusters on the client. – Atul Sureka Jun 20 '17 at 07:11
  • Well, UI can make multiple calls to your service with different paging parameters to get all the records. If you want to display 10000 records then you can execute 10 requests in parallel and return. If not, another way would be to select only required fields in `find` instead of dumping the whole `json` document to UI. – Darshan Mehta Jun 20 '17 at 07:14
  • I will try that, but do you think JOLT https://github.com/bazaarvoice/jolt may solve my problem? – Atul Sureka Jun 20 '17 at 07:29
  • I'd rather recommend using [Jackson](https://github.com/FasterXML/jackson-docs), it's light weight and uses `streaming` for serialization/deserialization. – Darshan Mehta Jun 20 '17 at 07:32

Have you added indexes on the fields you query in MongoDB? They help queries fetch/read faster. The default index is on the _id field.

Another way, if you are querying on certain criteria, is to split the query in two:

db.getCollection("mycollection").find({ dept: "grocery", category: "chocolate"  }); 
db.getCollection("mycollection").find({ dept: "sales", category: "honey"  }); 

and combine the results later.
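For the indexing suggestion, an index covering the fields filtered on above could be created in the mongo shell like this (dept and category are just the fields from the example queries; 1 means ascending order):

```
db.mycollection.createIndex({ dept: 1, category: 1 })
```

Such a compound index lets both example queries be served from the index instead of a full collection scan, though it won't help the original query in the question, which has no filter at all.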

Aza Suhaza