MongoDB distinct too big 16mb cap

Question

I have a MongoDB collection. Put simply, it has two fields, user and url, and 39,274,590 documents. The key of the collection is {user, url}.

Using Java, I am trying to list the distinct URLs:

  MongoDBManager db = new MongoDBManager( "Website", "UserLog" );
  return db.getDistinct("url");

But I receive an exception:

Exception in thread "main" com.mongodb.CommandResult$CommandFailure: command failed [distinct]: 
{ "serverUsed" : "localhost/127.0.0.1:27017" , "errmsg" : "exception: distinct too big, 16mb cap" , "code" : 10044 , "ok" : 0.0}

How can I solve this problem? Is there any plan B that can avoid this problem?

score 12 · Answer 1 · edited Jan 05 '18 at 17:39

As of version 2.6 you can use the aggregation framework's $out stage to write the results to a separate collection: http://docs.mongodb.org/manual/reference/operator/aggregation/out/

This gets around MongoDB's 16 MB result-size limit in most cases. You can read more about using the aggregation framework on large datasets in MongoDB 2.6 here: http://vladmihalcea.com/mongodb-2-6-is-out/

To do a 'distinct' query with the aggregation framework, group by the field.

db.userlog.aggregate([{$group: {_id: '$url'} }]);

Note: I don't know how this works with the Java driver. Good luck.

answered Dec 05 '14 at 19:27 by Will Shaver · edited Jan 05 '18 at 17:39 by Vlad Mihalcea

This gives me the list of user IDs. How can I get the count of them? – ak3191 Oct 12 '18 at 15:34

score 3 · Answer 2 · edited May 23 '17 at 11:45

Take a look at this answer:

1) The easiest way to do this is via the aggregation framework. This takes two $group stages: the first one groups by the distinct values, and the second one counts the resulting groups.

2) If you want to do this with Map/Reduce you can. This is also a two-phase process: in the first phase we build a new collection with a list of every distinct value for the key. In the second we do a count() on the new collection.

Note that you cannot return the result of the map/reduce inline, because that will potentially overrun the 16MB document size limit. You can save the calculation in a collection and then count() the size of the collection, or you can get the number of results from the return value of mapReduce().
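
To make the aggregation route in 1) concrete: in the mongo shell, the two stages would look roughly like db.userlog.aggregate([{$group: {_id: '$url'}}, {$group: {_id: null, count: {$sum: 1}}}]), reusing the collection name from the answer above. What those two stages compute can be sketched in plain Java over an in-memory list. This is only a model of the stages, not driver code, and it does not talk to MongoDB; the class name, method names, and sample rows are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoGroupSketch {

    // Stage 1, modelled on {$group: {_id: "$url", count: {$sum: 1}}}:
    // one entry per distinct url, with the number of rows that carried it.
    public static Map<String, Integer> groupByUrl(List<String[]> rows) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            String url = row[1]; // each row is {user, url}
            groups.merge(url, 1, Integer::sum);
        }
        return groups;
    }

    // Stage 2, modelled on {$group: {_id: null, count: {$sum: 1}}}:
    // collapse all groups into a single number, the count of distinct urls.
    public static int countGroups(Map<String, Integer> groups) {
        return groups.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the real collection.
        List<String[]> rows = Arrays.asList(
            new String[] {"alice", "http://a.example"},
            new String[] {"bob", "http://a.example"},
            new String[] {"alice", "http://b.example"}
        );
        Map<String, Integer> groups = groupByUrl(rows);
        System.out.println(groups);              // per-url counts
        System.out.println(countGroups(groups)); // number of distinct urls
    }
}
```

The point of the second stage is that only one small document comes back from the server, so the size of the distinct list itself never has to fit in a single result.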

score 2 · Answer 3 · answered Dec 05 '16 at 13:29

If you are using MongoDB 3.0 or above, you can use the DistinctIterable class with batchSize:

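import com.mongodb.client.DistinctIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Assumes `mongodb` is an already-initialized MongoDatabase
MongoCollection<Document> coll = mongodb.getCollection("mycollection");
// Distinct values of "id"; batchSize(100) sets the requested batch size
DistinctIterable<String> ids = coll.distinct("id", String.class).batchSize(100);
for (String id : ids) {
    System.out.println(id);
}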

http://api.mongodb.com/java/current/com/mongodb/client/DistinctIterable.html

Tommy Ng · Answer 4 · 2018-04-29T13:56:02.243

Version 3.x in Groovy:

import com.mongodb.client.AggregateIterable
import com.mongodb.client.MongoCollection
import com.mongodb.client.MongoCursor
import com.mongodb.client.MongoDatabase
import static com.mongodb.client.model.Accumulators.sum
import static com.mongodb.client.model.Aggregates.group
import static java.util.Arrays.asList
import org.bson.Document

//other code

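AggregateIterable<Document> iterable = collection.aggregate(
    asList(
        // Group by url and count the documents in each group
        group('$url', sum("count", 1))
    )
).allowDiskUse(true) // let large groupings spill to disk

MongoCursor<Document> cursor = iterable.iterator()

try {
    while (cursor.hasNext()) {
        Document doc = cursor.next()
        println(doc.toJson())
    }
} finally {
    cursor.close() // release the server-side cursor
}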
