1

I am using java to connect to MongoDB. I want to fetch and count distinct values of two fields i.e requestId and telNum. I googled but didn't find how to fetch distinct values for two fields.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Sidd
  • 111
  • 1
  • 2
  • 8

3 Answers3

8

MongoDB has a Aggregation Framework and pipeline that is somewhat analogous to SQL "GROUP BY" but the phases can do much more advanced work. We will show a three phase example to get the count of distinct combinations that appear more than once.

Considering you mean distinct occurrences of the same value for requestId and telNum together in the document, and looking at this as the corresponding SQL

SELECT requestId, telNum, count(*) as counter
from collection
GROUP BY requestId, telNum

The mongo shell bit is to group on an _id key combined from the two values. Therefore:

db.collection.aggregate([
    {$group: { _id: { requestId: "$requestId", telNum: "$telNum" }, count: {$sum: 1} } }
])

So the gist of it in Java:

// Construct our _id to group on
DBObject fields = new BasicDBObject( "requestId", "$requestId" );
fields.put( "telNum", "$telNum" );

// Contruct group element
DBObject groupFields = new BasicDBObject( "_id", fields );
groupFields.put( "count", new BasicDBObject( "$sum", 1 ) );
DBObject group = new BasicDBObject( "$group", groupFields );



// Run aggregation
AggregationOutput output = collection.aggregate( group );

So the output here would match the SQL from above.

Taking this further let's consider the full SQL to give us the more than once distinct counts:

SELECT count(*) 
FROM ( 
  SELECT requestId, telNum, count(*) as counter
  FROM collection
  GROUP BY requestId, telNum
) a
WHERE a.counter > 1

So we can alter the code to add more stages to the aggregation pipeline to $match (WHERE/HAVING) and $group (GROUP BY) again:

// Construct a match on things with a count of more than 1
DBObject greaterThan = new BasicDBObject( "$gt", 1 );
DBObject matchFields = new BasicDBObject( "count", greaterTen );
DBObject match = new BasicDBObject( "$match", matchFields );

// Count the documents that match
DBObject newGroupFields = new BasicDBObject( "_id", null );
newGroupFields.put( "count", new BasicDBOject( "$sum", 1 ) );
DBObject group2 = new BasicDBObject( "$group", newGroupFields );

// Run aggregation
AggregationOutput output = collection.aggregate( group, match, group2 );

So this chains three phases in the chain

  1. Group the get a count of the distinct values for the fields
  2. Filter out the documents that had only 1 distinct value, so only leave the duplicates
  3. Count out the distinct pairs from the resulting from the filter

Aggregation allows you to 'chain' various phases like this together in order to get your results. It is very suited to this sort of work, and worth noting that it is considerably faster than using Map-Reduce, even though that does still have it's uses.

Edit and play with as required.

There is also a useful example to follow here

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • Working fine, but I want count of this output. Should I count with iterator ? – Sidd Jan 30 '14 at 11:21
  • Didn't see the comment before. Updated answer shows taking the phase further to reduce down to a count. – Neil Lunn Feb 01 '14 at 08:00
  • I don't have knowledge of Aggregation. Please help me to add one more constraint i.e date should be greater than something("2014-05-06") – Sidd Feb 06 '14 at 13:53
0
    import com.mongodb.MongoClient;
    import com.mongodb.MongoException;
    import com.mongodb.WriteConcern;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBObject;
    import com.mongodb.DBCursor;
    import com.mongodb.ServerAddress;

    import java.util.Arrays;

    // To directly connect to a single MongoDB server (note that this will not auto-discover the primary even
    // if it's a member of a replica set:
    MongoClient mongoClient = new MongoClient();

    DB db = mongoClient.getDB( "mydb" );
    AggregationOutput output =db.CollectionName.mapReduce(
    function(){ emit(this.id, this.name);},
    function(key, value) { return Array.sum(1)},
    {
     query: {status: "A"},
     out: "uniq_id_name"
    }
    );
System.out.println(output);
Vikram Jain
  • 5,498
  • 1
  • 19
  • 31
0

Try this:

db.collection.aggregate( 
    { 
        $group: { 
            _id : "$the_field_to_be_grouped",  // enter the field you want to group by
            totalRequestId : { $sum : "$requestId" }, 
            totaltelNum: { $sum : "$telNum" }  
        } 
    }
);
heinob
  • 19,127
  • 5
  • 41
  • 61