I have a huge collection with 3 billion documents. Each document looks like the following:
"_id" : ObjectId("54c1a013715faf2cc0047c77"),
"service_type" : "JE",
"receiver_id" : NumberLong("865438083645"),
"time" : ISODate("2012-12-05T23:07:36Z"),
"duration" : 24,
"service_description" : "NQ",
"receiver_cell_id" : null,
"location_id" : "658_55525",
"caller_id" : NumberLong("475035504705")
I would like to get the list of distinct users (they should at least appear once as a caller 'caller_id'), their counts (how many times each user appeared in the collection as either caller or receiver) and the count of locations if they are callers (i.e., the count for each location_id per user).
I want to end up with the following:
"number_of_records" : 20,
"locations" : [{location_id: 658_55525, count:5}, {location_id: 840_5425, count:15}],
"user" : NumberLong("475035504705")
I tried the solution described here and here but they are not efficient enough (extremely slow). What would be an efficient way to achieve this?