1

I have a big collection of event documents in a MongoDB collection. Fields in each document include:

  • userId (database id, if user is logged in)
  • visitorId (cookie id, always exists)
  • eventType (the type of event the user has just done)

One eventType in particular is of interest - the "purchase" type. What I'm trying to do is run a query which basically says:

Get a list of all the purchase events by users who performed a particular event.

Originally my idea was to simply download a list of all the users who performed an event, and then use that as a massive query condition. However this feels ugly, and I'd imagine for larger queries, it'd be quite expensive on the memory front.

Is there a better/smarter way?

Chris Houghton
  • 707
  • 6
  • 17
  • This looks like a perfect candidate for using the aggregation framework. – chridam Apr 02 '15 at 19:34
  • @chridam I hoped that you'd say that - how though? I'm assuming multiple `$match` steps? I think I get the principle, but would love a code example! Thanks – Chris Houghton Apr 02 '15 at 19:40
  • On second thoughts, hmmmmm, a single aggregation pipeline will not give you the best result. On the other hand, your original idea seems plausible, though expensive performance-wise as you will have to do some loops. Tempted to do a `db.collection.find({eventType: "some event"}).toArray(function (docs){// loop through the docs array, for each element do another find query on purchase events and add those events to a list })` – chridam Apr 02 '15 at 19:54
  • Updated the title to clear things up there – Chris Houghton Apr 02 '15 at 20:02

2 Answers2

2

I highly doubt there would be any gain by using aggregation framework. Also it is not clear to me that you can do what you want in one query (I think that it is not possible). On the other hand, I believe that your starting approach is the best one. Here what I would do:

You have your collection:

db.events.insert({userId: 1, event: 1})
db.events.insert({userId: 2, event: 1})
db.events.insert({userId: 3, event: 4})
db.events.insert({userId: 1, event: 4})
db.events.insert({userId: 3, event: 2})
db.events.insert({userId: 1, event: 3})
db.events.insert({userId: 3, event: 1})

Get all unique users who have done some event:

var usersSet = {};
db.events.find({event: 1}, {userId: 1, _id:0}).forEach(function(el){
  return usersSet[el.userId] = 1;
});

Note, that because mongoshell does not have a normal set datatype (if you are using js/node.js it finally has), I am storing it in object and later if your ids are integers, they will lose their type. This is why I translate them back with map(Number) later.

Having your users, you can do similar things with events. If you need to have all events (not unique as I am doing) just change eventSet to eventArray and push in the return function.

var eventsSet = {}    
db.events.find({
   userId: {$in: Object.keys(usersSet).map(Number)}
}, {
  event: 1, _id:0
}).forEach(function(el){
  return eventsSet[el.event] = 1;
});

this Object.keys(eventsSet).map(Number) will contain all unique events.

Community
  • 1
  • 1
Salvador Dali
  • 214,103
  • 147
  • 703
  • 753
1

I would say this kind of queries are not well suited for mongodb. it can be done is different ways but none of them gives good performance for really big data sets.

Its an idea that you may have another collection which keeps user and its events occurrence.

{
    userid : "_user_id",
    events : ["event_type1", "event_type2"]
}

and you will need to keep this collection up to date(ie. upsert for every event)

or you can keep event count number for future use like :

{
    userid : "_user_id",
    event_type_1 : 5,
    event_type_2 : 16,
    event_type_3 : 2,       
}

and you can query on both existence and occurence count. For the first option you will need to push new event type to events array for a particular user and for the second option you would auto increment event type occurrence.

Mustafa Genç
  • 2,569
  • 19
  • 34