
I have a link tracking table that has (amongst other fields) track_redirect and track_userid. I would like to output both the total click count for a given link and the unique count, where repeat clicks by the same user id are counted once. That way we can tell when someone has clicked the same link 5 times.

I've tried emitting this.track_userid in both the key and values parts but can't get to grips with how to correctly access them in the reduce function.

So if I roll back to when it actually worked, I have the very simple code below - just like it would be in a 'my first mapreduce function' example

map

function() {
  if(this.track_redirect) {
    emit(this.track_redirect,1); 
  }
}

reduce

function(k, vals) {
  var sum = 0;
  // Index loop rather than for...in, which is meant for object keys
  for (var i = 0; i < vals.length; i++) {
    sum += vals[i];
  }
  return sum;
}

I'd like to know the correct way to emit the additional userid information and access it in the reduce function, please. Or am I thinking about it in the wrong way?

In case it's not clear: I don't want to calculate the total clicks a userid has made, but to count the unique clicks for each url + userid pair, ignoring any duplicate clicks a userid made on each link.

Can someone point me in the right direction please? Thanks!

joevallender

1 Answer


You can pass an arbitrary object as the second argument of the emit call, which means you can store the userid in it. For example, your map function can look like this:

var mapFunc = function() {
  if (this.track_redirect) {
    var tempDoc = {};
    tempDoc[this.track_userid] = 1;

    emit(this.track_redirect, {
      users_clicked: tempDoc,
      total_clicks: 1
    });
  }
};
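
To see what this map function emits, here is a small sketch that can be run outside MongoDB (for example in Node.js). The emit stand-in and the sample documents are assumptions made up for illustration; only the field names come from the question.

```javascript
// Stand-in for MongoDB's emit(): it just records each emitted key/value pair.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Same body as mapFunc above.
var mapFunc = function() {
  if (this.track_redirect) {
    var tempDoc = {};
    tempDoc[this.track_userid] = 1;

    emit(this.track_redirect, {
      users_clicked: tempDoc,
      total_clicks: 1
    });
  }
};

// Hypothetical documents; the last one has no track_redirect and is skipped.
var docs = [
  { track_redirect: "/a", track_userid: "u1" },
  { track_redirect: "/a", track_userid: "u2" },
  { track_userid: "u3" }
];

// MongoDB calls the map function with `this` bound to each document.
docs.forEach(function (doc) {
  mapFunc.call(doc);
});

console.log(JSON.stringify(emitted, null, 2));
```

Each emitted value already has the same shape that the reduce function returns, which is what makes the next step work.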

And your reduce function might look like this:

var reduceFunc = function(key, values) {
  var summary = {
    users_clicked: {},
    total_clicks: 0
  };

  values.forEach(function (doc) {
    summary.total_clicks += doc.total_clicks;
    // Merge the user ids from this value into the summary. Plain loop
    // instead of Object.extend, which is not standard JavaScript.
    for (var userId in doc.users_clicked) {
      if (doc.users_clicked.hasOwnProperty(userId)) {
        summary.users_clicked[userId] = 1;
      }
    }
  });

  return summary;
};

The users_clicked property of the summary object stores each user id as a property name. Because an object cannot have duplicate property names, each user is recorded only once, however many times they clicked. Also be aware that some of the values passed to the reduce function can be the result of a previous reduce call for the same key. The sample code above takes that into account, because reduce returns an object with the same shape as the values that map emits. You can find more about this behavior in the MongoDB mapReduce docs.
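
The re-reduce point can be checked with a short sketch that runs outside MongoDB. It uses a standalone copy of the reduce logic (named reduceClicks here, a hypothetical name) and made-up values for a single link, then compares reducing everything at once with reducing in two steps.

```javascript
// Standalone copy of the reduce logic, merging user ids with a plain loop.
function reduceClicks(key, values) {
  var summary = { users_clicked: {}, total_clicks: 0 };
  values.forEach(function (doc) {
    summary.total_clicks += doc.total_clicks;
    Object.keys(doc.users_clicked).forEach(function (userId) {
      summary.users_clicked[userId] = 1;
    });
  });
  return summary;
}

// Hypothetical emitted values for one link: "u1" clicked twice, "u2" once.
var a = { users_clicked: { u1: 1 }, total_clicks: 1 };
var b = { users_clicked: { u1: 1 }, total_clicks: 1 };
var c = { users_clicked: { u2: 1 }, total_clicks: 1 };

// Reduce all three values in one call...
var allAtOnce = reduceClicks("/some-link", [a, b, c]);

// ...or reduce two of them first and feed that result back in.
var partial = reduceClicks("/some-link", [a, b]);
var reReduced = reduceClicks("/some-link", [partial, c]);

console.log(JSON.stringify(allAtOnce));
console.log(JSON.stringify(reReduced));
```

Both calls print the same summary (3 total clicks, users u1 and u2), which is the property the reduce function needs in order to tolerate being fed its own output.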

To get the unique count, you can pass in a finalize function, which is called once the reduce phase has completed:

var finalFunc = function(key, value) {
  // Counts the keys of an object. Taken from:
  // http://stackoverflow.com/questions/18912/how-to-find-keys-of-a-hash
  var countKeys = function(obj) {
    var count = 0;

    for(var i in obj) {
      if (obj.hasOwnProperty(i))
      {
        count++;
      }
    }

    return count;
  };

  return {
    redirect: key,
    total_clicks: value.total_clicks,
    unique_clicks: countKeys(value.users_clicked)
  };
};

Finally, you can execute the map reduce job like this (modify the out attribute to fit your needs):

db.users.mapReduce(mapFunc, reduceFunc, { finalize: finalFunc, out: { inline: 1 }});
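
For a feel of the numbers this produces, the following sketch imitates the three stages in plain JavaScript on a handful of made-up documents. It is only an approximation for illustration: the grouping by key is done by hand, the sample data is hypothetical, and the result is a plain array rather than the structure MongoDB returns.

```javascript
// Hypothetical sample documents using the two fields from the question.
var docs = [
  { track_redirect: "/a", track_userid: "u1" },
  { track_redirect: "/a", track_userid: "u1" },
  { track_redirect: "/a", track_userid: "u2" },
  { track_redirect: "/b", track_userid: "u1" },
  { track_userid: "u3" } // no track_redirect, so the map stage skips it
];

// Map stage: collect the emitted values per key (MongoDB does this grouping itself).
var grouped = {};
docs.forEach(function (doc) {
  if (doc.track_redirect) {
    var users = {};
    users[doc.track_userid] = 1;
    if (!grouped[doc.track_redirect]) {
      grouped[doc.track_redirect] = [];
    }
    grouped[doc.track_redirect].push({ users_clicked: users, total_clicks: 1 });
  }
});

// Reduce and finalize stages for each key.
var results = Object.keys(grouped).map(function (key) {
  var summary = { users_clicked: {}, total_clicks: 0 };
  grouped[key].forEach(function (value) {
    summary.total_clicks += value.total_clicks;
    Object.keys(value.users_clicked).forEach(function (userId) {
      summary.users_clicked[userId] = 1;
    });
  });
  return {
    redirect: key,
    total_clicks: summary.total_clicks,
    unique_clicks: Object.keys(summary.users_clicked).length
  };
});

console.log(results);
```

With this sample data, "/a" ends up with 3 total clicks but only 2 unique clicks, because u1 clicked it twice, while "/b" has 1 of each.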
Ren
  • thanks Ren, that is exactly right. It's nice for the example to be using the data I am - as I actually understand it now! Very good first stackoverflow answer matey! – joevallender Feb 01 '12 at 09:46
  • If anyone stumbles across this thread, this might be useful: I've written up Ren's answer including showing the data between each stage http://scriptogr.am/joevallender/post/simple-introduction-to-mapreduce-using-mongodb – joevallender Jun 14 '12 at 12:53