
I store our web server logs in MongoDB, and the schema looks roughly like this:

[
  {  
    "_id" : 12345,
    "url" : "http://www.mydomain.com/xyz/abc.html",
    ....
  },
  ....
]

I am trying to use the $project operator to reshape this schema a little before passing my collection through the rest of an aggregation pipeline. Basically, I need to add a new field called "type" that will later be used as the group-by key. The logic for the new field is pretty simple:

if "url" contains "pattern_A" then set "type" = "sales lead";
else if "url" contains "pattern_B" then set "type" = "existing client";
...

I'm thinking it would have to be something like this:

db.weblog.aggregate(
  { 
    $project : {
      type : { /* how to implement the logic??? */ }
    }
  }
);

I know how to do this using map-reduce (by setting the "keyf" attribute to a custom JS function that implements the above logic) but am now trying to use the new aggregation framework to do this. I tried to implement the logic using the expression operators but so far couldn't get it to work. Any help/suggestion would be greatly appreciated!

mata
Edenbauer

2 Answers

I am sharing my "solution" in case others run into the same need.

After researching for a couple of weeks, and as @asya-kamsky suggested in a comment, I've decided to add a computed field to my original MongoDB schema. It's not ideal: whenever the logic for the computed field changes, I will have to run a bulk update over every document in the collection. But it was either that or rewriting my code to use MapReduce, and I chose the former for now. Looking at the MongoDB Jira board, it appears that many people have asked for more operators to be supported in $project, and I certainly hope the MongoDB dev team gets around to adding them sooner rather than later:

Operator for splitting string based on a separator.

New projection operator $elemMatch

Allow $slice operator in $project

add a $inOrder operator to $project
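
The computed-field approach described above could be sketched roughly as follows. This is only an illustration in plain JavaScript: the names `typeRules`, `classifyUrl`, and `withType` are hypothetical, the patterns are the placeholders from the question, and the "other" fallback is an assumption, since the question does not say what unmatched URLs should get.

```javascript
// Hypothetical rules standing in for "pattern_A" / "pattern_B" from the question.
const typeRules = [
  { pattern: "pattern_A", type: "sales lead" },
  { pattern: "pattern_B", type: "existing client" },
];

// Return the type of the first rule whose pattern occurs anywhere in the URL.
// "other" is an assumed fallback for URLs that match no rule.
function classifyUrl(url) {
  for (const rule of typeRules) {
    if (url.indexOf(rule.pattern) !== -1) {
      return rule.type;
    }
  }
  return "other";
}

// Copy the document and attach the computed "type" field before it is written.
function withType(doc) {
  return Object.assign({}, doc, { type: classifyUrl(doc.url) });
}

const doc = withType({
  _id: 12345,
  url: "http://www.mydomain.com/pattern_A/abc.html",
});
console.log(doc.type); // prints "sales lead"
```

With the field stored on each document, the pipeline can group on "$type" directly. The cost is the one noted above: changing the rules means recomputing the field for every existing document.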

Edenbauer

You need to use a combination of several operators and expressions.

First, the $cond operator in $project lets you implement if-then-else logic.

$cond takes an array of three elements: a boolean expression followed by two values. If the boolean expression is true, the field gets the second element; otherwise it gets the third.

You can nest these, so that the third element is itself a $cond expression, to get if-then-else-if-then and so on.

String manipulation is a little awkward, but $substr is available.
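
As a rough sketch of the nesting described above, the snippet below builds a chain of $cond expressions from a list of rules. It assumes the pattern sits at a known, fixed position in the URL (here, a prefix), because $substr can only extract a slice at a given offset; it cannot search for a substring at an arbitrary position. The prefixes, the `buildTypeExpr` helper, and the "other" fallback are all hypothetical.

```javascript
// Hypothetical rules: each one matches a fixed URL prefix, not an arbitrary substring.
const rules = [
  { prefix: "http://www.mydomain.com/pattern_A/", type: "sales lead" },
  { prefix: "http://www.mydomain.com/pattern_B/", type: "existing client" },
];

// Fold the rules from last to first so the first rule becomes the outermost
// $cond and each later rule lands in the "else" slot of the previous one.
function buildTypeExpr(rules, fallback) {
  return rules.reduceRight(
    (elseExpr, rule) => ({
      $cond: [
        // $substr takes [string, start, length]; compare the leading
        // characters of "url" with the expected prefix (ASCII assumed).
        { $eq: [{ $substr: ["$url", 0, rule.prefix.length] }, rule.prefix] },
        rule.type,
        elseExpr,
      ],
    }),
    fallback
  );
}

const pipeline = [
  { $project: { url: 1, type: buildTypeExpr(rules, "other") } },
];

// In the mongo shell this would be run as: db.weblog.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Building the expression from a rule list keeps the if/else-if order readable; written by hand, the same structure is simply one $cond inside the third element of another.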

If you post some examples of what exactly you tried, I may be able to spot why it didn't work.

Asya Kamsky
  • Thanks for your reply. Your suggestion was the very first thing I attempted and quickly hit a dead-end when I realized I couldn't check for the existence of a string pattern using the supported string operators. I need something like indexOf() in order to look for certain patterns in the url. – Edenbauer Nov 06 '12 at 02:10
  • where can the substring occur in the "url"? Is it something that's feasible to store at the time you initially write the document? – Asya Kamsky Nov 06 '12 at 19:32
  • I am having a similar situation. I have two fields A and B and their existence in the document is mutually exclusive. I have to group by A when A exists and group by B when B exists, but it looks like you can't have $cond in a $project..I tried writing $project in two ways: {$project: {MyKey: {$cond: [{$exists: ["$A", true]}, "$A", "$B"]}}} and {$project: {MyKey: {$cond: [{"A": {$exists:true}}, "$A", "$B"]}}} But I keep getting the error: { "errmsg" : "exception: invalid operator '$exists'", "code" : 15999, "ok" : 0 } ...Perhaps it's just an annoying syntax thing :( – Aafreen Sheikh Jan 03 '13 at 14:41
  • @AafreenSheikh what you describe is doable - you probably want to start another question with your problem rather than trying to explain in comments. – Asya Kamsky Jan 04 '13 at 00:03
  • @AsyaKamsky Asked here: http://stackoverflow.com/questions/14213636/conditional-grouping-with-exists-inside-cond – Aafreen Sheikh Jan 08 '13 at 11:10
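
For the case raised in the comments, where exactly one of A or B exists in each document, one possible sketch avoids $exists (a query operator, not an aggregation expression) and uses $ifNull instead, which yields its second argument when the first is missing or null. This is only an illustration; the `count` accumulator is an assumption added to make the $group stage complete, and the linked question may settle on a different approach.

```javascript
// MyKey takes the value of A when present, otherwise the value of B.
const pipeline = [
  { $project: { MyKey: { $ifNull: ["$A", "$B"] } } },
  // Hypothetical grouping step: count documents per key.
  { $group: { _id: "$MyKey", count: { $sum: 1 } } },
];

// In the mongo shell this would be run as: db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```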