
I have a fairly complex document model that is structurally like this:

{
   _id: 1,
   "title": "I'm number one",
   ... (many other meta data text fields not desired in the summary)
   "foo": {
      "tom":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      "dick":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      "harry": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      ... (Total of 14 fields in foo)
   },
   "bar": {
      "joe":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      "fred":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      "bob":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
      ...  (Total of 14 fields in bar)
   },
   "dodads": [
      {
         "contraption": 0,
         "doohickey": 0,
         "gewgaw": 0,
         "gizmo": 0,
         ... (total of 15 elements in each doodad object)
      },
      {
         "contraption": 0,
         "doohickey": 0,
         "gewgaw": 0,
         "gizmo": 0,
         ...
      },
      ... (total of 6 objects in dodads object array)
   ]
},
... (a couple hundred documents in total)

What I'm looking for is a summary of all the objects/arrays that have numeric data. I would like the result to be a document, in the original format, that contains the numeric fields summarized. For now, let's say the documents all have the same structure.

The aggregation result would be like the following

{
   "foo": {
      "tom":   [35, 65, 13, 22, 36, 58, 93, 43, 56, 44, 23, 72],
      "dick":  [56, 87, 28, 49, 34, 22, 48, 86, 29, 23, 88, 29],
      ... (All 14 fields in foo)
   },
   "bar": {
      "joe":   [87, 28, 49, 34, 22, 48, 86, 29, 23, 88, 29, 47],
      "fred":  [13, 22, 36, 58, 93, 43, 56, 44, 23, 72, 35, 65],
      ...  (All 14 fields in bar)
   },
   "dodads": [
      {
         "contraption": 45,
         "doohickey": 88,
         "gewgaw": 23,
         "gizmo": 64,
         ... (All 15 elements in each doodad object)
      },
      {
         "contraption": 12,
         "doohickey": 73,
         "gewgaw": 57,
         "gizmo": 86,
         ...
      },
      ... (All 6 objects in dodads object array)
   ]
}

I believe I can unwind the arrays, specify sums and projections and get exactly what I want with an extensive and verbose aggregation pipeline. I could also do multiple queries grabbing the component pieces (one that's just foo, a second that's just bar...).

What I'm wondering is, is there a shorthand way of specifying summarizations? For example, can I say I want the summary of foo or foo.tom and get back their contents summarized?

  • Can you clarify what you mean by "summarize"? Are you asking for only the numeric fields and no other fields in the output? – Neil Lunn Feb 10 '14 at 23:04
  • @NeilLunn Yes a summary of all the objects/arrays that have numeric data. I added a sample output to hopefully clarify the result I'm looking for. – Jim Feb 12 '14 at 00:57

1 Answer


There are some things in your document structure that are really not going to help you here. That is primarily the use of sub-documents like these:

"foo": {
   "tom":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
   "dick":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
   "harry": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
},
"bar": {
   "joe":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
   "fred":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
   "bob":   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}

That makes things pretty difficult, as you can usually only get at the contained fields with explicit notation such as "foo.tom", "bar.fred" etc. I have commented on the reasons before, and they are best explained by following through the links, but in summary: where it is possible, you are going to make life easier by changing the structure of the documents:

"foo": [
   { "name": "tom", "values": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] },
   { "name": "dick", "values": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] },
   { "name": "harry", "values": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] }
],

This gives you better access to query the elements than the explicit references you would otherwise need to use. The answers I have given before go through this in more depth.
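If existing documents need converting, the reshaping itself is mechanical. A minimal sketch in plain JavaScript, assuming every field of the sub-document holds an array of values; `toNamedArray` is a hypothetical helper name used for illustration, not part of MongoDB:

```javascript
// Hypothetical helper: turn { tom: [...], dick: [...] } into
// [ { name: "tom", values: [...] }, { name: "dick", values: [...] } ].
function toNamedArray(subdoc) {
    return Object.keys(subdoc).map(function (key) {
        return { name: key, values: subdoc[key] };
    });
}

// Shortened sample data in the shape of the "foo" sub-document.
var foo = {
    tom: [1, 1, 1],
    dick: [2, 2, 2]
};

var reshaped = toNamedArray(foo);
```

Each element of `reshaped` then carries its former key as data in `name`, so it can be matched or grouped on rather than spelled out as a field path.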

As for finding the fields that are numeric, I asked this question here, which is basically a rewording of what you require. The response there gives an approach to doing this using mapReduce:

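var map = function () {
    // True for numbers and numeric strings, false for everything else.
    function isNumber(n) {
        return !isNaN(parseFloat(n)) && isFinite(n);
    }

    var numerics = [];
    for (var fn in this) {
        if (Array.isArray(this[fn])) {
            // Example only: looks at the first element. More complex logic
            // is needed for nested or mixed arrays.
            if (isNumber(this[fn][0])) {
                numerics.push({ f: fn, v: this[fn] });
            }
        } else if (isNumber(this[fn])) {
            numerics.push({ f: fn, v: this[fn] });
        }
    }
    emit(this._id, { n: numerics });
};

// Keys are unique _id values, so reduce is only called if a key repeats.
// It must return one value shaped like the emitted value, not the array.
var reduce = function (key, values) {
    return values[0];
};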

This may be what you need, but treat it as a skeleton: there is really no simple way to do this, so you would have to add a lot of traversal logic to unwind the nested fields of your document before it comes up with what you want from the structure that you have.
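To give an idea of the traversal involved, here is a minimal sketch in plain JavaScript, assuming all documents share the same structure as stated in the question. `rollUp` is a hypothetical helper, not a MongoDB function; it walks nested objects and arrays and adds numeric leaves position by position, so the result keeps the original shape:

```javascript
// Hypothetical helper: add the numeric leaves of `doc` into `acc`.
// Non-numeric leaves (strings such as "title") are skipped.
function rollUp(acc, doc) {
    if (typeof doc === "number") {
        return (typeof acc === "number" ? acc : 0) + doc;
    }
    if (Array.isArray(doc)) {
        var outArr = Array.isArray(acc) ? acc : [];
        doc.forEach(function (item, i) {
            var summed = rollUp(outArr[i], item);
            if (summed !== undefined) { outArr[i] = summed; }
        });
        return outArr;
    }
    if (doc !== null && typeof doc === "object") {
        var isPlainAcc = acc && typeof acc === "object" && !Array.isArray(acc);
        var outObj = isPlainAcc ? acc : {};
        Object.keys(doc).forEach(function (key) {
            if (key === "_id") { return; }  // identifiers are not data to sum
            var summed = rollUp(outObj[key], doc[key]);
            if (summed !== undefined) { outObj[key] = summed; }
        });
        return outObj;
    }
    return acc;  // string, boolean, null: nothing to add
}

// Shortened sample documents in the shape used in the question.
var docs = [
    { _id: 1, title: "I'm number one", foo: { tom: [1, 2] }, dodads: [{ gizmo: 3 }] },
    { _id: 2, title: "I'm number two", foo: { tom: [10, 20] }, dodads: [{ gizmo: 4 }] }
];

var summary = docs.reduce(rollUp, {});
// summary is { foo: { tom: [11, 22] }, dodads: [{ gizmo: 7 }] }
```

With a couple of hundred documents this could run client side over the query results. Note that an array or sub-document with no numeric content comes back as an empty container rather than being dropped.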

Since you seem to be after "finding out information on the structure of the documents", you might also want to look at the answers to this question: MongoDB Get names of all keys in collection

  • This is a document structure I created so I have control over changing it. This is my initial structure (these are just dummy field names for example purposes). It works for my user interface but can be modified to facilitate aggregation. So it's a matter of just summarizing the numeric values distributed in multiple documents. A roll up in other words. – Jim Feb 14 '14 at 18:25