When you are unable or unwilling to change the schema, you can do both with MapReduce.
Unique values per document
Your map-function would concatenate all the arrays in products into one, remove duplicates, and then emit the size of that array with the document's _id as key. Details about how to remove duplicates can be found in this question (ignore the answers which use libraries for web-browser JavaScript).
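function mapFunction() {
    var ret = [];
    // this.products is an object, so for...in yields its keys, not the arrays
    for (var name in this.products) {
        var values = this.products[name];
        for (var i = 0; i < values.length; i++) {
            // skip values already collected; question 9229645 lists other methods
            if (ret.indexOf(values[i]) === -1) {
                ret.push(values[i]);
            }
        }
    }
    // a map-function hands its result to emit() instead of returning it
    emit(this._id, ret.length);
}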
Your keys are unique, so your reduce-function will never be called with more than one value per key. That means it can just return the first element of the values-array.
function reduceFunction(key, values) {
return values[0];
}
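To check the reasoning above without a database, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB runs the job: simulateMapReduce is a hypothetical helper, the sample documents assume that products is an object mapping product names to arrays, and the global emit is a stand-in for the one MongoDB provides.

```javascript
// Hypothetical helper that imitates a map-reduce run in memory.
function simulateMapReduce(docs, map, reduce) {
  var groups = new Map();
  // Stand-in for MongoDB's emit(): collect every value under its key.
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // MongoDB binds `this` to the current document; call() imitates that.
  docs.forEach(function (doc) { map.call(doc); });
  var results = [];
  groups.forEach(function (values, key) {
    // Only keys with more than one value are passed to reduce.
    var value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  });
  return results;
}

// Assumed sample data in the original (object-based) schema.
var docs = [
  { _id: 1, products: { product_a: ["v1", "v2"], product_b: ["v3", "v2"] } },
  { _id: 2, products: { product_a: ["v1"], product_b: ["v1"] } }
];

function mapFunction() {
  var ret = [];
  for (var name in this.products) {
    var values = this.products[name];
    for (var i = 0; i < values.length; i++) {
      if (ret.indexOf(values[i]) === -1) ret.push(values[i]);
    }
  }
  emit(this._id, ret.length);
}

var reduceCalls = 0;
function reduceFunction(key, values) {
  reduceCalls++; // counts invocations to show that none happen here
  return values[0];
}

var perDocument = simulateMapReduce(docs, mapFunction, reduceFunction);
console.log(perDocument); // one { _id, value } entry per input document
console.log(reduceCalls); // 0, because every key is unique
```

In the mongo shell the two functions would be passed to the collection's mapReduce method together with an out option; the harness above only exists to make the behaviour visible.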
Unique values overall
You can do this by emitting each value as a key but with a meaningless value.
Your map-function would iterate the products-object, then iterate the array of each product and emit every value it finds:
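function mapFunction() {
    // for...in yields the keys of this.products, so look up each array by key
    for (var name in this.products) {
        var values = this.products[name];
        for (var i = 0; i < values.length; i++) {
            // the value itself is the key; the emitted value is a placeholder
            emit(values[i], null);
        }
    }
}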
Because the values are meaningless, your reduce-function doesn't do anything with them:
function reduceFunction(key, values) {
return null;
}
The result will be a set of documents where each _id is one of the unique values in your data.
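The shape of that result can be sketched in plain JavaScript without MongoDB. The sample documents below are assumptions (products as an object mapping product names to arrays), and emit is a local stand-in that groups values by key the way a map-reduce run would.

```javascript
// Assumed sample data in the original (object-based) schema.
var docs = [
  { _id: 1, products: { product_a: ["v1", "v2"], product_b: ["v3", "v2"] } },
  { _id: 2, products: { product_a: ["v1", "v4"] } }
];

// Stand-in for MongoDB's emit(): group emitted values by key.
var groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

function mapFunction() {
  for (var name in this.products) {
    var values = this.products[name];
    for (var i = 0; i < values.length; i++) {
      emit(values[i], null);
    }
  }
}

function reduceFunction(key, values) {
  return null;
}

// MongoDB binds `this` to the current document; call() imitates that.
docs.forEach(function (doc) { mapFunction.call(doc); });

// One result document per distinct key, whatever the number of emits.
var results = Array.from(groups, function (entry) {
  var key = entry[0];
  var values = entry[1];
  var value = values.length > 1 ? reduceFunction(key, values) : values[0];
  return { _id: key, value: value };
});

var uniqueValues = results.map(function (r) { return r._id; }).sort();
console.log(uniqueValues);        // the distinct values across all documents
console.log(uniqueValues.length); // how many there are
```

Counting the result documents then gives the number of unique values overall.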
When you can change the schema
When there is no good reason to keep your schema the way it currently is, you could make your life much easier by turning the products object into an array:
products: [
{ product: "product_a", values: ["v1", "v2"] },
{ product: "product_b", values: ["v3", "v2"] }
]
In that case you could use the aggregation-pipeline:
- use $unwind to turn the values-arrays into separate documents, one per value (because values is nested inside products, this takes one $unwind on products and one on products.values)
- use $group with $addToSet to re-merge the documents while discarding duplicates
- use $unwind again to get a stream of documents again, but this time without duplicates
- use $group with $sum:1 to count the unique values.
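The steps above could be written as the following pipeline. This is a sketch under assumptions: the field names follow the array-based schema shown earlier, the collection name items is hypothetical, and grouping on _id: null counts unique values across the whole collection (grouping on "$_id" instead would count them per document). The countUniqueValues function is a plain JavaScript equivalent, included only so the logic can be checked without a database.

```javascript
// Aggregation pipeline for the overall count of unique values.
const pipeline = [
  { $unwind: "$products" },         // one document per products entry
  { $unwind: "$products.values" },  // one document per single value
  { $group: { _id: null, values: { $addToSet: "$products.values" } } },
  { $unwind: "$values" },           // one document per distinct value
  { $group: { _id: null, count: { $sum: 1 } } }
];
// In the mongo shell (hypothetical collection name):
//   db.items.aggregate(pipeline)

// In-memory equivalent of the pipeline, for illustration only.
function countUniqueValues(docs) {
  const unique = new Set();
  docs.forEach(function (doc) {
    doc.products.forEach(function (product) {
      product.values.forEach(function (value) { unique.add(value); });
    });
  });
  return unique.size;
}

// Assumed sample document in the array-based schema.
const sampleDocs = [
  {
    _id: 1,
    products: [
      { product: "product_a", values: ["v1", "v2"] },
      { product: "product_b", values: ["v3", "v2"] }
    ]
  }
];

const uniqueCount = countUniqueValues(sampleDocs);
console.log(uniqueCount); // 3 (v1, v2, v3)
```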