6

Trying to understand with either C#/Linq or even raw Mongodb query itself how to join multiple arrays as a cartesian product.

Say for example I had a collection that I filtered down to the following two documents:

[
{"movie":"starwars","showday":"monday"},
{"movie":"batman","showday":"thursday"},
{"movie":"sleepless","showday":"tuesday"}
]

[
{"actor":"angelina","location":"new york"},
{"actor":"jamie","location":"california"},
{"actor":"mcavoy","location":"arizona"}
]

How can I join each item in each array to produce the following type of result?

[{"movie":"starwars","showday":"monday","actor":"angelina","location":"new york"},
{"movie":"batman","showday":"thursday","actor":"angelina","location":"new york"},
{"movie":"sleepless","showday":"tuesday","actor":"angelina","location":"new york"},
{"movie":"starwars","showday":"monday","actor":"jamie","location":"california"},
{"movie":"batman","showday":"thursday","actor":"jamie","location":"california"},
{"movie":"sleepless","showday":"tuesday","actor":"jamie","location":"california"},
{"movie":"starwars","showday":"monday","actor":"mcavoy","location":"arizona"},
{"movie":"batman","showday":"thursday","actor":"mcavoy","location":"arizona"},
{"movie":"sleepless","showday":"tuesday","actor":"mcavoy","location":"arizona"}]

I am looking for a solution that could work with any number of documents. So for example if in this example there was a 3rd document that also had 3 object arrays that would produce a result set of 27 items in the array - or 27 rows as it were.

Hoping for a solution of how to use C# (Linq?) Mongodb Driver to query and return data like this but would be open to even a mongodb specific query as I can hopefully reverse the logic from there. thank you

Paul W.
  • 299
  • 1
  • 5
  • 15
  • Is your two separate collections or you want to iterate one collection. And can you explain more what you want and what you tried – Neo-coder May 23 '17 at 13:36
  • I have question actually, why your model is `{[movies],[actors]}` and not `Movie{ movie, showday, [Actors] }` pardon me it is off topic – Munzer May 27 '17 at 20:38
  • 1
    Since there is no relation at all in the data, the "database" itself cannot do this. MongoDB is not like an RDBMS where it would be possible to arbitrarily pull in data from a number of sources and run functional output over it. In fact all operations are really on one collection at a time, with the exception of `$lookup` and the like, but those operators would need a related key. You do this in code, or use `.eval()` to run code on the server. But **DONT** use `eval()`. So this is a coding excercise and not a database solution. – Neil Lunn May 28 '17 at 05:36

1 Answers1

3

You can try below aggregation pipeline.

Note mergeObjects aggregation operator is available in the 3.5.6 + development release which will be rolled into upcoming 3.6 release.

db.collection.find();
{
 "data" : [
  [
   {
    "movie" : "starwars",
    "showday" : "monday"
   },
   {
    "movie" : "batman",
    "showday" : "thursday"
   },
   {
    "movie" : "sleepless",
    "showday" : "tuesday"
   }
  ],
  [
   {
    "actor" : "angelina",
    "location" : "new york"
   },
   {
    "actor" : "jamie",
    "location" : "california"
   },
   {
    "actor" : "mcavoy",
    "location" : "arizona"
   }
  ]
 ]
}

Aggregation using conditional expression.

aggregate({
 $project: {
  cp: {
   $reduce: {
    input: "$data",
    initialValue: {
     $arrayElemAt: ["$data", 0] // Set the initial value to the first element of the arrays.
    },
    in: {
     $let: {
      vars: {
       currentr: "$$this", // Current processing element
       currenta: "$$value" // Current accumulated value 
      },
      in: {
       $cond: [{ // Conditional expression to return the accumulated value as initial value for first element
        $eq: ["$$currentr", "$$currenta"]
       },
       "$$currenta",
       { // From second element onwards prepare the cartesian product
        $reduce: {
         input: {
          $map: {
           input: "$$currenta",
           as: a"a",
           in: {
            $map: {
             input: "$$currentr",
             as: r"r",
             in: {
              $mergeObjects: ["$$a", "$$r"] // Merge accumulated value with the current processing element
             }
            }
           }
          }
         },
         initialValue: [],
         in: {
         $concatArrays: ["$$value", "$$this"] // Reduce the merged values which will be used as accumulator for next element
         }
        }
       }]
      }
     }
    }
   }
  }
 }
});

Aggregation( using $setUnion ).

This solution was only added to suppress the conditional expression to give more readable pipeline.

aggregate({
 $project: {
  cp: {
   $reduce: {
    input: "$data",
    initialValue: {
     $arrayElemAt: ["$data", 0] // Set the initial value to the first element of the arrays.
    },
    in: {
     $let: {
      vars: {
       currentr: "$$this", // Current processing element
       currenta: "$$value" // Current accumulated value 
      },
      in:{ 
       $reduce: {
        input: {
         $map: {
          input: "$$currenta",
          as: "a",
          in: {
           $map: {
            input: "$$currentr",
            as: "r",
            in: {
             $mergeObjects: ["$$a", "$$r"] // Merge accumulated value with the current processing element
            }
           }
          }
         }
        },
        initialValue: [],
        in: {
         $setUnion: ["$$value", "$$this"] // Reduce the merged values which will be used as accumulator for next element
        }
       }
      }
     }
    }
   }
  }
 }
});

Update

Both the above solutions will not work with repeating values across arrays as noted by Asya Kamsky's comments below because of the incorrect $cond in the first solution and $setUnion in second solution.

The correct fix is to

start with initialValue of [ { } ]

Or

change input to exclude the first element like input: {$slice:["$data", 1, {$subtract:[{$size:"$data"},1]}]},

Complete aggregation pipeline

aggregate({
 $project: {
  cp: {
   $reduce: {
    input: {$slice:["$data", 1, {$subtract:[{$size:"$data"},1]}]},
    initialValue: {$arrayElemAt:["$data",0]},
    in: {
     $let: {
      vars: {
       currentr: "$$this", 
       currenta: "$$value" 
      },
      in:{ 
       $reduce: {
        input: {
         $map: {
          input: "$$currenta",
          as: "a",
          in: {
           $map: {
            input: "$$currentr",
            as: "r",
            in: {
             $mergeObjects: ["$$a", "$$r"] 
            }
           }
          }
         }
        },
        initialValue: [],
        in: {
         $concatArrays: ["$$value", "$$this"] 
        }
       }
      }
     }
    }
   }
  }
 }
});

Reference: Cartesian product of multiple arrays in JavaScript

s7vr
  • 73,656
  • 11
  • 106
  • 127
  • this does not work if any of the fields have any value repeating across the arrays – Asya Kamsky May 31 '17 at 18:30
  • @AsyaKamsky Yes that is 100 % correct. I was aware of that but just couldn't find anything to replace the conditional expression. I have added the previous revision too which I believe was correct. – s7vr May 31 '17 at 18:34
  • nope, that doesn't work correctly if you have repeated elements. – Asya Kamsky May 31 '17 at 20:43
  • Are you referring to the usage of `$mergeObjects` which removes the duplicates fields across arrays ? or the usage of `$setUnions` which removes the duplicate documents across arrays ? – s7vr May 31 '17 at 20:46
  • no, I'm saying if you test the code with document with data: [ [ { "a" : 1 }, { "a" : 1 } ], [ { "a" : 1 }, { "a" : 1 } ], [ { "a" : 1 } ] ] you'll see that it returns the wrong number of results - I'd expect 4 and it returns 2. – Asya Kamsky May 31 '17 at 21:20
  • or even more obvious: data: [ [ {a:1 } , {b:1 } ], [ {a:1 } , { b: 1} ] ] – Asya Kamsky May 31 '17 at 21:22
  • the fix is pretty simple - you don't need that $cond at all, just change your "input" to exclude the first element since you already have it. – Asya Kamsky May 31 '17 at 21:24
  • alternatively, just start with initialValue of [ { } ] and input "$data" but get rid of the $cond entirely, you don't ever need it. – Asya Kamsky May 31 '17 at 21:30
  • Thank you for your valuable feedback. Updated answer with your suggested fixes. – s7vr Jun 01 '17 at 02:51