8

Is it possible to write union queries in Mongo DB using 2 or more collections similar to SQL queries?

I'm using spring mongo template and in my use case, I need to fetch the data from 3-4 collections based on some conditions. Can we achieve this in a single operation?

For example, I have a field named "circuitId" which is present in all 4 collections. And I need to fetch all records from all 4 collections for which that field matches with a given value.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
Punit
  • 324
  • 1
  • 4
  • 17
  • Welcome to the world of using a document database. You probably have 4 collections because each one contains documents with a different structure. Where this is the case, the solution is you put all the documents in "one" collection instead, because that's what a document based database without a strict schema is used for. If you are just trying to do everything the same way as you did with an RDBMS then there was little point in switching engines at all. Do what the engine does, instead of what you are used to doing. – Neil Lunn Sep 07 '17 at 07:51
  • @NeilLunn :- In these 1 collection is parent collection and other 3 are reference collections. But , there is no reference given in parent collection for them. That's how DB mapping is designed and it is unlikely to be changed. All these 4 collections having the same unique identifier - vTMId. – Punit Sep 07 '17 at 07:56
  • Comments are not the place for explanations. You cannot do a "union query", as that is an SQL thing and it does not apply here. Your general solutions are to model differently. If you have a data you need to model in MongoDB to solve a problem, then ask your question about that. But Simply asking *"How do I do this SQL Thing in MongoDB?"* with no provided example of what you really need to do is just off topic and unanswerable. Those are the expectations when you post questions here. – Neil Lunn Sep 07 '17 at 08:00

3 Answers3

22

Doing unions in MongoDB in a 'SQL UNION' fashion is possible using aggregations along with lookups, in a single query.

Something like this:

    db.getCollection("AnyCollectionThatContainsAtLeastOneDocument").aggregate(
    [
      { $limit: 1 }, // Reduce the result set to a single document.
      { $project: { _id: 1 } }, // Strip all fields except the Id.
      { $project: { _id: 0 } }, // Strip the id. The document is now empty.

      // Lookup all collections to union together.
      { $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
      { $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
      { $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } },

      // Merge the collections together.
      {
        $project:
        {
          Union: { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
        }
      },

      { $unwind: "$Union" }, // Unwind the union collection into a result set.
      { $replaceRoot: { newRoot: "$Union" } } // Replace the root to cleanup the resulting documents.
    ]);

Here is the explanation of how it works:

  1. Instantiate an aggregate out of any collection of your database that has at least one document in it. If you can't guarantee any collection of your database will not be empty, you can workaround this issue by creating in your database some sort of 'dummy' collection containing a single empty document in it that will be there specifically for doing union queries.

  2. Make the first stage of your pipeline to be { $limit: 1 }. This will strip all the documents of the collection except the first one.

  3. Strip all the fields of the remaining document by using $project stages:

    { $project: { _id: 1 } },
    { $project: { _id: 0 } }
    
  4. Your aggregate now contains a single, empty document. It's time to add lookups for each collection you want to union together. You may use the pipeline field to do some specific filtering, or leave localField and foreignField as null to match the whole collection.

    { $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
    { $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
    { $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } }
    
  5. You now have an aggregate containing a single document that contains 3 arrays like this:

    {
        Collection1: [...],
        Collection2: [...],
        Collection3: [...]
    }
    

    You can then merge them together into a single array using a $project stage along with the $concatArrays aggregation operator:

    {
      "$project" :
      {
        "Union" : { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
      }
    }
    
  6. You now have an aggregate containing a single document, into which is located an array that contains your union of collections. What remains to be done is to add an $unwind and a $replaceRoot stage to split your array into separate documents:

    { $unwind: "$Union" },
    { $replaceRoot: { newRoot: "$Union" } }
    
  7. Voilà. You know have a result set containing the collections you wanted to union together. You can then add more stages to filter it further, sort it, apply skip() and limit(). Pretty much anything you want.

sboisse
  • 4,860
  • 3
  • 37
  • 48
  • How does it do in terms of performance? Do you think it's production-ready? – edrichhans Oct 05 '20 at 03:47
  • 1
    We did not profile it, but we used this approach in a production environment, and performance did not look like an issue. – sboisse Oct 06 '20 at 12:50
  • 1
    In my project is it an issue, because the $lookup exceeded its maximum capacity. So for the people, who use it, consider future issues if you are expecting more documents to be written in your collections. – TodorBalabanski Dec 01 '20 at 13:42
  • Also if one of the collections is empty you need some additional conditions like [here](https://stackoverflow.com/a/42224747/4222504). – Véger Lóránd Sep 29 '22 at 11:56
12

Starting Mongo 4.4, the aggregation framework provides a new $unionWith stage, performing the union of two collections (the combined pipeline results from two collections into a single result set).

Thus, in order to combine documents from 3 collections:

// > db.collection1.find()
//   { "circuitId" : 12, "a" : "1" }
//   { "circuitId" : 17, "a" : "2" }
//   { "circuitId" : 12, "a" : "5" }
// > db.collection2.find()
//   { "circuitId" : 12, "b" : "x" }
//   { "circuitId" : 12, "b" : "y" }
// > db.collection3.find()
//   { "circuitId" : 12, "c" : "i" }
//   { "circuitId" : 32, "c" : "j" }
db.collection1.aggregate([
  { $match: { circuitId: 12 } },
  { $unionWith: { coll: "collection2", pipeline: [{ $match: { circuitId: 12 } }] } },
  { $unionWith: { coll: "collection3", pipeline: [{ $match: { circuitId: 12 } }] } }
])
// { "circuitId" : 12, "a" : "1" }
// { "circuitId" : 12, "a" : "5" }
// { "circuitId" : 12, "b" : "x" }
// { "circuitId" : 12, "b" : "y" }
// { "circuitId" : 12, "c" : "i" }

This:

  • First filters documents from collection1
  • Then includes documents from collection2 into the pipeline with the new $unionWith stage. The pipeline parameter is an optional aggregation pipeline applied on documents from the collection being merged before the merge happens.
  • And also includes documents from collection3 into the pipeline with the same $unionWith stage.
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
  • For those asking why anyone would want to do this in MongoDB; consider Time Series data that is stored in collections by month (e.g. MyLogDataYYYYMM). This allows easy dropping of old log data without dealing with the expensive Delete operation. However you may need to query across multiple months worth of data. This is where $unionWith is super handy. – GoodEnuf Oct 19 '20 at 17:43
1

Unfortunately document based MongoDB doesn't support JOINS/Unions as in Relational DB engines. One of the key design principles on MongoDB is to prevent joins using embedded documents as per your application's data fetch patterns. Having said that, you will need to manage the logic in your application end if you really need to use the 4 collections or you may redesign your DB design as per MongoDB best practices.

For more info : https://docs.mongodb.com/master/core/data-model-design/

Pubudu Jayawardana
  • 2,250
  • 1
  • 13
  • 18
  • I don't have the option to redesign the DB model. We can achieve this in 4 different queries , which is querying the different collection each time and then we can combine the result of all the 4 operations, but that doesn't seem to be a good approach. I want to do it in a single operation/query. – Punit Sep 07 '17 at 08:36
  • May be you can use DBRefs, however u might want to modify your documents. Please have a look : https://docs.mongodb.com/manual/reference/database-references/#dbrefs – Pubudu Jayawardana Sep 07 '17 at 09:49
  • NOTE: Even at the time of the original post, mongodb v3.2 was released which supported `$lookup`, essentially JOIN. As of summer 2020, filtered and non-equijoin `$lookup` are supported plus `$graphLookup` and in v4.4, `$unionWith` – Buzz Moschetti Jul 22 '20 at 14:31