0

I have two collections: document and citation. Their structures are shown below:

# document
{id:001, title:'foo'}
{id:002, title:'bar'}
{id:003, title:'abc'}

# citation
{from_id:001, to_id:002}
{from_id:001, to_id:003}

I want to query the information of cited documents (called references, which is denoted by to_id) of each document. In SQL, I would use the document table left joins citation, and then left joins document to get full information of the references (not just their ids).

However, I can only achieve the first step with $lookup in MongoDB. Here is my aggregate pipeline:

[
    {'$lookup':{
        'from': 'citation',
        'localField': 'id',
        'foreignField': 'from_id',
        'as': 'references'
    }}
]

I am able to get the following results with this pipeline:

{
    id:001,
    title:'foo',
    references:[{from_id:001, to_id:002}, {from_id:001, to_id:003}]
}

The desired result is:

{
    id:001,
    title:'foo',
    references:[{id:002, title:'bar'}, {id:003, title:'abc'}]
}

I have found this answer but it seems to be a one-to-one relationship that is not applicable in my case.

EDIT: Some people said that join should be avoided in MongoDB as it's not a relational database. I choose MongoDB because it's much faster than MySQL in my case.

Tom Leung
  • 334
  • 5
  • 18

1 Answers1

1

You need to use $unwind and again $lookup on same collection, then you should $group by _id to get the desired result.

Try the below:

[
  {
    "$lookup": {
      "from": "citation",
      "localField": "_id",
      "foreignField": "from_id",
      "as": "references"
    }
  },
  {
    "$unwind": "$references"
  },
  {
    "$lookup": {
      "from": "doc",
      "localField": "references.to_id",
      "foreignField": "_id",
      "as": "map"
    }
  },
  {
    "$unwind": "$map"
  },
  {
    "$project": {
      "_id": 1,
      "title": 1,
      "map_id": "$map._id",
      "map_title": "$map.title"
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "title": {
        "$first": "$title"
      },
     "references": {
        "$push": {
          "id": "$map_id",
          "title": "$map_title"
        }
      }
    }
  }
]
ngShravil.py
  • 4,742
  • 3
  • 18
  • 30
  • When you do `$group`, only `_id` field is preserved by default. So, this operator is used to preserve the fields. But, it will preserve only the first value from that grouped documents. You can read about it more [here](https://docs.mongodb.com/manual/reference/operator/aggregation/first/). – ngShravil.py May 25 '20 at 10:46
  • Thank you! Can I add constraints during the lookup stage? For example, some `to_id` do not point to the document in my database (I don't have every document that appears in the references), so I only want the remaining. I think it is like an INNER JOIN. – Tom Leung May 25 '20 at 11:07
  • You can have a look at [this](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/index.html) for examples. – ngShravil.py May 25 '20 at 11:43
  • If you can give few example of such documents and it's corresponding output, may be I can help with that. – ngShravil.py May 25 '20 at 11:43
  • For example, document 001 has a citation to document 004, which is not in my dataset. I would like to only query references that can be bonded to the documents in my dataset. – Tom Leung May 25 '20 at 12:30
  • Yes, post such example of documents in OP, its hard to imagine here. – ngShravil.py May 25 '20 at 12:39