Find documents between two dates, but include documents before first and after last result

Question

We have a collection with multiple documents ordered with respect to a given timestamp. We want to aggregate documents between two timestamps (let's say startTime and stopTime): that is a simple match stage in our aggregation that has a query such as timestamp: {$gte: startTime, $lte: stopTime}. However, we'd like to include two extra documents in the result of this step: the closest document right before startTime, no matter how far back in time we would need to look, and also the closest document right after stopTime. Is there a way to achieve this with the aggregation framework in MongoDB?
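To make the expected result concrete, here is a small in-memory sketch in plain JavaScript (no MongoDB involved). The function name `selectWithNeighbors` and the sample data are hypothetical and only illustrate the selection rule described above.

```javascript
// Plain JavaScript illustration of the desired result; not a MongoDB query.
// `selectWithNeighbors` and `sample` are hypothetical names for this sketch.
function selectWithNeighbors(docs, startTime, stopTime) {
  // Work on a sorted copy so the input array is left untouched.
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  const inRange = sorted.filter(
    d => d.timestamp >= startTime && d.timestamp <= stopTime
  );
  // Closest document strictly before startTime: the last of the earlier ones.
  const before = sorted.filter(d => d.timestamp < startTime).pop();
  // Closest document strictly after stopTime: the first of the later ones.
  const after = sorted.find(d => d.timestamp > stopTime);
  // Drop the neighbours that do not exist (range touches either end).
  return [before, ...inRange, after].filter(d => d !== undefined);
}

const sample = [1, 5, 10, 12, 15, 30, 40].map(t => ({ timestamp: t }));
const picked = selectWithNeighbors(sample, 10, 15).map(d => d.timestamp);
console.log(picked); // [5, 10, 12, 15, 30]
```

With startTime = 10 and stopTime = 15, the documents at 5 and 30 are the two extra ones, however far they are from the range.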

Does this answer your question? [How to perform lead and lag in MongoDB](https://stackoverflow.com/questions/70338503/how-to-perform-lead-and-lag-in-mongodb) — ray, Oct 15 '22 at 00:26
@ray Not really, because I have a match stage first that only returns documents between two specific dates; I need to get documents outside of this range later in the pipeline — bsguedes, Oct 16 '22 at 19:21

ray · Answer 1 · 2022-10-25T00:29:27.943


Chain two $unionWith stages after the $match, each with its own $match, $sort and $limit: 1, to fetch the closest document on either side of the range.

db.collection.aggregate([
  {
    $match: {
      datetime: {
        $gte: ISODate("2022-10-18"),
        $lte: ISODate("2022-10-19")
      }
    }
  },
  {
    "$unionWith": {
      "coll": "collection",
      "pipeline": [
        {
          $match: {
            datetime: {
              $lt: ISODate("2022-10-18")
            }
          }
        },
        {
          $sort: {
            datetime: -1
          }
        },
        {
          $limit: 1
        }
      ]
    }
  },
  {
    "$unionWith": {
      "coll": "collection",
      "pipeline": [
        {
          $match: {
            datetime: {
              $gt: ISODate("2022-10-19")
            }
          }
        },
        {
          $sort: {
            datetime: 1
          }
        },
        {
          $limit: 1
        }
      ]
    }
  }
])
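If the range comes from variables rather than hard-coded dates, the same pipeline can be assembled by a small helper. This is only a sketch: `buildRangePipeline` and its parameters are hypothetical names, plain `Date` objects stand in for `ISODate(...)`, and the trailing `$sort` is an addition that is not in the pipeline above (without it, the two extra documents come after the in-range ones). Note that `$unionWith` requires MongoDB 4.4 or later.

```javascript
// Hypothetical helper that builds the $match + two $unionWith stages shown above.
function buildRangePipeline(collName, field, startTime, stopTime) {
  // One neighbour: filter to one side of the range, sort toward the
  // boundary, and keep only the closest document.
  const neighbor = (op, bound, direction) => ({
    $unionWith: {
      coll: collName,
      pipeline: [
        { $match: { [field]: { [op]: bound } } },
        { $sort: { [field]: direction } },
        { $limit: 1 }
      ]
    }
  });

  return [
    { $match: { [field]: { $gte: startTime, $lte: stopTime } } },
    neighbor("$lt", startTime, -1), // closest document before the range
    neighbor("$gt", stopTime, 1),   // closest document after the range
    { $sort: { [field]: 1 } }       // assumption: optional final ordering
  ];
}

const pipeline = buildRangePipeline(
  "collection",
  "datetime",
  new Date("2022-10-18"),
  new Date("2022-10-19")
);
// In mongosh this would be run as: db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

An index on the date field should let the initial `$match` and both sub-pipelines avoid a collection scan, which is the point made in the comments below.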


edited Oct 25 '22 at 00:29

answered Oct 16 '22 at 22:07


Thank you! Didn't know how the $facet stage worked, this is a pretty good example! – bsguedes Oct 17 '22 at 16:54
The disadvantage here is that you group up all your relevant documents into one big document and a document has a size limit... – nimrod serok Oct 17 '22 at 19:10
Yes; there is also another disadvantage: $facet does not make use of indexes. – bsguedes Oct 24 '22 at 23:43
@bsguedes updated the solution to use `$unionWith`. It should be by far the most efficient and clean solution as it can leverage the index and avoid expensive lookups. Don't even need to wrangle data at later stages of aggregation pipeline. The code is shorter too :) – ray Oct 25 '22 at 00:30

nimrod serok · Accepted Answer · 2022-10-16T20:56:57.797

If your pipeline has already filtered down to the in-range documents, one option is a $lookup stage with a pipeline. The stages after the $lookups look a bit clumsy, but I could not think of another way to continue without grouping all the documents into one, which is not the best way to go.

1. $match - This is a "fake" step that reproduces your situation. You already have it in your current pipeline, so you don't need to add it.
2. $set the "$$ROOT" document in a field in order to use it later.
3. $lookup twice in order to get the requested before and after documents from the original collection.
4. For each document, create an array holding the after document, the before document and the document itself.
5. $unwind to separate the array into individual documents.
6. $group by _id in order to remove the duplicates of the before and after documents.
7. Format the result: $replaceRoot to restore the original document shape, then $sort by timestamp.

db.collection.aggregate([
  {$match: {timestamp: {$gte: startTime, $lte: stopTime}}},
  {$set: {data: "$$ROOT"}},
  {$lookup: {
      from: "collection",
      let: {},
      pipeline: [
        {$match: {timestamp: {$lt: startTime}}},
        {$sort: {timestamp: -1}},
        {$limit: 1}
      ],
      as: "before"
  }},
  {$lookup: {
      from: "collection",
      let: {},
      pipeline: [
        {$match: {timestamp: {$gt: stopTime}}},
        {$sort: {timestamp: 1}},
        {$limit: 1}
      ],
      as: "after"
  }},
  {$project: {_id: 0, data: {$concatArrays: ["$after", "$before", ["$data"]]}}},
  {$unwind: "$data"},
  {$group: {_id: "$data._id", data: {$first: "$data"}}},
  {$replaceRoot: {newRoot: "$data"}},
  {$sort: {timestamp: 1}}
])
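To see why the $unwind and $group stages are needed, here is a plain JavaScript walk-through of what happens after the two $lookups. It is only an in-memory stand-in for the pipeline stages, with made-up documents; nothing here talks to MongoDB.

```javascript
// Made-up documents standing in for the collection contents.
const before = { _id: 1, timestamp: 5 };   // result of the first $lookup
const after = { _id: 9, timestamp: 30 };   // result of the second $lookup
const matched = [
  { _id: 3, timestamp: 10 },
  { _id: 4, timestamp: 12 }
];

// $project with $concatArrays: every matched document carries its own
// copy of the after and before documents.
const projected = matched.map(d => ({ data: [after, before, d] }));

// $unwind: one output document per array element, so the before and
// after documents now appear once per matched document.
const unwound = projected.flatMap(p => p.data);

// $group by _id with $first: keep a single copy of each document.
const byId = new Map();
for (const d of unwound) {
  if (!byId.has(d._id)) byId.set(d._id, d);
}

// $replaceRoot + $sort: back to plain documents, ordered by timestamp.
const result = [...byId.values()].sort((a, b) => a.timestamp - b.timestamp);

console.log(unwound.length, result.map(d => d._id)); // 6 [ 1, 3, 4, 9 ]
```

One consequence of this design: because the $lookups run once per matched document, a range that matches no documents yields an empty result, without the before and after documents.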

