12

How can I get first and last document based on time field. I can use $group and get $first and $last document, but I don't need grouping here, just get first and last full document. Maybe I could use slice? This query doesn't work:

{
  "aggregate": "353469045637980_data",
  "pipeline": [
    {
      "$match": {
        "$and": [
          {
            "time": {
              "$gte": 1461369600
            }
          },
          {
            "time": {
              "$lt": 1461456000
            }
          }
        ]
      }
    },
    {
      "$project": {
        "first": {
          "$slice": 1
        },
        "last": {
          "$slice": -1
        }
      }
    }
  ]
}
DAXaholic
  • 33,312
  • 6
  • 76
  • 74
Michał Jurczuk
  • 3,728
  • 4
  • 32
  • 56
  • 1
    First and last value of what? Can you be more specific, perhaps include a bunch of test documents and your expected output? – chridam Aug 16 '16 at 12:59
  • I'm expecting list with two entries [first, last]. Where first and last are full documents from collection – Michał Jurczuk Aug 16 '16 at 13:02

3 Answers3

26

Well you need $group but you can simply use a constant (e.g. null, see the docs) for its id so that it results in a single group. $$ROOT then refers to the document itself which you can use with $first and $last like so

$group: {
  _id: null,
  first: { $first: "$$ROOT" },
  last: { $last: "$$ROOT" }
}

Of course you can introduce further $project stages to shape that data into an array (as you mentioned you want a list) etc.

As a side note you may want to introduce a $sort stage to make sure $first and $last have a proper meaning.

DAXaholic
  • 33,312
  • 6
  • 76
  • 74
  • Did you write the query ? I still don't understand. Can you please share the final query to get the first and last document based on creation date – Uzumaki Naruto Sep 25 '18 at 12:25
  • Is there any way to get last two documents on the group by stage? – Nandy Feb 10 '20 at 14:33
1

You can make use of a $facet stage, combined with $sort/$limit stages:

// { time: ISODate("2021-12-04"), b: "hello" }
// { time: ISODate("2021-12-07"), b: "world" }
// { time: ISODate("2021-12-05"), b: "oups"  }
db.collection.aggregate([
  { $facet: {
    first: [{ $sort: { time: 1 } }, { $limit: 1 }],
    last:  [{ $sort: { time: -1 } }, { $limit: 1 }]
  }},
  { $set: { first: { $first: "$first" }, last: { $last: "$last" } } }
])
// { first: { time: ISODate("2021-12-04"), b: "hello" }, last: { time: ISODate("2021-12-07"), b: "world" } }

The $facet stage allows us to run multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its result is stored as an array of documents.

Each field is thus produced by its own aggregation pipeline whose first stage $sorts documents by the time field (in opposite orders), followed by a $limit stage that'll only keep the first item (the smallest as defined by the chosen ordering).

The second part of the pipeline (the $set stage) is just there to clean-up the $facet output format.


Note that a $sort followed by a $limit stage is optimised by Mongo (see here). This allows the sort operation to only maintain the top n results (in our case only 1 element) as it progresses.

Also note that our $sort stages will benefit from having an index on time.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
  • Does $sort here really use index? I have a collection with 11 million doc, time is indexed. It takes 0.001 sec if I do `db.collection.find({}).sort({time: 1}).limit(1)`, but it takes >73 seconds if I do aggregate as you suggested. Any explanation? – newman Jan 01 '22 at 22:29
  • @newman Damn, you're right: https://docs.mongodb.com/manual/reference/operator/aggregation/facet/#index-use . That's quite sad, there's no real reason not to. – Xavier Guihot Jan 02 '22 at 12:34
  • Thanks for this answer, $facet really makes sense for this, and it's very semantic at the same time. – GMaiolo Aug 05 '22 at 13:09
0

I know this does not answer the question directly but would have helped me a lot when I came across this question.

You could (like DAXaholic explained) use the following query to get the corresponding documents:

[
  $sort: {...},
  $group: {
    _id: null,
    first: { $first: "$$ROOT" },
    last: { $last: "$$ROOT" }
  }
]

The problem I faced was that I got the following error: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.

To work around this issue (without using disk for sorting) you could use:

[
  $group: {
    _id: null,
    first: { $min: "$time" },
    last: { $max: "$time" }
  }
]

The advantage is that you don't need to sort and therefore might not run in the 100 MB memory limit. The downside is that it does not return the entire document (where $min / $max matches) but only the value of the property (in this example time).

Mathias
  • 1,819
  • 4
  • 22
  • 34