
In my model db.listing, there are two sortable fields, update and location:

db.listing.find().sort({ update: -1 }).limit(50);

db.listing.find({
  location: {
    $near: {
      $maxDistance: 10000,
      $geometry: {
        type: 'Point',
        coordinates
      }
    }
  }
}).limit(50);

Note that the user can go to the second page, so the client can send back an end_cursor, which is the ObjectId of the last document it received.

Ideally, I would like to do something like:

db.listing.find().sort({ update: -1 }).cursor('_id').lt(end_cursor).limit(50);

db.listing.find({
  location: {
    $near: {
      $maxDistance: 10000,
      $geometry: {
        type: 'Point',
        coordinates
      }
    }
  }
}).cursor('_id').lt(end_cursor).limit(50);

Is there a feature similar to where that treats _id as a cursor position rather than as a secondary sort key?

Of course, I could use skip, but that would also require me to find the position of end_cursor in the query results...

It looks like there is no such feature built into Mongoose. So here is my solution:

const getSeenIdsByEndCursor = async ({
  seen_ids = [],
  end_cursor,
  cursor
}) => {
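  // Assumes `cursor` exposes hasNext()/next() like a MongoDB driver cursor;
  // both return promises, so they must be awaited.
  if (!(await cursor.hasNext()))
    return seen_ids;
  const doc = await cursor.next();
  seen_ids.push(doc._id);
  // ObjectIds are objects, so compare their string forms rather than using ===
  if (String(doc._id) === String(end_cursor))
    return seen_ids;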
  return await getSeenIdsByEndCursor({
    seen_ids,
    end_cursor,
    cursor
  })
}

const listByDistance = async function({
  coordinates,
  distance,
  end_cursor
}) {
  if (!Array.isArray(coordinates)) {
    return;
  }
  if (!end_cursor) {
    return await db.listing.find({
      location: {
        $near: {
          $maxDistance: 10000,
          $geometry: {
            type: 'Point',
            coordinates
          }
        }
      }
    }).limit(50);
  }
  const cursor = db.listing.find({
    location: {
      $near: {
        $minDistance: distance,
        $maxDistance: 10000,
        $geometry: {
          type: 'Point',
          coordinates
        }
      }
    }
  }).limit(50);
  const seen_ids = await getSeenIdsByEndCursor({
    cursor,
    end_cursor
  });
  return await db.listing.find({
    location: {
      $near: {
        $minDistance: distance,
        $maxDistance: 10000,
        $geometry: {
          type: 'Point',
          coordinates
        }
      }
    }
  }).where('_id').nin(seen_ids).limit(50);
}
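
The second query above depends on `$minDistance` plus a list of already seen ids. One way to keep that list small is to exclude only the ids that sit exactly at the boundary distance. The sketch below assumes the distance comes from a `$geoNear` aggregation stage with `distanceField`, since a plain `$near` query does not return distances. The helper name `buildNearPipeline`, its parameters, and the output field name `distance` are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: builds a $geoNear pipeline for the next page.
// `lastDistance` is the distance in meters of the last document already shown;
// `idsAtLastDistance` are the _ids already shown at exactly that distance.
function buildNearPipeline({
  coordinates,
  lastDistance = 0,
  idsAtLastDistance = [],
  pageSize = 50
}) {
  return [
    {
      // $geoNear must be the first stage and returns nearest documents first
      $geoNear: {
        near: { type: 'Point', coordinates },
        distanceField: 'distance', // assumed name for the computed distance
        minDistance: lastDistance,
        maxDistance: 10000,
        // exclude only the documents already seen at the boundary distance
        query: { _id: { $nin: idsAtLastDistance } },
        spherical: true
      }
    },
    { $limit: pageSize }
  ];
}

// Possible usage, assuming a Mongoose model named Listing with a 2dsphere index:
// const docs = await Listing.aggregate(buildNearPipeline({
//   coordinates,
//   lastDistance,
//   idsAtLastDistance
// }));
```

The trade-off is that the client has to send back the last distance and the ids at that distance instead of a single end_cursor.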
Aero Wang
  • Basically your logic is incorrect. The "less than" would apply to the "last seen" `update` value. The only tracking of `_id` is for where that `update` has the same value for multiple documents, so `_id` is a **secondary** sort. More detail on the linked answer. For a "near query" you really want to play with the `minDistance` instead of "less than" or "greater than" semantics. – Neil Lunn Mar 02 '19 at 03:32
  • @NeilLunn I understand the logic is wrong. I am asking, if there is a feature similar to `where`, but instead it looks for `_id` as a cursor. Something like...`cursor('_id').lt(end_cursor)`... – Aero Wang Mar 02 '19 at 03:38
  • `where` is not a "feature". It's just a method implemented in Mongoose for using "query builders" rather than constructing a query object. `where("_id").lt(end_cursor)` is the same as `{ _id: { $lt: end_cursor } }`. You also seem to misunderstand what `cursor` means in this context. Read the existing answer to what you are really asking, even if you seem to think you are asking something different. – Neil Lunn Mar 02 '19 at 03:42
  • @NeilLunn It's just that the method in that answer seems to take up a lot of memory if, say, I have 2 million docs or so. Essentially it reads from the beginning every time... – Aero Wang Mar 02 '19 at 05:22
  • No, it does not. I suggest you actually read and try. You don't store anything other than the last set of values that were all the same, i.e. a page of everything for "score 3" would be 20 items, then you append until the score changes to "score 2", which is different. So you basically store for `$nin` based on "distinct _id values for items with the same value you are sorting on". The same thing is well documented elsewhere; that is just the most succinct answer on this site. But it's always the same approach. – Neil Lunn Mar 02 '19 at 05:27
  • And rather than providing an "end_cursor", it needs to provide "seen_ids" and exclude them from the result. That's either server intensive (using an `end_cursor` to get all the `seen_ids`) or network intensive (letting the client send all the `seen_ids` to the server side)... – Aero Wang Mar 02 '19 at 05:28
  • Let's say I read 100 pages of listings with 100 listings on each page. Now I am either asking the client to send 10,000 seen_ids, or I do 100 queries on the server side to get all the seen_ids before I can get the correct result for the page... Besides, if new docs were added to the DB, they could now appear in the result because they are not "seen" yet. – Aero Wang Mar 02 '19 at 05:33
  • I see that you can also store a minimum distance on the client side, for example, but then I need to calculate the distance and store it, which means the client-side application now needs to store different types of data for each listing type... – Aero Wang Mar 02 '19 at 05:36
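
The approach described in the comments (filter on the last seen `update` value and use `_id` only to break ties) could be sketched roughly as follows. This is a minimal illustration, not a built-in Mongoose feature: the helper name `buildUpdatePage` and the shape of its `last` argument are assumptions.

```javascript
// Hypothetical helper: builds the filter, sort, and limit for the next page
// when sorting by `update` descending with `_id` as the tie-breaker.
// `last` is the final document of the previous page, or null for page one.
function buildUpdatePage(last, pageSize = 50) {
  const sort = { update: -1, _id: -1 };
  if (!last) {
    return { filter: {}, sort, limit: pageSize };
  }
  const filter = {
    $or: [
      // documents strictly older than the last seen `update` value
      { update: { $lt: last.update } },
      // documents with the same `update` value, ordered by _id
      { update: last.update, _id: { $lt: last._id } }
    ]
  };
  return { filter, sort, limit: pageSize };
}

// Possible usage, assuming a Mongoose model named Listing:
// const { filter, sort, limit } = buildUpdatePage(lastDoc);
// const docs = await Listing.find(filter).sort(sort).limit(limit);
```

If the client only sends back an end_cursor ObjectId, the server could first look up that one document to obtain its `update` value, so no list of seen ids has to travel over the network.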
