2

I have a users schema and a votes schema. I'm trying to return only users who haven't voted (have no returned votes).

I found this answer and using $lookup I have the below code to find each user and return all their votes as well. Which is halfway to what I'm trying to achieve.

How would I build a query so it only returns a user if they have no votes?

db.users.aggregate([
    { 
        $addFields: { "_id": { "$toString": "$_id" } }
    },
   {
     $lookup:
       {
         from: "votes",
         localField: "_id",
         foreignField: "voterId",
         as: "votes"
       }
  }
])

Another question once I have a working solution, how would I go about scaling this up? Running this query in Robo 3T takes 9.05 seconds already for just loading 50 users and I have almost 40,000 users and over 200,000 votes in my database (which will only grow). Is there a more efficient way to do this? The final code will run on a Node.js server.

Update

As silencedogood said in a deleted answer, I don't need to use $addFields because user._id is automatically converted to a string (I thought it would be an ObjectId() initially). This however only saves 1 second off of loading 50 users (8.14s).

db.users.aggregate([
   {
     $lookup:
       {
         from: "votes",
         localField: "_id",
         foreignField: "voterId",
         as: "votes"
       }
  }
])

I still need to figure out how to only return users who haven't voted.

Community
  • 1
  • 1
Luke Brown
  • 1,854
  • 2
  • 28
  • 49
  • Please share request data and response data at jsoneditor online – Mahesh Bhatnagar Dec 18 '19 at 18:31
  • @MaheshBhatnagar the only fields that are important are `user._id` and `votes.voterId` which are both strings and should match if a vote is found. – Luke Brown Dec 18 '19 at 18:44
  • I find it interesting that this only saved you 1 second... I'll have to think on that. Anyways I'll update my answer in the meantime. 8.14s is simply unacceptable! – silencedogood Dec 18 '19 at 18:58

1 Answers1

3

An example shot of your data, and expected result, would help. The $addFields function is likely what is killing your performance. Why do you need this?

If the voterId is formatted as a string in the voter collection, but an objectId in the user collection (which I'm guessing is the case), you'll need to permanently cast to objectId if you want maximum performance. Nonetheless, this is roughly what you're looking for:

db.users.aggregate([
{
 $lookup:
   {
     from: "votes",
     localField: "_id",
     foreignField: "voterId",
     as: "votes"
   }
 },
 { "$match": { "votes.0": { "$exists": false } } }
])

This alone will only return users who don't have a vote entry. The equivalent of a left join, essentially.

Update Since they are both strings, you can disregard that aspect of the answer. As to your performance issue... Not sure at the moment. That seems very unrealistic, I've never experienced query times that lengthy with a simple $lookup.

silencedogood
  • 3,209
  • 1
  • 11
  • 36
  • 1
    I confirm that this answer [works](https://mongoplayground.net/p/Lay1o-TE0wJ) – SuleymanSah Dec 18 '19 at 18:56
  • Thanks @silencedogood, this solution worked. I'm going to try and perhaps pull both schemas into two JSON files locally to compare in JS to see if I can speed it up. Or I might try running the DB locally, this is probably why it's taking so long. – Luke Brown Dec 18 '19 at 19:03
  • @LukeBrown No prob. I think running locally is a good idea, just to help isolate the performance issue. In the meantime, I'm looking at the possibility that there is an inherent performance issue with the query. If so, I can possibly think of other ways to accomplish this. Resulting in better performance, however, I'm not so sure, I'll have to see. Please mark as solution if you get a sec. – silencedogood Dec 18 '19 at 19:12
  • @silencedogood exporting both users and votes and running the query locally brings the time down from 9 seconds to 2 seconds for 50 users. Which to run all users is down from 118 minutes to 26 minutes now. Will keep trying. In hindsight, I should have built a permission once the user has voted called something like `hasVoted`, then I could just query all users who don't have this value - would of been much quicker but I don't have that option anymore. – Luke Brown Dec 18 '19 at 19:22
  • 1
    @LukeBrown my thoughts exactly :) – silencedogood Dec 18 '19 at 19:23
  • @LukeBrown may be an index on votes.voterId field will help? Can you try creating an index if already not exists? – SuleymanSah Dec 18 '19 at 19:25
  • Yes, you can create indexes. Check out [this post](https://stackoverflow.com/questions/43742635/poor-lookup-aggregation-performance) This makes a lot of sense. However, you're saying that when you reduce the collection to 50 documents, you're still getting a slow query speed correct? This is what is vexing to be honest... Did you reduce the size of both collections to only 50? – silencedogood Dec 18 '19 at 19:32