
Suppose I have two collections, collection-A and collection-B, containing data like the following:

collection-A

{adId:1,adName:'x'}
{adId:2,adName:'y'}
{adId:3,adName:'z'}
.
.

collection-B [ this collection has millions of rows for each adId ]

{adId:1, ip:'1.2.3.4',viewCount:2}
{adId:1, ip:'1.2.3.5',viewCount:3}
{adId:2, ip:'1.2.3.6',viewCount:1}
{adId:3, ip:'1.2.3.4',viewCount:1}
.
.

Now suppose I want to query collection-A for the names of all rows that do not have a corresponding adId row in collection-B with ip 1.2.3.4 and a view count of more than 2.

I have written the following MongoDB aggregate query for the above condition, which seems to work:

    [
        {
            $match: {}
        },
        {
            $lookup: {
                from: 'collection-B',
                localField: 'adId',
                foreignField: 'adId',
                as: 'transaction'
            }
        },
        {
            $match: {
                transaction: {
                    $not: {
                        $elemMatch: {
                            ip: '1.2.3.4',
                            viewCount: { $gt: 2 }
                        }
                    }
                }
            }
        }
    ]

The above query works perfectly if the number of matching rows in collection-B is in the thousands. But if it is in the hundreds of thousands or millions, I get the error MongoError: Total size of documents exceeds maximum document size. I understand why: all the matched rows are combined into one array that is appended to the parent row of collection-A, which consequently exceeds the maximum allowed document size.

I found another solution to the problem, which is to use the $unwind operator immediately after the $lookup. It does exactly the opposite: it adds the parent collection-A row to every corresponding row of collection-B. Using $unwind avoids the error, but the number of rows to be searched then grows so large that the search takes around 30 seconds. Now I am thinking about a third approach, namely putting the search criteria inside the $lookup operator itself, but I don't know whether a query can be used inside $lookup, and I'm not even sure my approach to the solution is correct.
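For reference, here is a sketch of that third approach using the pipeline form of $lookup, which is available from MongoDB 3.6 onward. The filter on ip and viewCount runs inside the lookup, so only matching foreign rows are ever materialized, and a $limit of 1 keeps the joined array tiny (one match is enough to disqualify an ad). The collection and field names are taken from the question; whether this is fast enough depends on an index on collection-B, which is an assumption noted below.

```javascript
// Sketch: filter inside $lookup (pipeline form, MongoDB 3.6+).
// Field and collection names come from the question above.
const pipeline = [
  {
    $lookup: {
      from: "collection-B",
      let: { adId: "$adId" },          // expose the local adId to the sub-pipeline
      pipeline: [
        {
          $match: {
            $expr: { $eq: ["$adId", "$$adId"] }, // join condition
            ip: "1.2.3.4",                       // filter inside the lookup
            viewCount: { $gt: 2 }
          }
        },
        { $limit: 1 }                  // one match is enough to disqualify
      ],
      as: "transaction"
    }
  },
  // keep only ads with no qualifying row in collection-B
  { $match: { transaction: { $size: 0 } } },
  { $project: { _id: 0, adName: 1 } }
];

// In the shell (assumes "db" is connected):
// db.getCollection("collection-A").aggregate(pipeline)
```

To make the sub-pipeline's $match use an index rather than a collection scan, a compound index such as `{ adId: 1, ip: 1, viewCount: 1 }` on collection-B would likely be needed (assumption, not stated in the question).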

Any comment or hint toward a possible solution would be appreciated.

  • If you found the solution then you should not need to ask the question, and might then find the solution useful. The existing answer has not changed. – Neil Lunn Jun 05 '18 at 10:56
  • my question is completely different than the one which you mentioned. I have of course same error but I understand the reason behind error and here I'm seeking different approach to the problem not the solution to the problem – Point Networks Jun 05 '18 at 10:58
  • I don't see any difference. What would a different approach be to a join other than applying `$lookup`? There's one pure reason for the BSON Limit being exceeded, and there are "two" approaches to avoiding it. Both of those approaches are documented on an existing answer. Anything other than `$lookup` is simply not a join, and clearly you cannot embed data when it exceeds 16MB. – Neil Lunn Jun 05 '18 at 11:00
  • I was thinking if there is any way to put some query or condition inside lookup operator so that resulting rows from the foreign table will be lesser in number – Point Networks Jun 05 '18 at 11:02
  • I'm thinking you need to read the duplicate and realize "putting the query in the lookup" is **exactly** what it's telling you to do and "how to do it". Great big long detailed explanation from some person. Not sure why they wasted their time, but they did. – Neil Lunn Jun 05 '18 at 11:06

0 Answers