0

I have a collection with documents like this:

{
   "User": { _id: 1, UserName: "username", DisplayName: "DisplayName" },
   "Interests": [1, 4, 7, 25, 30, 34, 46],
   "MinAge": 11,
   "Title": "ad title",
   ...
}

I want to select the 10 documents which matches the most number of interests from a given array, like:

array = [1,7, 30, 33, 38, 46, 55];

How could I do that?

Felipe Santana
  • 703
  • 1
  • 8
  • 15

1 Answers1

0

It would not use an index and thus be a full table scan but the only way I can think to do it is with either a stored procedure or a UDF.

In either case, you'd first need a count of the number that intersect. I suggest you do this by first calculating the intersection array and then taking the length of that. Here is an SO answer with javascript code for calculating array intersection.

If you put that in a UDF named intersectionCount(), you can then use it to return all the rows with the count:

SELECT c.id, intersectionCount(c.Interests, [1, 7, 30, 33, 38, 46, 55]) AS intersectionCount FROM c ...

Then on the client side, you can sort by intersectionCount and take the top 10. Alternatively, you could get the top 10 by using a TOP clause in your SQL but that would not allow you to deal with ties. For instance, what if there are 15 records that have the same number of intersections? One strategy would be to return all 15 eventhough it's more than your desired 10. Another would be to truncate the list. And another strategy would be to use some other criteria to pick the top 10.

To do it in a stored procedure, you'd have to include that function in your stored procedure. You'd then read all the records in batches in your stored procedure, calculate the count by calling the function, and then keeping the top 10. Again, you might want to deal with ties.

Community
  • 1
  • 1
Larry Maccherone
  • 9,393
  • 3
  • 27
  • 43