7

I'm using MongoDB's linq driver to get my results from the database

mymongocollection.AsQueryable().Where(x => x.Type == 1);

Now I'd like to return 20 random records from the above. I've been searching but I can't find a proper way to do this with LINQ. I prefer not to do it in memory, but on the database. I found you can use MongoDB's $sample aggregation, but I don't know how to translate this into LINQ (if it's even possible).

Backs
  • 24,430
  • 5
  • 58
  • 85
PoeHaH
  • 1,936
  • 3
  • 28
  • 52
  • have you tried the solution posted [here](https://stackoverflow.com/questions/35990720/having-difficulty-using-mongodb-c-sharp-drivers-sample?noredirect=1&lq=1) ? – felix Oct 09 '17 at 06:29
  • @felix I looked at it but that solution is not Linq. Coming from my code statement above, I'm not sure how I can add a $sample to it while also retaining my LINQ Where statement. – PoeHaH Oct 09 '17 at 06:34

4 Answers4

12

I'm guessing you are using incorrect namespace like using System.Linq;, which provides access to IQueryable.

You should instead use using MongoDB.Driver.Linq; which will give access to MongoQueryable implementation which has sample aggregation

You can try below aggregation in 2.4.0 driver version.

using MongoDB.Driver.Linq; 
collection.AsQueryable().Where(x => x.Type == 1).Sample(20); 

It outputs

{aggregate([{ "$match" : { "Type" : 1 } }, { "$sample" : { "size" : NumberLong(20) } }])}

Reference here

https://jira.mongodb.org/browse/CSHARP-1773

https://jira.mongodb.org/browse/CSHARP-1366

s7vr
  • 73,656
  • 11
  • 106
  • 127
1

I went for full LINQ solution. Only problem is mongo didn't let me sort by non property so I had to call toList early to materialize query.

Selecting random results supported by this answer: Select N Random Records with Linq

var result = MongoCollection
            .AsQueryable()
            .Where(x => x.Type == 1)
            .ToList()
            .OrderBy(x => Guid.NewGuid())
            .Take(20)
            .ToList();

Hope it helps!

I think this will be executed in db (correct me if I am wrong):

var result = MongoCollection
            .AsQueryable()
            .Where(x => x.Type == 1)
            .AsEnumerable()
            .OrderByDescending(x => Guid.NewGuid())
            .Take(20)
            .ToList();

Just a little improvement but still executed in memory.

BOR4
  • 610
  • 4
  • 9
  • 3
    The OP stated *I prefer not to do it in memory* – Mike Debela Oct 09 '17 at 09:11
  • Agreed with Mike. I know this is a possibility, but I prefer not to do it in memory. Basically, I'm hoping there's an all-LINQ solution to this problem but I'm starting to think there is none. – PoeHaH Oct 09 '17 at 09:24
  • Yeah I saw you wanted non memory solution but there were no solutions at all so I gave not prefered one for start =). Maybe if you generate a random Hash based on properties in OrderBy part and sort based on that you can run whole thing in db? – BOR4 Oct 09 '17 at 09:55
1

You can try skip a random quantity of results.

var resultsCount = MongoCollection
                  .AsQueryable()
                  .Where(x => x.Type == 1)
                  .Count();

var randomSkip = (new Random()).Next(0, resultsCount - 20);

var result = MongoCollection
            .AsQueryable()
            .Where(x => x.Type == 1)
            .Skip(randomSkip)
            .Take(20)
            .ToList();
Striter Alfa
  • 1,577
  • 1
  • 14
  • 31
1

Expanding on the random concept, could create a random set of indexes

First, find the total resultset size:

var resultsCount = MongoCollection
  .AsQueryable()
  .Where(x => x.Type == 1)
  .Count();

Then create the random indexes:

var sampleSize = 10;
var rnd = new Random();
var indexes = Enumerable.Range(0, resultsCount-1);
var randomSet = indexes
  .OrderBy(r => rnd.NextDouble())
  .Take(sampleSize)
  .ToList();

Then apply to the query by zipping:

var result = MongoCollection
  .AsQueryable()
  .Where(x => x.Type == 1)
  .Zip(indexes, (x, y)=> Tuple.Create(x,y))
  .Where(tuple => randomSet.Any(r => r == tuple.Item2))
  .Take(sampleSize)       // for good measure, finish when all samples taken
  .Select(t => t.Item1)   // clear the indexes
  .ToList();

It should be efficient memory-wise, but not so network-wise.

This is a test for Linqpad with simulated db items:

var resultsCount = 30;
var sampleSize = 10;

// Create a random set of indexes
var rnd = new Random();
var indexes = Enumerable.Range(0, resultsCount-1);
var randomSet = indexes.OrderBy(r => rnd.NextDouble()).Take(sampleSize).ToList();
randomSet.OrderBy(r => r).Dump("Random set of indexes");

// Simulated resultset
// for convenience simulate db items from index set
var db = indexes.Select(x => "Result" + x.ToString());  

// The query
var sampleOfResults = 
  db.Zip(indexes, (x, y)=> Tuple.Create(x,y))
    .Where(tuple => randomSet.Any(r => r == tuple.Item2))
    .Take(sampleSize)       // for good measure, finish when all samples taken
    .Select(t => t.Item1);  // clear the indexes

sampleOfResults.ToList().Dump("Sample");
Richard Matsen
  • 20,671
  • 3
  • 43
  • 77