I have a collection that looks something like this:

[
  {
    "id": 1,
    "tier": 0
  },
  {
    "id": 2,
    "tier": 1
  },
  {
    "id": 3,
    "tier": 2
  },
  {
    "id": 4,
    "tier": 0
  }
]

Is there a standard way to select n elements where the probability of choosing an element from the lowest tier is p, from the next-lowest tier is (1-p)*p, and so on, with uniform random selection of an element within the chosen tier?

For example, in the most likely outcome, running the query against the collection above with n = 2 and any p > 0.5 (which I think will always hold) would return [{"id": 1, ...}, {"id": 4, ...}]; with n = 3, it would return [{"id": 4, ...}, {"id": 1, ...}, {"id": 2, ...}], and so on.

E.g. here's some pseudo-Python code, given a list of dictionaries like the one above as objs:

def f(objs, p, n):
  # get eligible tiers
  tiers_set = set()
  for o in objs:
    tiers_set.add(o["tier"])
  eligible_tiers = sorted(tiers_set)
  # get the tier for each index of results
  # (select_random_with_initial_p is a placeholder: lowest tier with
  # probability p, next lowest with (1-p)*p, and so on)
  tiers = []
  while len(tiers) < min(n, len(objs)):
    tiers.append(select_random_with_initial_p(eligible_tiers, p))
  # get res
  # (select_standard_random_in_tier is a placeholder: uniform pick in a tier)
  res = []
  for tier in tiers:
    res.append(select_standard_random_in_tier(objs, tier))
  return res
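
For reference, here is a minimal runnable sketch of the pseudo-code above. It is only one possible reading of the requirements, not an established method. The names pick_tier and select_tiered are hypothetical stand-ins for the placeholder helpers, and two assumptions go beyond the question: the probability mass left over after the last tier, (1-p)^k for k tiers, is given to the highest tier, and picked objects are removed so that no element is returned twice.

```python
import random

def pick_tier(tiers, p, rng=random):
    """Return one tier from `tiers` (sorted ascending).

    Walk up from the lowest tier and stop at each one with probability p,
    so the tier at 0-based position k is chosen with probability
    (1 - p)**k * p.
    Assumption: the leftover mass (1 - p)**len(tiers) goes to the last tier.
    """
    for tier in tiers[:-1]:
        if rng.random() < p:
            return tier
    return tiers[-1]

def select_tiered(objs, p, n, rng=random):
    """Pick up to n distinct objects: choose a tier, then an object in it."""
    # Group the objects by tier so each tier can be sampled uniformly.
    by_tier = {}
    for o in objs:
        by_tier.setdefault(o["tier"], []).append(o)

    res = []
    while by_tier and len(res) < n:
        tier = pick_tier(sorted(by_tier), p, rng)
        bucket = by_tier[tier]
        # Uniform choice inside the tier; popping prevents repeats.
        res.append(bucket.pop(rng.randrange(len(bucket))))
        if not bucket:
            del by_tier[tier]  # an exhausted tier is no longer eligible
    return res
```

Removing each picked object is a design choice taken from the examples in the question, which never repeat an element; the pseudo-code as written samples with replacement and could return the same object more than once.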
Aaron Yodaiken
  • I don't follow your need. Could you edit the question and include a pseudo-coded query for your requirements? – WiredPrairie Mar 29 '13 at 21:54
  • Yes, that makes it a bit clearer. But wow, I don't see any direct way to make that work. You might have some luck using the aggregation framework for a few aspects, but the random selection doesn't map onto it at all. – WiredPrairie Mar 30 '13 at 01:53

1 Answer

First, enable geospatial indexing on a collection:

db.docs.ensureIndex( { random_point: '2d' } )

To create a bunch of documents with random points on the X-axis:

for ( i = 0; i < 10; ++i ) {
    db.docs.insert( { key: i, random_point: [Math.random(), 0] } );
}

Then you can get a random document from the collection like this:

db.docs.findOne( { random_point : { $near : [Math.random(), 0] } } )

Or you can retrieve the several documents nearest to a random point:

db.docs.find( { random_point : { $near : [Math.random(), 0] } } ).limit( 4 )

This requires only one query and no null checks, plus the code is clean, simple and flexible. You could even use the Y-axis of the geopoint to add a second randomness dimension to your query.

To implement your custom random selection, change the [Math.random(), 0] part so that it matches the distribution you want.
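
As a rough illustration of how the tier weighting from the question might be combined with this approach, the Python sketch below builds a query filter rather than altering the random point itself, which is a different route from the one described above. It assumes every document stores both a tier field and a random_point field; tier_for_draw and random_tier_filter are hypothetical names, and the resulting dictionary is meant to be passed to a driver call such as pymongo's find_one. It has not been run against a server, and whether an equality condition combines efficiently with $near depends on how the 2d index is set up.

```python
import random

def tier_for_draw(tiers, p, u):
    """Map a uniform draw u in [0, 1) to a tier.

    The tier at 0-based position k (ascending) covers a slice of width
    (1 - p)**k * p.
    Assumption: draws past the last slice fall into the highest tier.
    """
    upper = 0.0
    for k, tier in enumerate(tiers):
        upper += (1 - p) ** k * p
        if u < upper:
            return tier
    return tiers[-1]

def random_tier_filter(tiers, p, rng=random):
    """Build a query filter: weighted tier, uniformly random point."""
    tier = tier_for_draw(sorted(tiers), p, rng.random())
    return {"tier": tier, "random_point": {"$near": [rng.random(), 0]}}
```

Selecting n documents would then mean building one such filter per pick, so the single-query property mentioned above no longer holds.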

Source: Random record from MongoDB

securecurve
  • You can read how Appboy tackled the problem here: https://www.mongodb.com/blog/post/remaining-agile-with-billions-of-documents-appboy-s-creative-mongodb-schemas?utm_content=bufferfdcbd&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer – nickmilon Sep 12 '15 at 14:54