I know random record selection is not actually supported by MongoDB yet, but I have found a few ways to work around it.
However, I want to select a weighted random item. This is fairly easy with MySQL, but I'm not sure of the best way to go about it with Mongo.
The problem I am solving is: I have a collection that holds sweepstakes entries, and based on the number of times a user shares/promotes the contest, they get an "extra entry", to increase their chance of winning. Rather than duplicate the user's entry, I have a field that records the number of times they have shared the contest. I want to use this number as a multiplier to weight the random selection of a "winner".
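To make the goal concrete, here is a minimal sketch in Python (not Mongo-specific) of the selection I am after, once the entries are in memory. The function name, the `(entry_id, shares)` tuple shape, and the rule `weight = 1 + shares` (so a user who never shared still has one entry) are assumptions for illustration, not something my schema dictates.

```python
import bisect
import itertools
import random


def pick_weighted_winner(entries, rng=random):
    """Pick one entry id, weighted by share count.

    entries: non-empty list of (entry_id, shares) tuples (hypothetical shape).
    Assumption: each user has 1 base entry plus 1 extra entry per share.
    """
    weights = [1 + shares for _, shares in entries]
    # Running totals: entry i "owns" the tickets in
    # [cumulative[i-1], cumulative[i]).
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    # Draw one ticket uniformly from [0, total).
    ticket = rng.randrange(total)
    # Find the first running total strictly greater than the ticket.
    index = bisect.bisect_right(cumulative, ticket)
    return entries[index][0]
```

For example, `pick_weighted_winner([("alice", 0), ("bob", 3)])` should return `"bob"` about three times out of four, since bob holds 4 of the 5 tickets... correction: bob holds 4 tickets and alice 1, so bob wins about four times out of five.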
Here are a few approaches I have thought of:
- Use a variation on the Cookbook random selection method, generating an array of random numbers (equal to the multiplier), for greater chances the record will be near the random point queried (but Mongo doesn't support array [multi-key] indexes, yes? so it might be slow)
- Another variation on the Cookbook random method using a geospatial query, using a round polygon with a radius equal to the multiplier instead of a simple random number (if this is even possible, I've never used MongoDB geo indexes and queries)
- Expand the entries in a new temporary collection, then use one of the MongoDB random selection methods
- Avoid the problem and just store the duplicated entries in Mongo in the first place, and do a regular random select thingamajig
- Keep a separate index of the MongoIDs and their weight multipliers in MySQL (either constantly synced, or generated on demand) and use MySQL to do a random weighted selection
- Query out a huge array to do it in PHP and hope it doesn't run out of memory! :/
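One variation on that last idea that I am considering, to avoid holding everything in memory, is a single pass over the results that only keeps the current winner and a running total. This is a rough sketch in Python rather than PHP, and it assumes the entries arrive as an iterable of `(entry_id, shares)` pairs (for example from a cursor) and that `weight = 1 + shares`; both are my own assumptions for illustration.

```python
import random


def pick_weighted_winner_streaming(entries, rng=random):
    """Single-pass weighted pick that never materialises the full list.

    entries: any iterable of (entry_id, shares) pairs (hypothetical shape).
    Assumption: each user has 1 base entry plus 1 extra entry per share.
    Returns None if the iterable is empty.
    """
    winner = None
    total = 0
    for entry_id, shares in entries:
        weight = 1 + shares
        total += weight
        # Replace the current winner with probability weight / total.
        # After the whole pass, each entry ends up chosen with
        # probability (its weight) / (sum of all weights).
        if rng.randrange(total) < weight:
            winner = entry_id
    return winner
```

With tens of thousands of entries this is one sequential read and constant memory, which seems acceptable given that fairness matters more than speed here.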
Am I on to anything here? Are there any other suggestions, or an obvious solution I am missing? I'm going to do some experimenting to see what works, but any feedback on my initial ideas is welcome!
Performance needs to be "good", not great, since none of these contests will probably ever have millions of entries (usually more like [tens of] thousands), so fairness/accuracy is more important than speed. Thanks.