I have a collection named items with three documents.

{
  _id: 1,
  item: "Pencil"
}
{
  _id: 1,
  item: "Pen"
}
{
  _id: 1,
  item: "Sharpener"
}

How can I query the documents in round-robin order? Assume multiple user requests arrive at the same time.

So one request should get Pencil, the next should get Pen, and the next should get Sharpener.

Then it should start again from the first one.

If changing the schema is an option, I am ready for that as well.

Promila Ghosh
  • Honestly I do not know if you can do that, but the simplest approach is changing your schema storing a counter different for each document (which could be your `_id`), then selecting them using this counter. Obviously you have to keep in memory the current counter. – Marco Luzzara Jul 06 '18 at 16:32

4 Answers


I think I found a way to do this without changing the schema. It is based on skip() and limit(). You can also ask MongoDB to return documents in their internal ("natural") order, but as the guide says you should not rely on this, and you lose performance because indexes are bypassed:

The $natural parameter returns items according to their natural order within the database. This ordering is an internal implementation feature, and you should not rely on any particular structure within it.

Anyway, this is the query:

db.getCollection('YourCollection').find().skip(counter).limit(1)

Where counter stores the current index for your documents.
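Since the question mentions concurrent requests, it is worth spelling out how that in-memory counter has to behave. Below is a minimal plain-JavaScript sketch (not MongoDB shell code; `makeCounter` is a hypothetical helper) of a counter that wraps with a modulo, producing the value you would pass to skip():

```javascript
// Hypothetical in-memory counter for the skip()/limit() approach.
// Assumes a fixed document count; in a real app you would read the
// count from the collection and guard against concurrent access.
function makeCounter(docCount) {
  let counter = 0;
  return function next() {
    const current = counter;            // index to pass to skip()
    counter = (counter + 1) % docCount; // wrap after the last document
    return current;
  };
}

const next = makeCounter(3);
console.log(next(), next(), next(), next()); // 0 1 2 0
```

Note that a single shared counter like this only works inside one process; with multiple application servers it would have to live in the database.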

Marco Luzzara

A few things to start:

  1. _id has to be unique within a collection, so the three documents in the question cannot all have _id: 1.
  2. This is a very stateful requirement and would not work well with a distributed set of services, for example.

With that said, assuming you really just want to iterate over the database, I would use a cursor to accomplish this. For the record, this does a collection scan and is very inefficient.

var myCursor = db.items.find().sort({_id:1});

while (myCursor.hasNext()) {
   printjson(myCursor.next());
}

My suggestion is that you pull all results from the database at once and do the iteration in the application tier.

var myCursor = db.items.find().sort({_id:1});
var documentArray = myCursor.toArray();
documentArray.forEach(doSomething);
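To make the round-robin part of that application-tier iteration concrete, here is a plain-JavaScript sketch of a dispenser over the fetched array; `roundRobin` is a hypothetical helper, and it assumes a single process holding the whole result set in memory:

```javascript
// Hypothetical round-robin dispenser over the fetched array.
// `docs` stands in for myCursor.toArray(); single-process only.
function roundRobin(docs) {
  let i = 0;
  return function nextDoc() {
    const doc = docs[i];
    i = (i + 1) % docs.length; // wrap back to the first document
    return doc;
  };
}

const nextDoc = roundRobin([{ item: "Pencil" }, { item: "Pen" }, { item: "Sharpener" }]);
console.log(nextDoc().item); // Pencil
console.log(nextDoc().item); // Pen
console.log(nextDoc().item); // Sharpener
console.log(nextDoc().item); // Pencil
```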
Scott
  • My use case (idk if it was the original question, but I think it was) has concurrent access from multiple clients, so I don't think any of your suggestions would work. My workaround is writing a "last accessed" value to the document when used (using find_one_and_update) and sorting on that value, picking the least recently used one. But it doesn't really scale well... – Cyberwiz Mar 26 '22 at 15:56
  • OK, if you need to fulfill this requirement from a distributed system, my question is: does it have to be consistent across the system? If so, why? The design you mention here will work, but you will find that it won't scale due to latches on the document you are trying to update/select from other services in the system. – Scott Mar 27 '22 at 16:51
  • My particular use case is that my db contains test data for a performance test (a list of customers). Multiple load generators are reading from the db concurrently. Strictly correct ordering is not important though. – Cyberwiz Mar 28 '22 at 10:07
  • If it's just performance testing and you have no requirement to be globally consistently ordered, then I would use the second answer I provided here, provided the dataset can be held in memory, and loop through it in memory. Otherwise the first works to grab only what you need for the next request. The upside to grabbing all of the data at once is that you aren't waiting for a response from a DB to produce load. – Scott Mar 28 '22 at 13:08
  • Sorry, I was unclear: ordering and (most importantly) not getting the same record twice is important, but *strict* ordering is not (it is OK if a client doesn't get the right one every time, as long as there are no "duplicates"). – Cyberwiz Mar 28 '22 at 15:03
  • Ah, if that's the case I suggest you partition your data dynamically, such that instance 1 gets one "chunk" and instance 2 gets a different "chunk". Get that data all at once before the test starts, then kill the process when you have iterated through all the data you require. You could also use a cursor with that approach if you like. – Scott Mar 28 '22 at 16:29
  • Hmm. That's not a bad idea. But as I have a variable number of readers it might become a little hard - and I don't know how many customers I'll need beforehand. I guess I could explore it... it just feels like there should be a simpler way... – Cyberwiz Mar 28 '22 at 17:50

If this is about distribution, you may consider fetching random documents instead of round-robin via the aggregation stage $sample:

db.collection.aggregate([
  {
    "$sample": { "size": 1 }
  }
])


There are also options to randomize via $rand.

R2D2

Use findOneAndUpdate with an update pipeline after restructuring the data objects:

db.counter.findOneAndUpdate( {}, pipeline)
    {
        "_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
        "values" : [
            "Pencil",
            "Pen",
            "Sharpener"
        ],
        "selected" : "Pencil",
        "counter" : 1
    }

 db.counter.findOneAndUpdate( {}, pipeline)
    {
        "_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
        "values" : [
            "Pencil",
            "Pen",
            "Sharpener"
        ],
        "selected" : "Pen",
        "counter" : 2
    }

where the data object is now:

    {
        "_id" : ObjectId("6242fe3bc1551d0f3562bcb2"),
        "values" : [
            "Pencil",
            "Pen",
            "Sharpener"
        ],
        "selected" : "Pencil",
        "counter" : 1
    }

and the pipeline is:

[{$project: {
     values: 1,
     selected: {
      $arrayElemAt: [
       '$values',
       '$counter'
      ]
     },
     counter: {
      $mod: [
       {
        $add: [
         '$counter',
         1
        ]
       },
       {
        $size: '$values'
       }
      ]
     }
}}]

This has some merits:

  • Firstly, using findOneAndUpdate means that moving the pointer to the next item in the list and reading the object happen in a single atomic operation.
  • Secondly, by using {$size: "$values"}, adding a value to the list doesn't change the logic.
  • Finally, an object could be stored in place of each string.

Problems: this method becomes unwieldy with more than tens of entries.
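To sanity-check the pipeline's arithmetic, here is a plain-JavaScript simulation of one findOneAndUpdate call (`step` is a hypothetical stand-in, not driver code):

```javascript
// Simulates one findOneAndUpdate call with the $project pipeline above:
// selected <- values[counter], counter <- (counter + 1) % values.length
function step(doc) {
  return {
    _id: doc._id,
    values: doc.values,
    selected: doc.values[doc.counter],             // $arrayElemAt
    counter: (doc.counter + 1) % doc.values.length // $mod of $add and $size
  };
}

let doc = { _id: 1, values: ["Pencil", "Pen", "Sharpener"], selected: "Pencil", counter: 1 };
doc = step(doc); console.log(doc.selected, doc.counter); // Pen 2
doc = step(doc); console.log(doc.selected, doc.counter); // Sharpener 0
doc = step(doc); console.log(doc.selected, doc.counter); // Pencil 1
```

The sequence matches the two findOneAndUpdate results shown above: after selecting Pencil with counter 1, the next call selects Pen and advances the counter to 2.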

It is hard to prove that this method works as advertised, so there is an accompanying Kotlin project. The project uses coroutines, so it calls find/update asynchronously.

GitHub: https://github.com/ilantoren/stackOverflow

The alternative (assuming 50K items rather than 3): set up a simple counter document {counter: 0} and update it as follows:

db.counter.findOneAndUpdate({},
[{$project: {
 counter: {
  $mod: [
   {
    $add: [
     '$counter',
     1
    ]
   },
   50000
  ]
 }
}}])

Then use a simple select query to find the right document.
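As a sketch of that two-step flow, the plain-JavaScript simulation below assumes each item document carries a hypothetical zero-based `idx` field that the select query can match on (`bumpCounter` and `selectByIndex` are illustrative stand-ins, not driver calls):

```javascript
// Two-step round robin for a large collection (sketch, not driver code).
// Assumes each item document has a hypothetical zero-based `idx` field.
const SIZE = 5; // 50000 in the real collection; small here for illustration

// Stands in for the counter findOneAndUpdate shown above, returning the
// pre-update counter so it can be used as the selection index.
function bumpCounter(counterDoc) {
  const previous = counterDoc.counter;            // index to select with
  counterDoc.counter = (counterDoc.counter + 1) % SIZE;
  return previous;
}

// Stands in for the follow-up select query, e.g. db.items.findOne({idx: i})
function selectByIndex(items, i) {
  return items.find(d => d.idx === i);
}

const items = Array.from({ length: SIZE }, (_, i) => ({ idx: i, item: `item-${i}` }));
const counterDoc = { counter: 0 };
const picks = [];
for (let k = 0; k < 7; k++) picks.push(selectByIndex(items, bumpCounter(counterDoc)).item);
console.log(picks.join(" ")); // item-0 item-1 item-2 item-3 item-4 item-0 item-1
```

In the real system only the counter update is atomic; the follow-up select is a separate query, which is fine here since strict ordering is not required.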

I've updated the GitHub project to include this example.

Ilan Toren
  • Nice solution! When you say it would become unwieldy, do you mean performance-wise? I have up to 50k entries.. – Cyberwiz Mar 29 '22 at 20:35
  • Oh. I misread your suggestion. Do you really mean I should put all the data in a single document? That would not work for me (for all kinds of reasons) – Cyberwiz Mar 30 '22 at 07:31
  • It is too slow with 50K entries in the values array. So you are left with a solution where the counter is updated with a findOneAndUpdate [{$project: {counter: { $mod: [{ $add: [ '$counter',1]}, 50000]}}}] followed by a second query using that index (you can add it as a field to the test documents ) – Ilan Toren Mar 30 '22 at 07:36
  • Look at the alternative in the [GitHub](https://github.com/ilantoren/stackOverflow) – Ilan Toren Mar 30 '22 at 15:07