
I am fetching all the rows from a collection and experience a delay at the 100th row. I understand that the find method returns a cursor rather than all the data up front, and that at some point it needs to fetch more data. But the 100th row is the only delay:

Checking image 99
Checking image 100
*pause*
Checking image 101

And then there is no visible delay all the way up to the 100,000th image.

The Ruby script used:

require 'mongo'

time_start = Time.now

mongo = Mongo::MongoClient.new("localhost", 27017)

db = mongo["pics"]

images = db["images"]
albums = db["albums"]

orphans = []

images.find().each do |row|
    puts "Checking image #{row['_id']}"
end

# puts orphans
time_end = Time.now
puts "Total time taken: #{time_end - time_start}"

The images collection was imported from a JSON file:

mongoimport --db pics --collection images file_name

The questions are:

  • Does some data come along with the initial cursor?
  • Why is the only delay at the 100th row? Maybe I've missed something, but I don't even see I/O reads at that point.

Thank you

Ilya Tsuryev

1 Answer


The default "batch size" of a MongoDB cursor is 100 objects. That means MongoDB returns 100 objects, and the driver fetches the next batch only once those are consumed — which is why you see the delay at the batch boundary. All drivers should provide a method such as `batch_size()` on the cursor object for setting and retrieving the batch size.
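To see why the pause lands exactly at a batch boundary, here is a minimal pure-Ruby sketch that simulates batched cursor iteration without needing a MongoDB server. `BatchedCursor` is a hypothetical illustration class, not part of the driver; each slice stands in for one server round trip (a "getmore"):

```ruby
# Simulates a cursor that fetches documents in batches, the way a
# MongoDB cursor does. Each batch boundary is one simulated round trip,
# which is where the real-world pause appears.
class BatchedCursor
  include Enumerable

  def initialize(docs, batch_size: 100)
    @docs = docs
    @batch_size = batch_size
    @fetches = 0          # counts simulated server round trips
  end

  attr_reader :fetches

  def each
    @docs.each_slice(@batch_size) do |batch|
      @fetches += 1       # one "getmore" per batch
      batch.each { |doc| yield doc }
    end
  end
end

# 1000 "documents" at the default-like batch size of 100:
cursor = BatchedCursor.new((1..1000).to_a, batch_size: 100)
cursor.each { |_id| }
puts cursor.fetches       # 10 round trips

# A larger batch size means fewer round trips, hence fewer pauses:
bigger = BatchedCursor.new((1..1000).to_a, batch_size: 500)
bigger.each { |_id| }
puts bigger.fetches       # 2 round trips
```

With the real driver, the equivalent change is setting the batch size on the cursor, e.g. `images.find().batch_size(500).each { |row| ... }`, which trades fewer round trips for larger responses per fetch.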

  • You are right. Setting batch_size to 500 makes it smooth: `images.find().batch_size(500).each`. Also, using `mongostat` I've confirmed that by default, at the 100th row, it sends me 3 MB of data, which is the cause of that hiccup. – Ilya Tsuryev Dec 17 '12 at 11:24