
I have a small DB with about 500 records. I'm trying to implement a versioning scheme where I save the form along with its current version to my Record collection. Ideally, I would like to store the form along with its version number in an embedded document to keep things nice and tidy:

class Structure(db.EmbeddedDocument):
    form = db.ReferenceField(Form, required=True)
    version = db.IntField(required=True)

    @property
    def short(self):
        return {
            'form': self.form,
            'version': self.version
        }

class Record(db.Document):
    structure = db.EmbeddedDocumentField(Structure)

    @property
    def short(self):
        return {
            'structure': self.structure.short
        }

This way when I recall a record I can grab the form and the version that was used at the time. Running some timing tests:

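import time

start = time.perf_counter()  # time.clock() was removed in Python 3.8
records = Record.objects.select_related()
print('Query time:', time.perf_counter() - start)

start = time.perf_counter()  # restart so serialization is timed on its own
response = [i.short for i in records]
print('Serialization time:', time.perf_counter() - start)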

I find the query time for all records, `Record.objects.select_related()`, to be reasonable at ~1.12 s. However, serialization for the purpose of JSON transfer is extremely expensive at ~24.1 s!

If I make a slight modification by removing use of the EmbeddedDocument:

class Record(db.Document):
    form = db.ReferenceField(Form, required=True)
    version = db.IntField(required=True)

    @property
    def short(self):
        return {
            'form': self.form,
            'version': self.version
        }

Running the same test, I find the query time to be pretty much unchanged at ~1.36 s, but the serialization time improves roughly 21x, to ~1.14 s. I really do not understand why using an embedded document would lead to such a massive penalty in serialization time. Is dereferencing inside an embedded document more difficult?

spitz
  • spitz, your example is very messy and confusing. Can you write an accurate and complete example? I can't work out the meaning of your `Structure`, `Record` and `form`s (you have 2 of them), or what you mean by "serialization time for form and structure". Otherwise, it is an interesting question. – Boris Burkov Aug 24 '16 at 13:41
  • Hm, a slowdown of around 20x reminds me of the N+1 selects problem: http://ses4j.github.io/2015/11/23/optimizing-slow-django-rest-framework-performance/, http://stackoverflow.com/questions/97197/what-is-the-n1-selects-issue. – Boris Burkov Aug 26 '16 at 10:55
  • Bob, I hope my question is now more explicit. Please see the following for what I mean by serialization: http://stackoverflow.com/questions/7102754/jsonify-a-sqlalchemy-result-set-in-flask . This issue seems directly related to the use of the EmbeddedDocument field in mongoengine causing serialization problems. – spitz Aug 29 '16 at 15:47
  • Well, my hypothesis is that in the slow case it makes 2 requests to mongo per object: first it lazily gets the document with its EmbeddedDocument but doesn't dereference the ReferenceField, then it makes another request to mongo for each referenced object. I suppose that in the faster case it pre-loads all references in bulk and then just selects from them. See BaseQuerySet in https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/queryset/base.py#L45 and how it makes use of DeReference: https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/dereference.py. – Boris Burkov Aug 29 '16 at 17:17
  • Ah, yeah, there's a `max_depth` param in DeReference. It defaults to 1. I suppose that if you set it to 2, the performance of your slower case will improve to match the faster one. – Boris Burkov Aug 29 '16 at 17:25
  • I tried `records = Record.objects.select_related(max_depth = 2)` and the times did not change. It may be related to this: http://stackoverflow.com/questions/20224141/mongoengine-deferencing-happens-after-using-select-related – spitz Aug 29 '16 at 18:20
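
The hypothesis in the comments above (one extra request per record versus a single bulk pre-load) can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeFormStore`, `serialize_lazy`, and `serialize_bulk` are hypothetical names that simulate round trips in memory. None of it is MongoEngine code, so it shows why the access pattern would matter, not that MongoEngine actually behaves this way for embedded documents.

```python
# Toy model only: FakeFormStore stands in for the Form collection and
# counts simulated round trips. None of these names are MongoEngine APIs.

class FakeFormStore:
    def __init__(self, forms):
        self._forms = forms      # maps form id -> form dict
        self.round_trips = 0     # number of simulated queries

    def get_one(self, form_id):
        """Simulate one query that fetches a single referenced form."""
        self.round_trips += 1
        return self._forms[form_id]

    def get_many(self, form_ids):
        """Simulate one bulk query that fetches many forms at once."""
        self.round_trips += 1
        return {fid: self._forms[fid] for fid in set(form_ids)}


def serialize_lazy(records, store):
    """Dereference each record's form on access: one query per record."""
    return [
        {"form": store.get_one(r["form_id"]), "version": r["version"]}
        for r in records
    ]


def serialize_bulk(records, store):
    """Fetch all referenced forms once, then resolve them from memory."""
    forms = store.get_many(r["form_id"] for r in records)
    return [
        {"form": forms[r["form_id"]], "version": r["version"]}
        for r in records
    ]


# Hypothetical data: 500 records (as in the question) sharing 5 forms.
forms = {i: {"name": f"form-{i}"} for i in range(5)}
records = [{"form_id": i % 5, "version": 1} for i in range(500)]

lazy_store = FakeFormStore(forms)
bulk_store = FakeFormStore(forms)
lazy_result = serialize_lazy(records, lazy_store)
bulk_result = serialize_bulk(records, bulk_store)

print(lazy_store.round_trips, bulk_store.round_trips)  # 500 1
```

Both functions return the same payload. Only the number of simulated round trips differs, which is the kind of gap that could account for a slowdown of this size if each lazy dereference costs a real network request.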

0 Answers