Say I have a picture gallery and a picture could potentially have 100k+ fans. Which ndb design is more efficient?

from google.appengine.ext import ndb

class picture(ndb.Model):
    fanIds = ndb.StringProperty(repeated=True)
    # ... [other picture properties]

or

class picture(ndb.Model):
    pass  # ... [other picture properties]

class fan(ndb.Model):
    pictureId = ndb.StringProperty()
    fanId = ndb.StringProperty()

Is there any limit on the number of items you can add to an ndb repeated property, and is there any performance hit from storing a large number of items in one? If repeated properties are less efficient, what is their intended use?

Dan McGrath
waigani
  • Nothing to do with the answer, but I would suggest following the naming conventions: class names in `CamelCase` and property names in `lower_case_underscore`. – Lipis Mar 13 '13 at 08:53
  • Also, for `pictureId` use `ndb.KeyProperty(kind=picture)` as you have it in the current model, and `fanId = ndb.KeyProperty(kind=fan, repeated=True)` instead of `StringProperty`, for better handling of the entities. – Lipis Mar 13 '13 at 08:56

2 Answers

Do not use repeated properties if you have more than 100-1000 values. (1000 is probably already pushing it.) They weren't designed for such use.

Guido van Rossum
  • Jumping into this answer from another question (stackoverflow.com/questions/26740505): should one not use repeated properties for more than 10 elements? So modeling relationships via repeated keys should be avoided, correct? – EsseTi Nov 07 '14 at 08:23
  • @Guido What should we use for this kind of bulk data storage? – Napolean Jan 31 '15 at 08:42
  • @Napolean I think the [NDB PickleProperty](https://cloud.google.com/appengine/docs/python/ndb/properties#types) is what you're looking for. – cjlallana Apr 21 '15 at 13:49

Generally v1 would be much cheaper.

In terms of read/write costs, you pay per entity fetched or written, so you want to reduce the number of entities. v1 will be cheaper, and significantly cheaper if you fetch every fan every time you fetch a picture.

However, each entity is limited to 1 MB. With 100k+ fans you could hit that limit, depending on the size of your fanId values, and that's before counting your other picture data. You'll have to add some more complex code to handle overflow cases.
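
To get a feel for the size limit and the overflow handling, here is a rough back-of-the-envelope sketch in plain Python (no ndb calls). Everything in it is an assumption for illustration: the `user-...` ID format, the 8 bytes of per-value overhead, the half-limit budget, and the helper names `estimated_size` and `split_into_shards` are all made up, and real serialized sizes will differ.

```python
# Rough sketch: estimate how much space a list of fan IDs needs and split
# it into chunks that each stay under a per-entity byte budget.
# All constants and names below are illustrative assumptions.

ENTITY_LIMIT_BYTES = 1024 * 1024  # the ~1 MB per-entity limit mentioned above
PER_VALUE_OVERHEAD = 8            # assumed overhead per stored value (a guess)


def estimated_size(fan_ids):
    """Approximate bytes needed to store fan_ids in one repeated property."""
    return sum(len(fid.encode("utf-8")) + PER_VALUE_OVERHEAD for fid in fan_ids)


def split_into_shards(fan_ids, budget_bytes):
    """Greedily group fan_ids into lists whose estimated size fits the budget."""
    shards, current, current_size = [], [], 0
    for fid in fan_ids:
        size = len(fid.encode("utf-8")) + PER_VALUE_OVERHEAD
        # Start a new shard when adding this ID would exceed the budget.
        if current and current_size + size > budget_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(fid)
        current_size += size
    if current:
        shards.append(current)
    return shards


# 100k made-up fan IDs, 17 characters each.
fan_ids = ["user-%012d" % i for i in range(100000)]
total = estimated_size(fan_ids)

# Leave half of the limit free for the other picture properties (arbitrary).
shards = split_into_shards(fan_ids, budget_bytes=ENTITY_LIMIT_BYTES // 2)
print(total, len(shards))
```

With these made-up 17-character IDs the estimate comes to about 2.5 MB, already over the limit for a single entity. Each shard could then be stored in its own overflow entity; how you key and load those entities is the "more complex code" part and is not shown here.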

Large entities take longer to fetch than small entities. If you're going to fetch all the fans at once every time, v1 will be better. If you're only going to fetch, say, 5 fans at any one point, v2 might be faster (only might). If, on the other hand, you try to pull 100k fan entities... that's gonna take forever.

dragonx