Modeling sub-collections in MongoDB Realm Sync

Question

I'm new to MongoDB as well as to MongoDB Realm Sync. I was following the Realm Sync tutorial and the Realm data model docs, but I wanted to learn more, so I tweaked the Atlas collection structure as follows:

Projects > Tasks // i.e. tasks is a sub-collection in each project.

What I don't know is how to come up with a Realm Sync schema that supports Atlas sub-collections. The best I came up with is a schema where tasks are modelled as an array within the project. However, I'm worried that this could hit the 16 MB document limit (although that is a lot!) for projects with many tasks.

{
  "bsonType": "object",
  "properties": {
    "_id": {
      "bsonType": "objectId"
    },
    "_partition": {
      "bsonType": "string"
    },
    "name": {
      "bsonType": "string"
    },
    "tasks": {
      "bsonType": "array",
      "items": {
        "bsonType": "object",
        "title": "Task",
        "properties": {
          "name": {
            "bsonType": "string"
          },
          "status": {
            "bsonType": "string"
          }
        }
      }
    }
  },
  "required": [
    "_id",
    "_partition",
    "name"
  ],
  "title": "Project"
}

I'm looking for guidance on how to model sub-collections the right way.

Edit

Here are my client-side Realm models:

import Foundation
import RealmSwift

class Project: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = "" // user.id
    @objc dynamic var name: String = ""
    var tasks = RealmSwift.List<Task>()
    override static func primaryKey() -> String? {
        return "_id"
    }
}

class Task: EmbeddedObject {
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
}

As far as CRUD operations are concerned, I only create new projects and read existing ones, as follows:

// Read projects
realm.objects(Project.self).forEach { (project) in
   // Access fields     
}
        
// Create a new project
try! realm.write {
    realm.add(project)
}

The question is a little confusing due to the terminology (not your doing, as there is some overlap between a Realm [collection](https://docs.mongodb.com/realm/sdk/ios/data-types/collections/) and a MongoDB Atlas [collection](https://docs.mongodb.com/manual/core/data-modeling-introduction/#flexible-schema)). Realm collections are homogeneous sets of data, and there are two types of collections, Results and List. Atlas collections, on the other hand, can hold non-homogeneous "anything", and Atlas cannot have sub-collections whereas Realm can. I think you're asking about a List property within a Realm object. — Jay, Mar 14 '21 at 13:26
Also, it's not clear how the shape of your object relates to a limit; Realm does not have arrays, so what kind of Realm property is that? If it's a List property within a Realm object, the objects in the List are not 'in' the main object, only a reference to them, so there would be minimal impact. Did you see something stating the max document size was 4 MB? The [maximum size of a BSON document is 16 MB](https://docs.mongodb.com/master/reference/limits/#BSON-Document-Size), and you can go larger by using GridFS (images, video, etc.). Perhaps we could help more if we knew your coding platform. — Jay, Mar 14 '21 at 13:39
@Jay Thank you for bringing up some points. A couple of things: first of all, the limit is 16 MB (that was an error on my part!). By sub-collections I was referring to Atlas nested documents (sub-collections is a term used in the Firebase world!). I think Atlas nested documents count against the 16 MB document limit, hence I'm worried that having a task list and modelling the tasks within a project as an Atlas nested document (and as a Realm List on the client side) will exhaust the limit. So how can I properly model it as a reference rather than a nested document on both Atlas and Realm? — Siddharth Kamaria, Mar 14 '21 at 17:42
Apologies if I sound confusing, but I've yet to wrap my head around Realm's terminology :) — Siddharth Kamaria, Mar 14 '21 at 18:06
You should add your coding platform as a tag as well so any answers will be more on point with what you're doing. — Jay, Mar 14 '21 at 23:31
Great, now let's move away from the server side and show us your Swift Realm models and then describe what you're after with those models. I am suggesting that so we get a clear understanding of the relationship between your realm models and what kinds of queries you're running against them. Including some code would be great as well. — Jay, Mar 15 '21 at 17:25

Jay · Accepted Answer · 2021-09-01T17:09:45.010

Your code looks great and you're heading in the right direction, so this answer is more explanation and suggestions on modeling than hard code.

First, Realm objects are lazily loaded, which means they are only loaded when used. Tens of thousands of objects will have very little impact on a device's memory. So suppose you have 10,000 users and you 'load them all in':

let myTenThousandUsers = realm.objects(UserClass.self)

Meh, no big deal. However, doing this:

let someFilteredUsers = myTenThousandUsers.filter { $0.blah == "blah" }

will (could) create a problem: if that returns 10,000 users, they are all loaded into memory, possibly overwhelming the device. That's a Swift function, and 'converting' Realm's lazy data using Swift should generally be avoided (use case dependent).
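A minimal sketch of the Realm-side alternative, assuming a hypothetical `UserClass` with a `blah` string property (the placeholder names from the snippet above) and an in-memory Realm so it runs without Atlas or Sync. Realm's own `filter(_:_:)` takes an NSPredicate-style format string, so the query is evaluated by Realm and the result stays a lazy `Results`, whereas the closure form falls back to Swift's `Sequence.filter` and builds an array.

```swift
import Foundation
import RealmSwift

// Hypothetical model reusing the placeholder names from the answer.
class UserClass: Object {
    @objc dynamic var name: String = ""
    @objc dynamic var blah: String = ""
}

// In-memory Realm: nothing is synced or written to disk.
let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "lazy-filter-demo"))
try! realm.write {
    for i in 0..<10_000 {
        let user = UserClass()
        user.name = "user\(i)"
        user.blah = i.isMultiple(of: 2) ? "blah" : "other"
        realm.add(user)
    }
}

// Lazy: no user object is read yet.
let myTenThousandUsers = realm.objects(UserClass.self)

// Swift's Sequence.filter: reads every object and returns an eager Array.
let eagerUsers: [UserClass] = myTenThousandUsers.filter { $0.blah == "blah" }

// Realm's filter: the predicate runs inside Realm and the result is still a lazy Results.
let lazyUsers: Results<UserClass> = myTenThousandUsers.filter("blah == %@", "blah")
```

Both give the same matches; the difference is that `lazyUsers` only materializes an object when you actually access it.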

The same observation applies to this code, which uses Swift's .forEach:

realm.objects(Project.self).forEach { (project) in
   // Access fields     
}

could cause issues depending on what's being done with those project objects; using them as a tableView dataSource could be trouble if there are a lot of them.
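For a tableView, a common pattern is to keep the `Results` itself behind the data source and read only the row being displayed. A minimal sketch, assuming an in-memory Realm, a trimmed-down copy of the question's `Project` model, and a hypothetical `ProjectListDataSource` type standing in for a real `UITableViewDataSource` (UIKit is left out so the sketch runs on its own):

```swift
import Foundation
import RealmSwift

// Trimmed copy of the question's Project model (tasks omitted for brevity).
class Project: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    override static func primaryKey() -> String? {
        return "_id"
    }
}

// Hypothetical data-source helper: it holds the lazy Results and
// only touches the object for the row that is requested.
struct ProjectListDataSource {
    let projects: Results<Project>

    init(realm: Realm) {
        // Sorting by key path is done by Realm, so the Results stays lazy.
        projects = realm.objects(Project.self).sorted(byKeyPath: "name")
    }

    var numberOfRows: Int { projects.count }

    func title(forRow row: Int) -> String {
        projects[row].name
    }
}

let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "datasource-demo"))
try! realm.write {
    for name in ["Gamma", "Alpha", "Beta"] {
        let project = Project()
        project.name = name
        realm.add(project)
    }
}

let dataSource = ProjectListDataSource(realm: realm)
```

In a real view controller, `numberOfRows` and `title(forRow:)` would back `tableView(_:numberOfRowsInSection:)` and `tableView(_:cellForRowAt:)`, so no intermediate array of projects is ever built.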

The second thing is the question about the 16 MB limit per document. For clarity, an Atlas document looks like this:

{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}

where each value can be any of the BSON data types, including other documents, arrays, and arrays of documents.

In your structure, var tasks = RealmSwift.List<Task>() holds Task, which is an embedded object. While embedded objects are conceptually objects, I believe they count toward the single document limit because they are embedded (correct me if I am wrong): as the number of them grows, the size of the enclosing document grows. Keep in mind that 16 MB of text is an ENORMOUS amount of text, so that would/could equate to hundreds of thousands of small tasks per project.
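To put the limit in numbers, here is a rough back-of-the-envelope sketch based on the BSON encoding: a string element costs a type byte, the NUL-terminated key, an int32 length, the UTF-8 bytes and a trailing NUL, and each array entry is an embedded document keyed by its decimal index. It assumes every embedded task has only the `name` and `status` fields from the schema in the question and ignores the project's own fields, so treat the result as an upper bound. The helper names are hypothetical.

```swift
// BSON's maximum document size: 16 MiB.
let maxBSONDocumentBytes = 16 * 1024 * 1024

// One BSON string element:
// type byte + key + NUL + int32 length + UTF-8 bytes + NUL.
func stringElementBytes(key: String, value: String) -> Int {
    return 1 + key.utf8.count + 1 + 4 + value.utf8.count + 1
}

// One embedded Task stored at position `index` of the "tasks" array.
func taskElementBytes(name: String, status: String, index: Int) -> Int {
    // Embedded document: int32 total length + its elements + trailing NUL.
    let document = 4
        + stringElementBytes(key: "name", value: name)
        + stringElementBytes(key: "status", value: status)
        + 1
    // Array entry: type byte + decimal index used as the key + NUL + the document.
    return 1 + String(index).utf8.count + 1 + document
}

// How many identical tasks fit before the array alone exceeds `budget` bytes.
func maxTasksThatFit(name: String, status: String, budget: Int) -> Int {
    var used = 0
    var count = 0
    while true {
        let next = taskElementBytes(name: name, status: status, index: count)
        if used + next > budget {
            return count
        }
        used += next
        count += 1
    }
}

let twentyCharName = String(repeating: "x", count: 20)
let fits = maxTasksThatFit(name: twentyCharName, status: "Pending", budget: maxBSONDocumentBytes)
print("About \(fits) tasks fit in one document")
```

With 20-character names and a "Pending" status this comes out at a little over 260,000 tasks, so the limit is real but far away; longer free-text fields per task would shrink it quickly.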

The simple solution is to not embed them and have them stand on their own.

class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = "" 
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
    override static func primaryKey() -> String? {
        return "_id"
    }
}
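A minimal sketch of the 1-many relationship, assuming the question's `Project` keeps a `List<Task>` but now of standalone `Task` objects like the one above, with an in-memory Realm in place of a synced one; `addTask(named:to:in:)` and the partition value are hypothetical. As I understand the Sync mapping, each `Task` then becomes its own Atlas document and the project document only stores references (primary keys) to its tasks.

```swift
import Foundation
import RealmSwift

// Standalone Task, as in the answer.
class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
    override static func primaryKey() -> String? {
        return "_id"
    }
}

class Project: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    // To-many link: the list references Task objects that live on their own.
    let tasks = List<Task>()
    override static func primaryKey() -> String? {
        return "_id"
    }
}

// Hypothetical helper: creates a standalone Task and links it to `project`.
func addTask(named name: String, to project: Project, in realm: Realm) throws {
    try realm.write {
        let task = Task()
        task.name = name
        task._partition = project._partition // same partition as its parent project
        realm.add(task)
        project.tasks.append(task)
    }
}

let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "linked-tasks-demo"))
let project = Project()
project._partition = "user-123" // hypothetical user id
project.name = "Groceries"
try! realm.write { realm.add(project) }

try! addTask(named: "Buy milk", to: project, in: realm)
try! addTask(named: "Buy eggs", to: project, in: realm)
```

Because the tasks are top-level objects, they can also be queried directly with `realm.objects(Task.self)` without going through a project.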

Then each one can be up to 16 MB, and an 'unlimited number' can be associated with a single project. One advantage of embedded objects is a kind of cascade delete: when the parent object is deleted, its child objects are deleted as well. But with a 1-many relationship from Project to Tasks, deleting the tasks belonging to a parent is still easy.
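A minimal sketch of doing that delete by hand, assuming the same `Project`/`Task` shape with a `List<Task>` link and an in-memory Realm; `deleteProjectAndTasks(_:in:)` is a hypothetical helper name. The linked tasks are deleted first, while the project and its list are still valid, and both deletes share one write transaction so they succeed or fail together.

```swift
import Foundation
import RealmSwift

class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
    override static func primaryKey() -> String? {
        return "_id"
    }
}

class Project: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    let tasks = List<Task>()
    override static func primaryKey() -> String? {
        return "_id"
    }
}

// Hypothetical helper: emulates the cascade delete that embedded objects give you.
func deleteProjectAndTasks(_ project: Project, in realm: Realm) throws {
    try realm.write {
        // Delete the linked tasks first, while the project's list is still accessible.
        realm.delete(project.tasks)
        realm.delete(project)
    }
}

let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "cascade-delete-demo"))
let project = Project()
project.name = "Old project"
try! realm.write {
    realm.add(project)
    for name in ["a", "b", "c"] {
        let task = Task()
        task.name = name
        realm.add(task)
        project.tasks.append(task)
    }
}

try! deleteProjectAndTasks(project, in: realm)
```

If tasks were linked only through a field such as a project id rather than a `List`, the same helper would instead delete the `Results` of a query on that field.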

Oh, another case for not using embedded objects, especially for this use case, is that they cannot have indexed properties. Indexing can greatly speed up some queries.
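With a top-level `Task`, an index is one override away. A minimal sketch, assuming you mostly query tasks by `status` (that choice of property is an assumption) and an in-memory Realm; `indexedProperties()` is the class method that object-style RealmSwift models override to declare indexes.

```swift
import Foundation
import RealmSwift

class Task: Object {
    @objc dynamic var _id: String = ObjectId.generate().stringValue
    @objc dynamic var _partition: String = ""
    @objc dynamic var name: String = ""
    @objc dynamic var status: String = "Pending"
    override static func primaryKey() -> String? {
        return "_id"
    }
    // Ask Realm to maintain an index on `status` (assumed to be a common query key).
    override static func indexedProperties() -> [String] {
        return ["status"]
    }
}

let realm = try! Realm(configuration: Realm.Configuration(inMemoryIdentifier: "indexed-demo"))
try! realm.write {
    for (i, status) in ["Pending", "Done", "Pending"].enumerated() {
        let task = Task()
        task.name = "task \(i)"
        task.status = status
        realm.add(task)
    }
}

// An equality query on the indexed property.
let pending = realm.objects(Task.self).filter("status == %@", "Pending")
```

Indexes speed up reads at the cost of slightly slower writes and a larger file, so it is worth indexing only the properties you actually filter on.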

First of all, thank you for such an insightful answer. I was not aware of the indexing point you brought up at the end. Another thing I plan to experiment with is a "share project with other users" feature. Having tasks as a top-level collection will help there, since I can use the project id as the `_partition` key for the tasks and grant users access to that partition via sync permission rules. Having them as embedded objects means they inherit the project's partition, so that won't be possible. — Siddharth Kamaria, Mar 15 '21 at 19:25
