
I am new to MongoDB and NoSQL concepts, and I am stuck at a point where I can't decide how to model the schema that would best serve my purpose.

I need to design the schema so that my end result is a list of Posts and Shares sorted by time. For this I considered two options:


Option 1: Separate collections for Posts and Shares

Schema for Post collection:

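const mongoose = require('mongoose');

// One document per original post
const postSchema = new mongoose.Schema({
   postText: String,
   postedBy: String,                                   // author, stored as a string
   privacy: Number,
   updatedOn: { type: Date, default: Date.now }        // timestamp used for sorting
}, { collection: 'posts' });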

Schema for Share collection:

// One document per share; dis_Id points back to the shared post
const shareSchema = new mongoose.Schema({
   dis_Id: { type: mongoose.Schema.Types.ObjectId },   // _id of the post that is shared
   shareBy: { type: mongoose.Schema.Types.ObjectId },  // user who shared it
   shareText: String,
   share_privacy: Number,
   shareOn: { type: Date, default: Date.now }          // timestamp used for sorting
}, { collection: 'shares' });

Option 2: Embed Share in Posts itself

New schema for Post:

const postSchema = new mongoose.Schema({
  postText: String,
  postedBy: String,
  updatedOn: { type: Date, default: Date.now },
  privacy: Number,
  share: {                       // share details embedded in the post document
    shareBy: { type: mongoose.Schema.Types.ObjectId },
    shareText: String,
    share_privacy: Number,
    shareOn: { type: Date }
  }
}, { collection: 'posts' });

Which of these is the better choice? Option 1 is hard to query because there are no joins in MongoDB, while Option 2 duplicates the same post data, and the number of documents could grow into the billions for hundreds of thousands of users.
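For Option 2, a minimal sketch of how the timeline order could be derived, assuming (the question does not spell this out) that each share is stored as its own copy of the post document with the `share` subdocument filled in. The timeline time of a document is then `share.shareOn` when present, otherwise `updatedOn`. The helper names `activityTime` and `sortTimeline` are hypothetical, and the sort runs in memory purely to illustrate the ordering.

```javascript
// Hypothetical helper: the time at which a document appears in the timeline.
// Assumes a shared copy carries share.shareOn; an original post has no share.
function activityTime(doc) {
  return doc.share && doc.share.shareOn ? doc.share.shareOn : doc.updatedOn;
}

// Sort Option-2 documents (originals and shared copies) by activity time, oldest first.
function sortTimeline(docs) {
  return [...docs].sort((a, b) => activityTime(a) - activityTime(b));
}

// Example: post A (9am), its shared copy A1 (3pm), post B (12pm).
const docs = [
  { postText: 'A', updatedOn: new Date('2017-09-17T09:00:00Z') },
  { postText: 'A', updatedOn: new Date('2017-09-17T09:00:00Z'),
    share: { shareText: 'A1', shareOn: new Date('2017-09-17T15:00:00Z') } },
  { postText: 'B', updatedOn: new Date('2017-09-17T12:00:00Z') },
];

console.log(sortTimeline(docs).map(d => (d.share ? d.share.shareText : d.postText)));
// → [ 'A', 'B', 'A1' ]
```

In MongoDB itself the same key could be computed server-side with an aggregation such as `[{ $addFields: { activityOn: { $ifNull: ['$share.shareOn', '$updatedOn'] } } }, { $sort: { activityOn: 1 } }]`, although a sort on a computed field cannot use an index.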

James Z
  • I'm still not clear why the 1st option is not viable. I agree that joins are not supported in MongoDB for online queries. However, using this [post](https://stackoverflow.com/questions/5681851/mongodb-combine-data-from-multiple-collections-into-one-how) you can use a join for offline processes. It is not clear to me how the 1st option defeats the purpose. Can you elaborate? – SIDDHARTH J MEHTA Sep 17 '17 at 12:14
  • The **MapReduce** technique returns a **BSON document**, which can be at most **16 MB** in size. As my number of posts and the number of shares per post grow, the resulting document may hit the 16 MB limit. – Sarthak Patidar Sep 17 '17 at 13:11
  • OK. I agree with you regarding the map-reduce use case. But I'm still not able to understand why the 1st option will not be helpful. If you can describe your API requests or use case, it will be clearer. – SIDDHARTH J MEHTA Sep 17 '17 at 13:17
  • @SIDDHARTHJMEHTA The final result will be a document containing an array of **shares**, and each element of that array can in turn hold many documents, and so on. – Sarthak Patidar Sep 17 '17 at 13:22
  • @SIDDHARTHJMEHTA I want to receive the posts and shares of a user's followers (search on **postedBy** and **shareBy** in both and sort by **Date**), so I need to look at both and return each item according to its time. – Sarthak Patidar Sep 17 '17 at 13:44
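Under Option 1, the requirement in the last comment (posts and shares by the relevant users, matched on `postedBy` and `shareBy`, sorted by date) becomes two ordinary filters with `$in` plus a sort, one per collection. A sketch, assuming the caller already has the list of relevant users; the helper `buildTimelineQueries` is hypothetical. In the schemas above `postedBy` is a String while `shareBy` is an ObjectId, so the sketch takes the two lists separately.

```javascript
// Hypothetical helper: builds the find() filter and sort for each collection.
// postedAuthors: values stored in posts.postedBy (strings in the schema above)
// shareUserIds:  values stored in shares.shareBy (ObjectIds in the schema above)
function buildTimelineQueries(postedAuthors, shareUserIds) {
  return {
    posts: {
      filter: { postedBy: { $in: postedAuthors } },
      sort: { updatedOn: 1 },   // oldest first; use -1 for newest first
    },
    shares: {
      filter: { shareBy: { $in: shareUserIds } },
      sort: { shareOn: 1 },
    },
  };
}

const q = buildTimelineQueries(['alice', 'bob'], ['id-alice', 'id-bob']);
console.log(JSON.stringify(q.posts.filter)); // {"postedBy":{"$in":["alice","bob"]}}
```

With the official Node.js driver these plug into calls like `db.collection('posts').find(q.posts.filter).sort(q.posts.sort)`. The two result lists still come back separately and have to be combined by time in application code.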

2 Answers


Ok. I suggest the following approach:

  1. Since you already have the username, you can retrieve the list of posts for that user in sorted order using sort.

  2. Iterating over each post, you can fetch its shares in sorted order using the same sort.
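The two steps above can be sketched in memory as follows. The arrays stand in for the results of the `posts` and `shares` queries, the helper name `postsWithShares` and the user `'sam'` are hypothetical, and plain string ids stand in for ObjectIds (real ObjectIds would be compared with `.equals()` rather than `===`). On the example from the comments below, it groups each post with its own shares.

```javascript
// Hypothetical helper mirroring the answer's steps, using arrays instead of queries.
// Step 1: the user's posts sorted by updatedOn.
// Step 2: for each post, its shares (matched by dis_Id) sorted by shareOn.
function postsWithShares(posts, shares, author) {
  return posts
    .filter(p => p.postedBy === author)
    .sort((a, b) => a.updatedOn - b.updatedOn)
    .map(p => ({
      post: p,
      shares: shares
        .filter(s => s.dis_Id === p._id)
        .sort((a, b) => a.shareOn - b.shareOn),
    }));
}

const posts = [
  { _id: 'A', postedBy: 'sam', updatedOn: new Date('2017-09-17T09:00:00Z') },
  { _id: 'B', postedBy: 'sam', updatedOn: new Date('2017-09-17T12:00:00Z') },
];
const shares = [
  { _id: 'A1', dis_Id: 'A', shareOn: new Date('2017-09-17T15:00:00Z') },
];

// Flattening the grouped result shows the order this approach produces.
const flat = postsWithShares(posts, shares, 'sam')
  .flatMap(g => [g.post._id, ...g.shares.map(s => s._id)]);
console.log(flat); // → [ 'A', 'A1', 'B' ]
```

Each post is followed by its own shares, so the order is per-post rather than a single chronological timeline.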

The key here is understanding the indexes you'll be setting. I suggest the following indexes:

post_schema: compound index on {postedBy, updatedOn} (postedBy holds the username)

share_schema: compound index on {dis_Id, shareOn}.

Without these compound indexes, your application will not scale to a large number of records.
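A sketch of creating the two suggested compound indexes with the official Node.js driver's `createIndex`, assuming `db` is a connected `Db` instance and `postedBy` is the field holding the username. The equality field comes first and the sort field second, so a query for one author (or one post's shares) sorted by time can walk the index instead of sorting in memory. The function name `ensureTimelineIndexes` is hypothetical.

```javascript
// Creates the compound indexes suggested above.
// db: a connected Db from the official `mongodb` driver (assumed).
// Key order: equality field first, sort field second.
function ensureTimelineIndexes(db) {
  return Promise.all([
    db.collection('posts').createIndex({ postedBy: 1, updatedOn: 1 }),
    db.collection('shares').createIndex({ dis_Id: 1, shareOn: 1 }),
  ]);
}
```

Because the queries match a single `postedBy` or `dis_Id` value, an ascending index can also serve a descending sort on the time field by walking the index in reverse. With Mongoose, the equivalent declaration would be `postSchema.index({ postedBy: 1, updatedOn: 1 })` on the schema.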

  • There is one more problem here: although I get **posts** and **shares** each in sorted order, I need them sorted together as one mixed list of posts and shares. **For example**: if post A is posted at 9am and shared (A1) at 3pm, whereas B is posted at 12pm, your logic gives me A, then A1, then B, but I need the order A, then B, then A1. – Sarthak Patidar Sep 17 '17 at 14:20
  • OK, from what I understand you'll have to write custom logic to merge these shares/posts. Mongo's map-reduce facility will not help if you are appending all the records into a single row. However, if you handle it so that each post/share is a single row, this can be done using MR. In conclusion, you'll have to write custom logic to do it. – SIDDHARTH J MEHTA Sep 18 '17 at 01:09
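The custom merge described here can be a standard two-way merge of already-sorted lists, the same step used in merge sort. A minimal sketch, assuming both inputs come back sorted oldest first (`updatedOn: 1` for posts, `shareOn: 1` for shares); `mergeTimeline` and the `type`/`at` output fields are hypothetical names chosen for the example.

```javascript
// Merge two time-sorted lists (oldest first) into one timeline in O(n + m).
// posts: sorted by updatedOn; shares: sorted by shareOn.
function mergeTimeline(posts, shares) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < posts.length || j < shares.length) {
    // Take the post when no shares remain or the post is not later than the next share.
    const takePost =
      j >= shares.length ||
      (i < posts.length && posts[i].updatedOn <= shares[j].shareOn);
    if (takePost) {
      const p = posts[i++];
      out.push({ type: 'post', at: p.updatedOn, doc: p });
    } else {
      const s = shares[j++];
      out.push({ type: 'share', at: s.shareOn, doc: s });
    }
  }
  return out;
}

const posts = [
  { _id: 'A', updatedOn: new Date('2017-09-17T09:00:00Z') },
  { _id: 'B', updatedOn: new Date('2017-09-17T12:00:00Z') },
];
const shares = [{ _id: 'A1', dis_Id: 'A', shareOn: new Date('2017-09-17T15:00:00Z') }];

console.log(mergeTimeline(posts, shares).map(e => e.doc._id)); // → [ 'A', 'B', 'A1' ]
```

Since each list is already sorted by the database (ideally via an index), the merge only walks both lists once instead of re-sorting everything.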

Well, it is easy to work with embedded documents because you get all the required data together, so option 2 is good in that case. But if you are concerned about documents growing beyond 16MB, go with option 1. In that case, do not use an aggregate query to join the two collections, because that is a time-consuming operation: it first performs all the other operations and only then applies the skip. Instead, query each collection individually and build the complete response yourself with some custom logic.
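Querying each collection individually also allows paging without `skip`. A sketch under these assumptions: each page asks both collections for at most `limit` items newer than the last timestamp already shown (for example `{ updatedOn: { $gt: cursor } }` with `.sort({ updatedOn: 1 }).limit(limit)`), merges the two batches, and keeps the first `limit`. That is enough because the first `limit` items of the combined timeline contain at most `limit` items from either collection. `takePage` is hypothetical, and items sharing exactly the cursor timestamp could be skipped by `$gt`, so a real version might need a tie-breaker such as `_id`.

```javascript
// One page of the combined timeline, given each collection's next batch.
// postBatch / shareBatch: up to `limit` items each, sorted oldest first,
// all newer than the previous page's cursor.
function takePage(postBatch, shareBatch, limit) {
  const merged = [
    ...postBatch.map(p => ({ at: p.updatedOn, doc: p })),
    ...shareBatch.map(s => ({ at: s.shareOn, doc: s })),
  ].sort((a, b) => a.at - b.at);   // simple sort; a two-way merge would also work
  const page = merged.slice(0, limit);
  const cursor = page.length ? page[page.length - 1].at : null;
  return { page, cursor };          // use cursor as the $gt bound for the next queries
}

const t = h => new Date(Date.UTC(2017, 8, 17, h)); // 17 Sep 2017, hour h (UTC)
const { page, cursor } = takePage(
  [{ _id: 'A', updatedOn: t(9) }, { _id: 'B', updatedOn: t(12) }],
  [{ _id: 'A1', shareOn: t(15) }],
  2
);
console.log(page.map(e => e.doc._id), cursor.toISOString());
// → [ 'A', 'B' ] 2017-09-17T12:00:00.000Z
```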

Ricky
  • Well, then how do I do this with option 2, keeping this in mind: there is one more problem here. Although I get posts and shares each in sorted order, I need them sorted together as one mixed list of posts and shares. For example, if post A is posted at 9am and shared (A1) at 3pm, whereas B is posted at 12pm, your logic gives me A, then A1, then B, but I need the order A, then B, then A1. – Sarthak Patidar Sep 17 '17 at 14:28
  • For that you have to write custom code. Get the documents you want from both collections and then merge them in the order you need. – Ricky Sep 17 '17 at 18:09
  • And how should I merge them? Using aggregation or some other way? – Sarthak Patidar Sep 17 '17 at 18:21
  • You have to write the code in whatever language you are using, Node.js or Java. – Ricky Sep 17 '17 at 18:27