
I was wondering which keeps MongoDB faster: having a few parent documents with big arrays of embedded documents inside them, or having a lot of parent documents with few embedded documents inside?

This question concerns only query speed. I'm not concerned with the amount of repeated information, unless you tell me that it influences search speed. (I don't know whether MongoDB automatically indexes IDs.)

Example:

Suppose I have the following entities, each with only an Id field:

  • Class (8 different classes)
  • Student (100 different students)

In order to associate students with classes, would I take the most advantage of MongoDB's speed if I:

  • stored all students in arrays inside the classes they attend, or
  • kept, inside each student, an array of the classes they attend?

This is just an example. A real situation would involve thousands of documents.

F. Santiago
  • What are you going to query for? Students of a given class? Classes with a given student? Students across classes? Classes sorted by number of students? – Thilo May 28 '12 at 08:38
  • Well, I am going to search for specific students inside a given class. – F. Santiago May 28 '12 at 08:48

1 Answer


I am going to search for specific students inside a given class.

If so, you should have a students collection in which each student document has a field set to the class (storing just the class id is probably better than embedding a duplicated class document).

Otherwise, you will not be able to query for students properly:

db.students.find({ class: 'Math101', gender: 'f', age: 22 })

will work as expected, whereas storing the students inside the classes they attend

{ _id: 'Math101', student: [
     { name: 'Jim', gender: 'm', age: 22 }, { name: 'Mary', gender: 'f', age: 23 }
  ] }

has (in addition to duplication) the problem that the query

db.classes.find({ _id: 'Math101', 'student.gender': 'f', 'student.age': 22 })

will give you the Math class with all students, as long as there is at least one female student and at least one 22-year-old student in it (who could be male).
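To see why the dotted query misbehaves, here is a minimal plain-JavaScript sketch (it runs in Node without MongoDB) of the matching rule just described. It is a simplified model, not MongoDB's actual implementation: the helper names `matchesDotted` and `matchesSameElement` are hypothetical, the `_id` condition is left out, and the `gender` values on Jim and Mary are assumptions added for illustration. For contrast, `matchesSameElement` models MongoDB's `$elemMatch` operator, which requires a single array element to satisfy all conditions.

```javascript
// Simplified model (NOT MongoDB's implementation) of matching equality
// conditions on dotted paths like 'student.age' against an embedded array.

// The class document from the example; genders are assumed for illustration.
const math101 = {
  _id: 'Math101',
  student: [
    { name: 'Jim', gender: 'm', age: 22 },
    { name: 'Mary', gender: 'f', age: 23 },
  ],
};

// Dot-notation semantics: each condition is checked on its own and is
// satisfied if ANY array element has that value, so different conditions
// may be satisfied by different students.
function matchesDotted(doc, arrayField, conditions) {
  return Object.entries(conditions).every(([field, value]) =>
    doc[arrayField].some((elem) => elem[field] === value)
  );
}

// Per-element semantics (what $elemMatch provides): one array element
// must satisfy all of the conditions at once.
function matchesSameElement(doc, arrayField, conditions) {
  return doc[arrayField].some((elem) =>
    Object.entries(conditions).every(([field, value]) => elem[field] === value)
  );
}

const conditions = { gender: 'f', age: 22 };

// Matches: Mary supplies gender 'f' and Jim supplies age 22.
console.log(matchesDotted(math101, 'student', conditions)); // true
// Does not match: no single student is a 22-year-old female.
console.log(matchesSameElement(math101, 'student', conditions)); // false
```

In both functions the unit being accepted or rejected is the whole class document, with every embedded student; only the rule for deciding whether it matches differs.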

You can only get back a list of the top-level documents, and each one will contain all of its embedded documents, unfiltered (see also this related question).

I don't know if MongoDB automatically indexes IDs

The only automatic index is the primary key _id of the "main" document. Any _id field of embedded documents is not automatically indexed, but you can create such an index manually.

Thilo
  • I am not sure I completely understand what you are suggesting. It seems you are allowing a Student to attend only one class in the Student document. – F. Santiago May 28 '12 at 09:01
  • No, the student can attend many classes, using an array-type field, just as you proposed in the question: `{ name: 'Jim', gender: 'm', age:23, class: ['Math101', 'Poetry', 'Pottery']}` – Thilo May 28 '12 at 09:03
  • Oh, I understand. But now, in the classes document, you wouldn't have any information about any Student? – F. Santiago May 28 '12 at 09:03
  • Yes, the classes do not know their students. So some other kinds of queries won't be possible. You have to design your data schema according to what queries you want to have, and what data needs to be kept in sync when updating. – Thilo May 28 '12 at 09:10
  • Alrighty Mr Thilo. Thank you for your knowledge. I got it all and thanks for the Virtual Collections link :D – F. Santiago May 28 '12 at 09:17
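The schema suggested in the comments above (one document per student, with a `class` array of class ids) can be sketched the same way in plain JavaScript. This is again a simplified, hypothetical model, not the MongoDB shell or driver: `find` and `fieldMatches` are illustrative helpers, and the student data (including Ann and Eve) is made up. In MongoDB an equality condition on an array field matches when the array contains the value, which is what `fieldMatches` approximates.

```javascript
// Hypothetical sample data in the shape from the comments: one document per
// student, with the classes the student attends stored as an array of ids.
const students = [
  { name: 'Jim', gender: 'm', age: 23, class: ['Math101', 'Poetry', 'Pottery'] },
  { name: 'Mary', gender: 'f', age: 23, class: ['Math101'] },
  { name: 'Ann', gender: 'f', age: 22, class: ['Math101', 'Pottery'] },
  { name: 'Eve', gender: 'f', age: 22, class: ['Poetry'] },
];

// Simplified equality match: a scalar field must equal the wanted value,
// while an array field matches if it contains the wanted value.
function fieldMatches(fieldValue, wanted) {
  return Array.isArray(fieldValue)
    ? fieldValue.includes(wanted)
    : fieldValue === wanted;
}

// Rough stand-in for db.students.find(query): keeps only the student
// documents that satisfy every condition in the query.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, wanted]) =>
      fieldMatches(doc[field], wanted)
    )
  );
}

// Specific students inside a given class (the query from the answer).
const result = find(students, { class: 'Math101', gender: 'f', age: 22 });
console.log(result.map((s) => s.name)); // [ 'Ann' ]

// The classes of one student are read directly from that student's document.
const jim = find(students, { name: 'Jim' })[0];
console.log(jim.class); // [ 'Math101', 'Poetry', 'Pottery' ]
```

Because each student is its own document, the query returns only the matching students instead of a whole class with everyone in it. The trade-off from the last comments still applies: the class side holds no student information, so the schema should follow the queries you actually need.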