
Say, at the beginning of a project, I want to store a collection of Companies, and within each company, a collection of Employees.

Since I'm using a document database (such as MongoDB), my structure might look something like this:

+ Companies[]
   +--Company
      +--Employees[]
         +--Employee
         +--Employee
   +--Company
      +--Employees[]
         +--Employee

What happens if, later down the track, a new requirement is to have some Employees work at multiple Companies?

How does one manage this kind of change in a document database?

Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?

In the example above, I'd have to run migration scripts to create a new 'Employees' collection and move every employee into it, while maintaining some sort of relationship key (e.g. a CompanyID on each employee).

If I did the above thoroughly enough, I'd end up with many collections, and very little hierarchy, and documents being joined by means of keys.

In that case, am I still using the document database as I should be?

Isn't it becoming more like a relational database?

Jonathan

2 Answers


Speaking about MongoDB specifically: because the database doesn't enforce relationships the way a relational database does, you're on the hook for maintaining this kind of data integrity yourself. That flexibility is wonderfully helpful in many cases, but you end up writing more application code to handle these sorts of things.

Having said all of that, the key to using a system like MongoDB is modeling your data to fit MongoDB. What you have above makes complete sense if you're using MySQL, but with Mongo you'd absolutely get in trouble if you structure your data like it's a relational database.

If you have Employees who can work at one or more Companies, I would structure it as:

// company records
{ _id: 12345, name : 'Apple' }
{ _id: 55555, name : 'Pixar' }
{ _id: 67890, name : 'Microsoft' }

// employees
{ _id : ObjectId('abc123'), name : "Steve Jobs", companies : [ 12345, 55555 ] }
{ _id : ObjectId('abc456'), name : "Steve Ballmer", companies : [ 67890 ] }

You'd add an index on the companies field of the employees collection, which would make it very fast to get all of the employees who work for a given company, regardless of how many companies they work for. Maintaining a short list of companies per employee will be much easier than maintaining a large list of employees for a company. Getting all of the data for a company and all of its employees would take two (fast) queries.
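To make the "two queries" idea concrete, here is a minimal sketch in plain JavaScript that mimics both lookups against in-memory arrays. The helper name `companyWithEmployees` is made up for illustration, and the employee `_id` values are simplified to plain strings; the comments show roughly what the equivalent mongo shell calls would look like, assuming collections named `companies` and `employees`.

```javascript
// In-memory stand-ins for the two collections from the example above.
// Employee _id values are simplified to plain strings for this sketch.
const companies = [
  { _id: 12345, name: 'Apple' },
  { _id: 55555, name: 'Pixar' },
  { _id: 67890, name: 'Microsoft' },
];

const employees = [
  { _id: 'abc123', name: 'Steve Jobs', companies: [12345, 55555] },
  { _id: 'abc456', name: 'Steve Ballmer', companies: [67890] },
];

// Hypothetical helper: the two lookups needed to load a company and its staff.
// Rough mongo shell equivalents (collection names are assumptions):
//   db.employees.createIndex({ companies: 1 })   // one-time, multikey index
//   db.companies.findOne({ _id: companyId })
//   db.employees.find({ companies: companyId })
function companyWithEmployees(companyId) {
  // Lookup 1: the company document itself.
  const company = companies.find((c) => c._id === companyId);
  if (!company) return null;
  // Lookup 2: every employee whose companies array contains this id.
  const staff = employees.filter((e) => e.companies.includes(companyId));
  return { company, employees: staff };
}

const pixar = companyWithEmployees(55555);
console.log(pixar.company.name, pixar.employees.map((e) => e.name));
```

Note that the second lookup never touches the company document, so adding or removing a company for an employee only means editing that one employee's short array.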

Doesn't the simplicity of a document database become your worst enemy, since it creates brittle data structures which can't easily be modified?

The simplicity can bite you, but it's very easy to update and change your data at a later time. You can script changes in JavaScript and run them via the mongo shell.
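As a rough idea of what such a script does, here is a sketch of the nested-to-referenced migration from the question, written as plain JavaScript over in-memory data so it can be run anywhere. The function name `splitEmployees` is hypothetical, and it assumes every embedded employee already carries a stable `_id` (without one you'd need some other way to recognize the same person in two companies). In the real shell you would loop over `db.companies.find()` and upsert into `db.employees` instead of building arrays.

```javascript
// Hypothetical migration: turn companies with embedded employees into
// two flat collections linked by company _id.
// Assumption: every embedded employee already carries a stable _id.
function splitEmployees(nestedCompanies) {
  const companies = [];
  const byId = new Map(); // employee _id -> employee document

  for (const { employees = [], ...company } of nestedCompanies) {
    // Keep the company document, minus its embedded employees array.
    companies.push(company);
    for (const emp of employees) {
      // First time we see this employee: create the flat document.
      if (!byId.has(emp._id)) {
        byId.set(emp._id, { ...emp, companies: [] });
      }
      // Record the link to the company, without duplicates.
      const doc = byId.get(emp._id);
      if (!doc.companies.includes(company._id)) {
        doc.companies.push(company._id);
      }
    }
  }
  return { companies, employees: [...byId.values()] };
}

// Example input: the same employee embedded under two companies.
const nested = [
  { _id: 12345, name: 'Apple', employees: [{ _id: 'abc123', name: 'Steve Jobs' }] },
  { _id: 55555, name: 'Pixar', employees: [{ _id: 'abc123', name: 'Steve Jobs' }] },
];

const flat = splitEmployees(nested);
console.log(JSON.stringify(flat.employees));
```

The same employee embedded under two companies collapses into one document with two company ids, which is the shape used in the example above.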

brianz

My recent answer to this question covers this in the RavenDB context:

How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?

Community