2

I have been programming relational database for many years, but now have come across an unusual and tricky problem:

I am building an application that needs to have very quick and easily defined entities (by the user). Instances of these entities could then be created, updated, deleted etc.

There are two options I can think of.

Option 1 - Dynamically created tables

The first option is to write an engine to dynamically generate the tables, and insert the data into these. However, this would become very tricky, as every query would also need to be dynamic, or at least dynamically created stored procedures etc.

Option 2 - Entity - Key - Value Pattern

This is the only realistic option I can think of, where I have 5 table structure:

EntityTypes

EntityTypeID int

EntityTypeName nvarchar(50)

Entities

EntityID int

EntityTypeID int

FieldTypes

FieldTypeID int

FieldTypeName nvarchar(50)

SQLtype int

FieldValues

EntityID int

FIeldID int

Value nvarchar(MAX)

Fields

FieldID int

FieldName nvarchar(50)

FieldTypeID int

The "FieldValues" table would work a little like a datawarehouse fact table, and all my inserts/updates would work by filling a "Key/Value" table valued parameter and passing this to a SPROC (to avoid multiple inserts/updates).

All the tables would be heavily indexed, and I would end up doing many self joins to obtain the data.

I have read a lot about how bad Key/Value databases are, but for this problem it still seems to be the best.

Now my questions!

  • Can anyone suggest another approach or pattern other than these two options?
  • Would option two be feasible for medium sized datasets (1 million rows max)?
  • Are there further optimizations for option 2 I could use?

Any direction and advice much appreciated!

Charles
  • 50,943
  • 13
  • 104
  • 142
Paul Grimshaw
  • 19,894
  • 6
  • 40
  • 59
  • After a play around, I have Mongo running with the Mongo/C# library . The noRM library is a little too strongly typed for this particular example, as I will make use of the commands such as db.getCollection("Contacts"). Thanks to pablochan for pointing me in the right direction, and looking forward to a new adventure with noSQL! – Paul Grimshaw Dec 23 '11 at 14:05

4 Answers4

3

Personally I would just use a "noSQL" (key/value) database like MongoDB.

But if you need to use a relational database option 2 is the way to go. A good example of that kind of model is the Alfresco Data Dictionary (Alfresco is an enterprise content management system). It's design is similar to what you describe, although they have multiple columns for field values (for every simple type available in the database). If you add a good cache system to that (for example Ehcache) it should work fine.

pablochan
  • 5,625
  • 27
  • 42
  • Thanks for the comments. I hadn't heard of noSQL databases, and after a brief look MonogDB looks very interesting. I will run through the tutorial and see if I see a way forward with this. Do you know if it scales well, and is applicable in multi user web applications? I am using ASP.NET MVC3 with C#. – Paul Grimshaw Dec 23 '11 at 01:27
  • Unfortunately I don't have "real life" experiences with MongoDB so I can't really tell you. One of the problems is that write operations block the entire db (http://www.mongodb.org/display/DOCS/How+does+concurrency+work), at least at this point, so you should definately read up on it. There are other noSQL databases though, so you can check them out as well. – pablochan Dec 23 '11 at 08:38
2

As others have suggested NoSQL, I'm going to say that, in my opinion, schemaless databases really is best suited for use-cases with no schema.

From the description, and the schema you came up with, it looks like your case is not in fact "no schema", but rather it seems to be "user-defined schema".

In fact, the schema you came up with looks very similar to the internal meta-schema of a relational database. (You're sort of building a relational database on top of a relational database, which in my experience is not a good idea, as this "meta-database" will have at least twice the overhead and complexity for any basic operation - tables will get very large, which doesn't scale well, and the data will be difficult to query and update, problems will be difficult to debug, and so on.)

For use-cases like that, you probably want DDL: Data Definition Language.

You didn't say which SQL database you're using, but most SQL databases (such as MySQL, PostgreSQL and MS-SQL) support some dialect of DDL extensions to SQL syntax, which let you manipulate the actual schema.

I've done this successfully for use-cases like yours in the past. It works well for cases where the schema rarely changes, and the data volumes are relatively low for each user. (For high volumes or frequent schema updates, you might want schemaless or some other type of NoSQL database.)

You might need some tables on the side for additional field information that doesn't fit in SQL schema - you may want to duplicate some schema information there as well, as this can be difficult or inefficient to read back from actual schema.

Ensuring atomic updates to your field information tables and the schema probably requires transactions, which may not be supported by your database engine - PostgreSQL at least does support transactional schema updates.

You have to be vigilant when it comes to security - you don't want to open yourself up to users creating, storing or deleting things they're not supposed to.

If it suits your use-case, consider using not only separate tables, but separate databases, which can also by created and destroyed on demand using DDL. This could be applicable if each customer has ownership of data collections that can't, shouldn't, or don't need to be queried across customers. (Arguably, these are rare - typically, you want at least analytics or something across customers, but there are cases where each customer "owns" an isolated, hosted wiki, shop or CMS/DMS of some sort.)

(I saw in your comment that you already decided on NoSQL, so just posting this option here for completeness.)

mindplay.dk
  • 7,085
  • 3
  • 44
  • 54
  • This answer is spot on. The question was asked 10 years ago, and we went full circle. We built the solution on MongoDb, but about 3 years later we migrated to Postgres, doing exactly as you say - letting the relational database do its work and just dynamically modifying the schema. We ended up building a declarative schema tool, that will take a schema definition and migrate the database automatically. – Paul Grimshaw Sep 12 '21 at 19:50
1

It sounds like this might be a solution in search of a problem. Is there any chance your domain can be refactored? If not - theres still hope.

  • Your scalability for option 2 will depend a lot on the width of the custom objects. How many fields can be created dynamically? 1 million entities when each entity has 100 fields could be a drag... Efficient indexing could make performance bearable.

  • For another option - you could have one data table that has a few string fields, a few double fields, and a few integer fields. For example, a table with String1, String2, String3, Int1, Int2, Int3. A second table with have rows that define a user object and map your "CustomObjectName" => String1, and such. A stored procedure reading INFORMATION_SCHEMA and some dynamic sql would be able to read the schema table and return a strongly typed recordset...

  • Yet another option (for recent versions of SQL Server) would be to store a row with an id, a type name, and an XML field that contains a XML document that contains the object data. In MS Sql Server this can be queried against directly, and maybe even validated against a schema.

Matt Murrell
  • 2,321
  • 2
  • 23
  • 39
0

PErsonally I would take the time to define as many attritbutes as you can ratheer than use EAV for everything. Surely you know some of the attributes. Then you only need EAv for the things that are truly client specific.

But if all must be EAV, then a nosql databse is the way to go. Or you can use a relationsla datbase for some stuff and a nosql database for the rest.

HLGEM
  • 94,695
  • 15
  • 113
  • 186