
I am currently experimenting with an idea for a site where I essentially want to let my users create "tables" which hold data, and then allow them to query over that data (in a less geeky way than writing SQL queries and, hopefully, more easily than using Excel).

My idea, so far, is to represent this in my database using a handful of tables: one representing the user tables themselves, one representing the columns of each table, one representing the rows of each table, and finally one representing the values. Something akin to the following (sketched SQL):

CREATE TABLE Tables (
   Id INT NOT NULL PRIMARY KEY,
   Name VARCHAR(255)
)

CREATE TABLE TableColumns (
   Id INT NOT NULL PRIMARY KEY,
   TableId INT NOT NULL REFERENCES Tables (Id),
   Name VARCHAR(255)
)

CREATE TABLE TableRows (
   Id INT NOT NULL PRIMARY KEY,
   TableId INT NOT NULL REFERENCES Tables (Id),
   RowNumber INT NOT NULL
)

CREATE TABLE TableValues (
   RowId INT NOT NULL REFERENCES TableRows (Id),
   ColumnId INT NOT NULL REFERENCES TableColumns (Id),
   Value VARCHAR(255),
   PRIMARY KEY (RowId, ColumnId)
)

(note that TableValues has a composite primary key on RowId and ColumnId, i.e. one value per row/column pair).

I did a bit of testing with this and was able to do simple querying (filtering, ordering and so forth). My approach was to first query the TableRows table: for filtering I filtered out the rows whose columns did not match the criteria, and for sorting I sorted the row ids based on their columns' content (as specified by the sort). This resulted in a list of row ids in the desired order; from there it was simply a matter of selecting what was needed.
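To make that concrete, here is a rough sketch of the filter-and-sort step as a single query, using an in-memory SQLite database standing in for the real schema (all table names follow the question; the sample data is made up):

```python
import sqlite3

# In-memory SQLite database with the schema from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Tables      (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE TableColumns(Id INTEGER PRIMARY KEY, TableId INTEGER, Name TEXT);
CREATE TABLE TableRows   (Id INTEGER PRIMARY KEY, TableId INTEGER, RowNumber INTEGER);
CREATE TABLE TableValues (RowId INTEGER, ColumnId INTEGER, Value TEXT,
                          PRIMARY KEY (RowId, ColumnId));
""")
con.execute("INSERT INTO Tables VALUES (1, 'People')")
con.executemany("INSERT INTO TableColumns VALUES (?, 1, ?)",
                [(1, "Name"), (2, "Age")])
con.executemany("INSERT INTO TableRows VALUES (?, 1, ?)", [(1, 1), (2, 2), (3, 3)])
con.executemany("INSERT INTO TableValues VALUES (?, ?, ?)",
                [(1, 1, "Ann"), (1, 2, "34"),
                 (2, 1, "Bob"), (2, 2, "19"),
                 (3, 1, "Cid"), (3, 2, "27")])

# Equivalent of "WHERE Age > 20 ORDER BY Age": filter on one column's
# value and sort by it, yielding row ids in the desired order.
rows = con.execute("""
    SELECT r.Id
    FROM TableRows r
    JOIN TableValues v ON v.RowId = r.Id AND v.ColumnId = 2  -- the Age column
    WHERE r.TableId = 1
      AND CAST(v.Value AS INTEGER) > 20
    ORDER BY CAST(v.Value AS INTEGER)
""").fetchall()
print([row_id for (row_id,) in rows])  # -> [3, 1]
```

Note the CAST: because every value lives in a VARCHAR column, even this simple numeric comparison already needs an explicit conversion, which hints at the data-type problem raised below.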

All this works fine, but I am a bit stuck from here on. I'd like to be able to represent different data types (my main issue, really) and, later on, work out how to do joins.

While thinking all this through, I have started to wonder whether there's a better way of doing it. Performance is, of course, a factor, but I'm not planning on supporting virtual tables with hundreds of thousands of rows; maybe about 1,000 rows per virtual table, though the entire system needs to be capable of handling many such tables.

I know I could always just create real tables in my database, with the queries built on the fly in C#, and likewise query using plain SQL. However, I have never been a fan of letting users "construct" queries against my database like this; it seems a path down which many bugs would appear, and in the worst case the user could end up killing the database one way or another.

My issue then becomes how to deal with this in a way that makes sense from a C# perspective. So far I am leaning toward using LINQ and writing my own extension methods on IQueryable to apply the needed functionality.

So what I'd really like are some ideas on how this could be done, how to tune performance, and how to deal with separate data types in the tables (obviously the type would be stored in TableColumns, but how do I actually store the value so that I can filter, sort and so forth by it, without just adding TextValue, MoneyValue and similar columns to TableValues?). And last but not least, hopefully some good discussion; I at the very least consider this an interesting topic.

Bill Karwin
kastermester
  • I am voting you up as an incentive to send this to http://www.thedailywtf.com – TheTXI May 28 '09 at 16:27
  • As TheTXI says, we see this kind of "schema" all the time on The Daily WTF. It's an incredibly bad idea. And hilarious. – Welbog May 28 '09 at 16:28
  • I did have an idea that this would be a bad schema, which is exactly why I posted here: to get some ideas on how to improve it. I care about what I can accomplish, not about whether it is done this way or any other way. And also, I care about learning :). – kastermester May 28 '09 at 16:32
  • You should really just stick to the original idea of creating the tables themselves. You can always add security constraints so that certain users only have access to certain tables and the like. It would be a lot less WTF-y than this. – TheTXI May 28 '09 at 16:35
  • @TheTXI: Agreed, this is completely reinventing CREATE TABLE. – Pesto May 28 '09 at 16:39
  • @TheTXI - regarding security constraints, you mean that I should create a user account in my SQL database for each user that needs these virtual tables, right? – kastermester May 28 '09 at 16:40
  • Please, no need to be dismissive. This is a valid question which warrants discussion. There are some categories of products that require that degree of flexibility, and coming up with a good design for it is not that simple (when diving into the details). – Denis Troller May 28 '09 at 17:04
  • Buzz kill: see above... – Matthew Whited May 28 '09 at 18:41

9 Answers


For some reason, everybody encounters that idea at some point or another.

It seems right, it should work.

It would. Sort of.

The comments about TheDailyWTF have a point. Re-implementing a DBMS on top of a DBMS is really not a good idea. Going meta like that is going to give you:

  • an underperforming system
  • a maintenance nightmare

If you really need that kind of flexibility (do you?), your time would be much better spent implementing a layer that lets you store metadata in some tables and generates the schema for the actual tables in the database.
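As a rough illustration of that layer, here is a hypothetical sketch that turns stored metadata into real DDL. The function name, the type whitelist and the `user_` prefix are all assumptions for the example, not part of any real product:

```python
# Map the user's logical column types to a whitelist of real SQL types.
# Whitelisting the types (rather than accepting arbitrary strings) is
# what keeps users from injecting their own DDL.
ALLOWED_TYPES = {"text": "VARCHAR(255)", "int": "INTEGER", "money": "DECIMAL(19,4)"}

def ddl_from_metadata(table_name, columns):
    """Generate CREATE TABLE DDL from metadata.

    columns: list of (name, logical_type) pairs as read from the
    metadata tables (e.g. the Tables/TableColumns tables above).
    """
    defs = ", ".join(f'"{name}" {ALLOWED_TYPES[typ]}' for name, typ in columns)
    # Prefix user tables so they cannot collide with system tables.
    return f'CREATE TABLE "user_{table_name}" (Id INTEGER PRIMARY KEY, {defs})'

print(ddl_from_metadata("People", [("Name", "text"), ("Age", "int")]))
```

The generated statement would then be executed once at "table creation" time, after which all reads and writes run against an ordinary, properly typed, indexable table.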

There are a few examples of this kind of system that I know of:

  • Microsoft OSLO (specifically the Repository system)
  • the ASAM-ODS server architecture (look for the ASAM-ODS package)

And I am sure there are others.

The upside of that kind of design is that your database actually makes sense in the end, and it uses the RDBMS for its strengths. Also, since that kind of reconfiguration should not happen all the time once a table has been created, it allows the user to fine-tune the database if needed (mainly in terms of indexing).

I really strongly feel the only correct answer to the kind of system you propose is don't.

Denis Troller
  • Hi Denis, I would like to ask you a few question about ASAM, if you've got a bit of time for me, that would be nice. Don't hesitate to join the [ASAM room](http://chat.stackoverflow.com/rooms/81001/asam) – Thomas Ayoub Jun 19 '15 at 11:28

It's an interesting idea, but using SQL in this way is probably going to become very painful over time.
If I understand you correctly, you want users to be able to define data structures, save data to those structures, and then query it. I think there are a couple of other ways to go about this:

  • What about using XML? Allow each user to store an XML file per "table" and just maintain its schema. Each "row" would be an XML element with child elements. You could optionally stick the XML in SQL or store it by some other means. This wouldn't work very well with large datasets, but for thousands of records it is surprisingly fast; I did some tests with 20+ MB XML files in C# and was able to create, read and parse them in less than a second. Using LINQ to XML you could even construct fairly sophisticated queries and joins. I wouldn't use XML for a large enterprise system, but you'd be surprised how far it will go on modern machines with lots of memory and fast processors, and it is infinitely flexible.
  • Could you use an object-oriented database (Matisse, etc.) instead? I don't have experience with these myself, but I think you could quite easily do something like the XML approach with better performance.
  • Amazon SimpleDB: if I remember correctly, this is essentially a name/value-pair-based database. Could your application use it in the background to save you from dealing with all the plumbing? If you had to pay for SQL Server, Amazon's offering might be cheaper, and it has the muscle to scale large, though without some things like relational queries.
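To illustrate the XML idea, here is a minimal sketch (Python standard library only, element names made up) of filtering and sorting one user "table" held as XML; in C# the equivalent would be LINQ to XML:

```python
import xml.etree.ElementTree as ET

# One user "table" stored as a single XML document; each <row> is a record.
doc = ET.fromstring("""
<table name="People">
  <row><Name>Ann</Name><Age>34</Age></row>
  <row><Name>Bob</Name><Age>19</Age></row>
  <row><Name>Cid</Name><Age>27</Age></row>
</table>
""")

# Equivalent of: SELECT Name WHERE Age > 20 ORDER BY Age
matches = [r for r in doc.findall("row") if int(r.findtext("Age")) > 20]
matches.sort(key=lambda r: int(r.findtext("Age")))
print([r.findtext("Name") for r in matches])  # -> ['Cid', 'Ann']
```

Because the whole document is in memory, the filter and sort are plain in-process operations; the trade-off, as noted in the comments below, is that every update means rewriting the file.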
flytzen
  • You've put up some very interesting ideas here, and I cannot believe I didn't think about XML before. I will most definitely look into that as well. – kastermester May 28 '09 at 16:55
  • Good idea about the key/value pair system. I'm not so much with you on the XML, though, especially because kastermester mentioned supporting a potentially pretty big number of rows. Using XML would require loading/saving the entire file for every little update, which might end up being prohibitive. – Denis Troller May 28 '09 at 16:59
  • For large sets, XML would be prohibitive, but for 1,000-2,000 records it is probably fine. What I found is that C# is incredibly fast at dealing with XML: my test program will read a 20 MB XML file (10,000 "records"), evaluate each node and output it to a CSV file in less than 0.4s on my laptop. I would estimate half the time is actually spent writing the CSV file. I think SQL has some built-in handling for querying inside XML "files" which may be faster. – flytzen May 28 '09 at 17:38
  • You can store each record as a single value in a table with an ID of an object type and a secondary index table. You then get the column flex from XML and the indexing from the DBMS – Matthew Whited May 28 '09 at 18:43

The design you show in your original question is a variation of the Entity-Attribute-Value design.

The impatience some people are expressing is probably due to the fact that practically every database developer "discovers" this design and tries to use it for exactly what you're using it for: a flexible system to support an extensible database without the inconvenience of CREATE TABLE and ALTER TABLE.

But EAV has a lot of downsides. Here's just one: how would you make a given column mandatory (the equivalent of a NOT NULL constraint)?
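To see that concretely, here is a sketch (SQLite via Python, with the schema names taken from the question) of what such a "constraint" turns into under EAV: not a declaration the database enforces, but a query you have to remember to run yourself:

```python
import sqlite3

# Minimal EAV setup: row 2 has no value for column 7.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TableRows  (Id INTEGER PRIMARY KEY, TableId INTEGER);
CREATE TABLE TableValues(RowId INTEGER, ColumnId INTEGER, Value TEXT,
                         PRIMARY KEY (RowId, ColumnId));
INSERT INTO TableRows VALUES (1, 1), (2, 1);
INSERT INTO TableValues VALUES (1, 7, 'Ann');  -- row 2 is missing column 7
""")

# Rows violating a would-be "column 7 NOT NULL" constraint. Nothing in
# the schema stops these rows from existing; you can only detect them.
missing = con.execute("""
    SELECT r.Id FROM TableRows r
    WHERE NOT EXISTS (SELECT 1 FROM TableValues v
                      WHERE v.RowId = r.Id AND v.ColumnId = 7)
""").fetchall()
print(missing)  # -> [(2,)]
```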

A relational database assumes that you know the tables up front and can define them. An RDBMS isn't the right tool for fully dynamic relations, or for relations with fully variable sets of attributes. There are a number of other technologies for that, such as XML, RDF, or CouchDB.

See also my answer to "The Next-gen Databases."

Bill Karwin

At the risk of a bunch of downvotes, why not just have them install MS Access?

belgariontheking
  • This is intended as a web application, not a desktop app. And why not just make a webpage that tells people to use MS Access? Because my mom wouldn't, my sister wouldn't, and honestly no one I know who doesn't find computers fascinating would. – kastermester May 28 '09 at 16:31
  • @kaster: fair enough, but I'll keep my answer up anyway, as it might help someone come up with a better answer for you. – belgariontheking May 28 '09 at 16:33
  • @Rich B - I am sure both you and I, and everyone else here, could achieve exactly what we wanted using the spreadsheet in Google Docs, Microsoft Office, Open Office and so forth - but believe it or not, most people out there cannot. I am not setting forth to make a gigantic look-alike to those sorts of things, but rather a small subset of the general querying that we use daily, to allow "normal" people to use this sort of stuff too. – kastermester May 28 '09 at 16:35
  • @belgariontheking - yes please leave it, it was a fair comment/question/answer. – kastermester May 28 '09 at 16:36
  • You could use SQL Express and just create a new database for each person. Then allow them to do whatever they want in that database. – Matthew Whited May 28 '09 at 18:44
  • Or just table prefixes like you would do in MySQL or Oracle – Matthew Whited May 28 '09 at 18:44

I've run into an approach like this in Microsoft's Business Contact Manager add-in for Outlook. The way they handle field types is to have a table defining the type of each field; the actual field values are then stored in a table containing nothing but varbinary columns. The conversion to and from varbinary is controlled by the field type table.
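A hypothetical sketch of that idea: a per-field type tag controls how values are packed into and out of a single binary column. The type names and encodings here are made up for illustration, not Business Contact Manager's actual ones:

```python
import struct

# Type registry: each logical field type maps to an (encode, decode)
# pair. The stored column is always raw bytes; the field type table
# (here, this dict) decides how to interpret them.
ENCODERS = {
    "int":  (lambda v: struct.pack(">q", v), lambda b: struct.unpack(">q", b)[0]),
    "text": (lambda v: v.encode("utf-8"),    lambda b: b.decode("utf-8")),
}

def encode(field_type, value):
    return ENCODERS[field_type][0](value)

def decode(field_type, blob):
    return ENCODERS[field_type][1](blob)

blob = encode("int", 42)
print(len(blob), decode("int", blob))  # -> 8 42
```

Using a fixed-width big-endian integer encoding has a nice side effect: the raw bytes sort in the same order as the numbers (for non-negative values), so the database can still index and order them.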

dsteele

I'm not sure why there is all the hate, and why nobody's actually trying to answer your question. Even if it's ultimately a "how would you implement Google Docs" question, it's still a fair one.

Assuming you've thought through whether you really want to do this (what I'm assuming all the hate is about), here's at least a better idea:

CREATE TABLE DATA_TABLE (
    TABLEID INT,
    INT1 INT,
    INT2 INT,
    VARCHAR1 VARCHAR(255),
    ... etc
)

Then store metadata somewhere recording what a given TABLEID is named, which fields are used, what their user-facing names are, and so on. Joins are easy to support, as they're just self-joins on the data table. Whether you store that metadata in the database or elsewhere is up to you.
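As a sketch of how a join becomes a self-join on the single wide table (the column assignments here are invented for the example, and SQLite stands in for the real database):

```python
import sqlite3

# Two virtual tables share one physical DATA_TABLE, distinguished by
# TABLEID, so a join between them is a self-join on that table.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE DATA_TABLE
               (TABLEID INTEGER, INT1 INTEGER, VARCHAR1 TEXT)""")
# Virtual table 1 = People (INT1 = person id, VARCHAR1 = name)
# Virtual table 2 = Orders (INT1 = person id, VARCHAR1 = item)
con.executemany("INSERT INTO DATA_TABLE VALUES (?, ?, ?)", [
    (1, 100, "Ann"), (1, 101, "Bob"),
    (2, 100, "book"), (2, 101, "lamp"), (2, 100, "pen"),
])

# Equivalent of: SELECT p.Name, o.Item FROM People p JOIN Orders o ...
rows = con.execute("""
    SELECT p.VARCHAR1, o.VARCHAR1
    FROM DATA_TABLE p JOIN DATA_TABLE o
      ON p.TABLEID = 1 AND o.TABLEID = 2 AND o.INT1 = p.INT1
    ORDER BY p.VARCHAR1, o.VARCHAR1
""").fetchall()
print(rows)  # -> [('Ann', 'book'), ('Ann', 'pen'), ('Bob', 'lamp')]
```

An index on (TABLEID, INT1) would be the obvious first tuning step for this layout, since every virtual-table scan and join filters on those columns.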

This approach can work perfectly well (I can attest that many successful .com sites used this kind of schema, both ones I was involved with and ones where I knew the people involved), but for optimal performance I'd recommend using CHAR over VARCHAR, aligning your tuples so they fit well on pages, and so on.

I think this is fairly similar to dsteele's answer.

Matt
  • Thanks Matt, that looks like an interesting and promising idea - I will look into that. – kastermester May 28 '09 at 16:50
  • It's worth noting that every design has tradeoffs. Other people suggested just creating tables. That introduces naming conflicts unless you create schemas/users per user. Depending on how many (stale) users & tables you'll end up with, there's a maintenance cost associated with every design – Matt May 28 '09 at 16:53
  • I'm not sure I understand this answer? If everything is mapped to my data_table, how do I know how many fields I have, what types, etc. Unless you're recommending a data_table per user/use-case? – Chris K May 28 '09 at 17:48

I once did something for a system that stored data in a very generic way, extremely similar to yours.

The system was write-few, read-many, and the complex joins needed to get the data out made it quite slow.

Now, I know this goes against all database practice, but I denormalised the data for each "Table" (as you have it) and physically created tables in the database called Table_1, Table_2, and so on.

I created and dropped Table_1 and Table_2 based on triggers on the Tables table, added and dropped columns on these tables with triggers on the TableColumns table, inserted and deleted rows with triggers on the TableRows table, and updated values with triggers on the TableValues table.

Yes, as expected, our write performance was greatly reduced, but our read performance (which was extremely important at the time) was greatly improved, as we could read directly from these "denormalised" tables.

So basically you could actually create and drop the tables in your example and add and remove columns accordingly. You can create columns of the actual data types you need and then store whatever you want in them.

It might also make the queries you write a lot easier, as you're only ever looking at a single table for each user.

Robin Day

You definitely want to simplify this: allow your users to create tables, perhaps in a TEMPDB. Simply prepending a schema prefix to the "CREATE TABLE"d tables would keep them from bolloxing up REAL system tables, and makes it easy for you to filter them as well. The problem is that most databases won't allow non-DBAs to create tables, so it requires some up-front setup. You could even create a schema in your database specifically for this, and put size constraints and table-size limitations on it if your database supports them.

Don't reinvent the wheel.

Do it this way, and you automatically get joins, unions, etc and interfaces with Crystal Reports and other tools that don't require grokkin' your specific schema.

Chris K

SharePoint does this and more on top of SQL Server. I would investigate how SharePoint uses SQL Server.

Matt Spradley