1

Possible Duplicate:
Database design to create tables on the fly

I have a need to create a database within a database. This question is related to this. I will attempt to explain what i am trying to do in detail and provide examples in code. basically, I want to ability to create dynamic tables in a database. For example, I will have a webpage that allows users to create their own table with columns and data. Here is the database design I came up with:

aColumn
aDataType
aRow
aTable
zBit
zDateTime
zMoney
zNumber
zText

Tables that begin with a z are data that specific data go in such as ints, datetime values, etc. aColumn are columns that belong to a specific table. aRow identifies a particular row within the aTable. Here is the database design:

aTable: Id, name
aColumn: Id, Name, aTable, aDataType
aDataType: Id, Name
aRow: Id, aTable
zBit: Id, aRow, aColumn, Data(Bit)
zDateTime: Id, aRow, aColumn, Data (DateTime)
zMoney: Id, aRow, aColumn, Data (Money)
zNumber: Id, aRow, aColumn, Data (INT)
zText: Id, aRow, aColumn, Data (nvarchar(MAX))

Here is some sample data I used to get it up and running:

aTable

Id          Name
1           Users

aColumns

Id          Name           aTable       aDataType
1           Name           1            2
2           UserId         1            1
3           Occupation     1            2

aDataType

Id          Name
1           Number
2           Text

aRow

Id          aTable
1           1
2           1

aNumber

Id          aRow           aColumn      Data
1           1              1            1245
2           2              2            56

aText

Id          aRow           aColumn      Data
1           1              1            Sara
2           2              1            Jake

All other z* tables are blank

Here is the query to create a dynamic table:

select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zBit] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]
UNION ALL
select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zDateTime] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]
UNION ALL
select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zMoney] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]
UNION ALL
select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zMoney] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]
UNION ALL
select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zNumber] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]
UNION ALL
select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zText] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]

Here is one chunk of this query:

select t.[Id] as [Table], c.Name as [Column], dt.Name as [DataType], r.[Id] as [Row], cast(v.Data as nvarchar(MAX)) as Data from [pod].[dbo].[aTable] t
INNER JOIN [pod].[dbo].[aColumn] c on t.Id = c.[aTable]
INNER JOIN [pod].[dbo].[aDataType] dt on c.[aDataType] = dt.Id
INNER JOIN [pod].[dbo].[aRow] r on t.[Id] = r.[aTable]
INNER JOIN [pod].[dbo].[zText] v on c.[Id] = v.aColumn and r.[Id] = v.[aRow]

As you can see here a piece of data (z* tables) is identified by a row and a column. When I run this query I get this:

Results

Table       Column         DataType     Row           Data
1           UserId         Number       1             1245          
1           UserId         Number       2             56
1           Name           Text         1             Sara
1           Name           Text         2             Jake

Here, are my desired results: (I am not sure how to turn these rows into columns if the columns are unknown)

Row         UserId       Name
1           1245         Sara
2           56           Jake

Big Problem This table is suppose to have 3 columns remember?

aColumns

Id          Name           aTable       aDataType
1           Name           1            2
2           UserId         1            1
3           Occupation     1            2

So my final expected results are:

Row         UserId       Name         Occupation
1           1245         Sara         NULL
2           56           Jake         NULL

In the results I need to sort columns as well. Is this even possible. What databases support this kind of functionality. I am wide open to any database that can do this.

Community
  • 1
  • 1
Luke101
  • 63,072
  • 85
  • 231
  • 359
  • 8
    *I have a need to create a database within a database*. **Why?** What's wrong with simply adding and removing columns and or tables. –  Aug 26 '12 at 15:45
  • 8
    See http://en.wikipedia.org/wiki/Inner-platform_effect and http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx. Why are you reinventing the wheel, as opposed to just having your application dynamically create real rows and columns? – Chris Shain Aug 26 '12 at 15:45
  • @ChrisShain - The problem with dynamically creating new rows/columns is how do you write your queries, since every query will need to look up which columns exist, and then dynamically allow people to pick what to retrieve. Basically you are creating the framework for a EAV model in the process. – James Black Aug 26 '12 at 15:52
  • Right- but you need to do that with your approach as well. MySQL, Oracle, and SQL Server all have built-in methods for retrieving the structure of your existing database. EAVs bigger strength, IMHO, is the ability to store sparse data efficiently, which many database engines already do (see SQL Server's column oriented stores) – Chris Shain Aug 26 '12 at 15:56
  • 2
    Welcome to the DB world. I suspect every database developer goes through this phase, until they learn the term "EAV" and start reading about it. However, if you're just playing around, have fun and learn it for yourself the hard way - often this is the best way to learn. Good luck :) – Jeffrey Kemp Aug 27 '12 at 01:50
  • @ChrisShain Thanks; I was looking for those to add to the last post on this topic but couldn't find them. – Craig Ringer Aug 27 '12 at 01:51
  • This is a really convoluted design. Do you actually expect this to scale? I don't see any justification why every user needs its own table. Even if that is the case, you can query the metadata of how a table has been defined (columns and the datatype of the columns). – steve Aug 27 '12 at 04:51
  • Suppose you have a table named MY_TABLE defined in your database-in-a-database. What does SQL equivalent to `SELECT * FROM MY_TABLE` look like? Now - plug in 1,000,000 "rows" into MY_TABLE in your database-in-a-database. First - how long did it take to insert those 1 million rows? Second, what did the `INSERT` equivalent look like? Third, if you want to find one row, something like `SELECT * FROM MY_TABLE WHERE SOME_COLUMN = 123456`, what does the equivalent database-in-a-database query look like, and how long does it take to execute? – Bob Jarvis - Слава Україні Aug 27 '12 at 11:20

2 Answers2

3

You may want to look at an Entity Attribute Value model (EAV) design for your data then.

Basically you can have a table with table names, and some other meta-data on tables.

Then, you can create a table for each of those rows to have the column data, such as datatype and name.

Then, you have a table where you put the values for each column, in a long table.

This allows you to dynamically create tables, or add/remove rows dynamically.

For a comparison on relational and EAV you can look at this question:

Entity Attribute Value Database vs. strict Relational Model Ecommerce

If you want to have a relational view of this data though, then you will need to create triggers to help keep views up-to-date, and this can be a great deal of work to get that to work well. If you don't need a relational view then you should be fine.

Another way to do this is with a NoSQL database (http://en.wikipedia.org/wiki/NoSQL), as the schema doesn't have to be set, and so you can just store the columns you need for that row.

At this point I would go the NoSQL way as there are many databases that can work, and the reinventing you need to do is minimal.

Community
  • 1
  • 1
James Black
  • 41,583
  • 10
  • 86
  • 166
1

For the last part of your question, you're asking how to do a cross-tabulation query against an EAV schema. Some databases support this via extensions to the SQL standard, others don't support it at all. For portability you have to do it in your app. PostgreSQL offers the crosstab function in the tablefunc extension for this.

If you go down the EAV path you will sooner or later regret it. It's useful in certain limited circumstances, but it's a bad fit for the relational model and causes lots of pain and problems, not the least of which is awful performance.

Consider instead:

  • If at all possible, re-design so you don't need dynamic schema. Probably not possible in your case since your express requirement is a user-editable schema for a web-based database app, but in most cases this is the right choice.

  • Dynamically create/drop schema with ALTER TABLE, CREATE TABLE, etc. Some databases are much better at this than others. PostgreSQL's transactional DDL can help a lot. Caution is required to avoid this becoming a performance and maintenance nightmare, but it's probably the sanest option if you're trying to model a relational database with dynamic structure.

  • Key/value stores that are optimised for EAV-like querying; see Key/Value stores. Be careful, many of these systems don't offer full ACID semantics and may have limited query languages, so you can land up doing lots more work in the application.

  • Storing XML or JSON in the database. You can do that with a relational DB, but you're likely to be better off with a document database. Same caveats as for K/V stores apply. This approach works ok if you're doing all the querying logic in your app and your data sizes aren't too big.

  • Using database-specific features like PostgreSQL's hstore to support arbitrary key/value storage where required, and use standard relational design where k/v isn't required. If you want relations as output it's still a major PITA that involves inefficient crosstab queries and joining.

Chris makes a good point: your whole design is very dubious. See: The inner platform effect and TDWTF's take on it. Seriously, don't go there.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778