
I am using SQL Server 2005 Express and Visual Studio 2008.

I have a database which has a table with 400 columns. Things were (just about) manageable until I had to perform bi-directional sync between several databases.

I am wondering what the arguments for and against using a single 400-column table, versus something like a 40-table normalised design, are?

The table is not normalised and consists mainly of nvarchar(64) columns plus some TEXT columns (there are no specific datatypes, as the data was converted from text files).

There is one other table that links to this table in a 1-1 relationship (i.e. one entry relates to one entry in the 400-column table).

The table is a list of files that contained parameters which are "plugged" into an application.

I look forward to your replies.

Thank you

  • So each user has a row in the table with properties set that govern how their instance of the application operates? So when you add a new area of functionality you have to add a bunch of columns and populate them with suitable defaults for each user? Why did you have to do the bi-directional sync? Is this the only trouble you had? What other activities are making this unmanageable? – Trevor Tippins Nov 19 '09 at 11:43
  • There is one central server and several client servers all running a local copy of the data. The clients need to merge any changes to the central database. The server also needs to merge any changes made by the clients. Using the Sync Framework and following the examples for C#, I had to specify all the columns that need synchronising. I also had to create some stored procs that also needed all the columns in code in several places. It got really unmanageable from then on! – Belliez Nov 19 '09 at 11:50
  • So, if I understand correctly, a bunch of ini files was stored into the DB. Are all 400 columns populated in each row, or mostly NULLs? Is it one row per user, or per what? Is this setup for one app or many random apps? – Damir Sudarevic Nov 19 '09 at 23:26
  • There are multiple machines in a factory. All store their configuration data in text files. Each machine in the factory can have multiple txt files, one for each type of job the machine needs to run. Now these machines are being networked to a central server that contains a SQL Server database to provide a central area, so these txt files are being converted into the database. The txt files contain around 400 lines (a parameter name and value on each line), hence the 400-column table. I am trying to find a good way to do this in SQL Server which doesn't need 400 column names coded in SQL etc.! – Belliez Nov 20 '09 at 09:44
  • Ok, got it now, promise a model within a few hours. I spent years in manufacturing too :) – Damir Sudarevic Nov 20 '09 at 14:01

2 Answers


For having a wide table:

  • Quick to report on as it's presumably denormalized and so no joins are needed.
  • Easy to understand for end-consumers as they don't need to hold a data model in their heads.

Against having a wide table:

  • Probably need to have multiple composite indexes to get good query performance
  • More difficult to maintain data consistency, i.e. if the same piece of data appears on multiple rows you have to update all of them when it changes
  • As you're having to update multiple rows and maintain multiple indexes, concurrent performance for updates may become an issue as locks escalate.
  • You might end up with records full of NULLs in columns where the attribute isn't relevant to the entity on that row, which can make handling results awkward.
  • If lazy developers do a SELECT * from the table you end up dragging loads of data across the network, so you generally have to maintain suitable subset views.
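On that last point, a subset view per consumer is just a named column list. A minimal sketch (the view and column names are made up; `MachineConfig` stands in for your existing 400-column table):

```sql
-- Hypothetical subset view so consumers stop doing SELECT * against the
-- wide table; only the columns this team actually needs are exposed.
CREATE VIEW MachineConfig_Spindle AS
SELECT FileId, MachineName, SpindleSpeed, SpindleRunTime
FROM MachineConfig;  -- the existing 400-column table
```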

So it all really depends on what you're doing. If the main purpose of the table is OLAP reporting and updates are infrequent and affect few rows then perhaps a wide, denormalized table is the right thing to have. In an OLTP environment then it's probably not and you should prefer narrower tables. (I generally design in 3NF and then denormalize for query performance as I go along.)

You could always take the approach of normalizing and providing a wide view for readers if that's what they want to see.
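As a rough sketch of that approach (all names here are hypothetical, and plain SQL Server 2005 syntax is assumed): a couple of narrow, normalized tables with a flat view layered on top for readers.

```sql
-- Hypothetical narrow tables: one per cohesive group of parameters.
CREATE TABLE JobFile (
    JobFileId INT IDENTITY(1,1) PRIMARY KEY,
    MachineId INT NOT NULL,
    FileName  NVARCHAR(255) NOT NULL
);

CREATE TABLE SpindleSettings (
    JobFileId    INT PRIMARY KEY REFERENCES JobFile (JobFileId),
    SpindleSpeed NVARCHAR(64) NULL,
    RunTime      NVARCHAR(64) NULL
);
GO

-- A wide view gives readers the flat, one-row-per-file shape they expect.
CREATE VIEW JobFileWide AS
SELECT j.JobFileId, j.MachineId, j.FileName,
       s.SpindleSpeed, s.RunTime
FROM JobFile j
LEFT JOIN SpindleSettings s ON s.JobFileId = j.JobFileId;
```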

Without knowing more about the situation it's not really possible to say more about the pros and cons in your particular circumstance.

Edit:

Given what you've said in your comments, have you considered just having a long & skinny name=value pair table so you'd just have UserId, PropertyName, PropertyValue columns? You might want to add in some other meta-attributes into it too; timestamp, version, or whatever. SQL Server is quite efficient at handling these sorts of tables so don't discount a simple solution like this out-of-hand.
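A minimal sketch of that long-and-skinny shape (the key and meta-attribute choices here are illustrative assumptions, not a definitive design):

```sql
-- Hypothetical name=value table: one row per property instead of one column.
CREATE TABLE UserProperty (
    UserId        INT           NOT NULL,
    PropertyName  NVARCHAR(64)  NOT NULL,
    PropertyValue NVARCHAR(MAX) NULL,
    LastUpdated   DATETIME      NOT NULL DEFAULT GETDATE(),  -- example meta-attribute
    PRIMARY KEY (UserId, PropertyName)
);

-- Reading one user's settings back is a simple filter, not a 400-column SELECT.
SELECT PropertyName, PropertyValue
FROM UserProperty
WHERE UserId = 42;
```

Note that adding a new parameter then becomes an INSERT rather than an ALTER TABLE, which is what makes this shape easier to sync.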

Trevor Tippins
  • I did consider this, which led to the following question yesterday: http://stackoverflow.com/questions/1755562/sql-syntax-create-a-single-table-with-column-names-created-from-data-stored-over And things started to get complicated! – Belliez Nov 19 '09 at 14:26
  • I did try the long and skinny list approach but as the properties are all different data types it would be difficult to sort and make it usable. – Belliez Nov 24 '09 at 11:51
  • How about having a side table that records the names and types of properties (plus any other constraints like ranges, isNullable, etc.), then store all the actual properties as varchar/text and have the consumer interpret them as necessary (you might perhaps want both number and varchar/text columns in your name=value table, but that's an implementation detail). – Trevor Tippins Nov 24 '09 at 13:16
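For what it's worth, that side-table idea might look something like this (a sketch only; `UserProperty` refers to the name=value table sketched earlier, and all other names are made up):

```sql
-- Hypothetical metadata table describing each property's type and constraints.
CREATE TABLE PropertyDefinition (
    PropertyName NVARCHAR(64) PRIMARY KEY,
    DataType     NVARCHAR(20) NOT NULL,   -- e.g. N'int', N'float', N'text'
    IsNullable   BIT          NOT NULL DEFAULT 1,
    MinValue     FLOAT        NULL,       -- optional range constraint
    MaxValue     FLOAT        NULL
);

-- The consumer looks up the declared type and casts the stored text as needed.
SELECT p.PropertyName,
       CAST(p.PropertyValue AS FLOAT) AS NumericValue
FROM UserProperty p
JOIN PropertyDefinition d ON d.PropertyName = p.PropertyName
WHERE d.DataType = N'float';
```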

Based on your process description I would start with something like this. The model is simplified, does not capture history, etc. -- but it is a good starting point. Note: parameter = property.


- Setup is a collection of properties. One setup can have many properties, one property belongs to one setup only.
- Machine can have many setups, one setup belongs to one machine only.
- Property is of a specific type (temperature, run time, spindle speed); there can be many properties of a certain type.
- Measurement and trait are types of properties. Measurement is a numeric property, like speed. Trait is a descriptive property, like color or some text.

[diagram: machine_model_01]
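A rough DDL sketch of that model follows; the diagram above is authoritative, and all table and column names below are my guesses at its details:

```sql
-- Hypothetical tables following the model described above.
CREATE TABLE Machine (
    MachineId INT IDENTITY(1,1) PRIMARY KEY,
    Name      NVARCHAR(64) NOT NULL
);

-- One machine has many setups; a setup belongs to exactly one machine.
CREATE TABLE Setup (
    SetupId   INT IDENTITY(1,1) PRIMARY KEY,
    MachineId INT NOT NULL REFERENCES Machine (MachineId)
);

-- Property types: temperature, run time, spindle speed, ...
CREATE TABLE PropertyType (
    PropertyTypeId INT IDENTITY(1,1) PRIMARY KEY,
    Name           NVARCHAR(64) NOT NULL
);

-- One setup has many properties; each property is of one type.
CREATE TABLE Property (
    PropertyId     INT IDENTITY(1,1) PRIMARY KEY,
    SetupId        INT NOT NULL REFERENCES Setup (SetupId),
    PropertyTypeId INT NOT NULL REFERENCES PropertyType (PropertyTypeId)
);

-- Measurement and Trait are 1:1 subtype tables of Property.
CREATE TABLE Measurement (
    PropertyId INT PRIMARY KEY REFERENCES Property (PropertyId),
    Value      FLOAT NOT NULL,
    Unit       NVARCHAR(20) NULL
);

CREATE TABLE Trait (
    PropertyId  INT PRIMARY KEY REFERENCES Property (PropertyId),
    Description NVARCHAR(255) NOT NULL
);
```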

Damir Sudarevic
  • Wow, thanks for your efforts. I will have this design in mind when I go about the daunting task of re-arranging my tables this week... Thanks again, it is very much appreciated. – Belliez Nov 23 '09 at 16:31
  • One question: if my system had various datatypes for each property, would each become a new table, i.e. Trait = descriptive, Measurement = float, and a new table called Counter storing (int) datatypes, etc.? I am a little uncertain on this. – Belliez Dec 04 '09 at 10:50
  • A few options here: option 1 - as you described; option 2 - use int (bigint) for the datatype in `measurement` and add a field for format, converting to decimal on read. This one is equivalent to storing data in smaller units. – Damir Sudarevic Dec 04 '09 at 11:55
  • Thanks for your reply. The datatypes I am using are BIT, FLOAT, NVARCHAR(255), NVARCHAR(MAX), INT. I like option 2, keeping to a single table. Do you suggest a table structure like this: Value BIGINT, Format NVARCHAR(20), Unit NVARCHAR(20), and if I need to store a decimal or float, read back the Format nvarchar, determine its datatype and cast appropriately in my software (.NET) or in SQL Server? Thanks again for your help and advice, it's much appreciated. – Belliez Dec 04 '09 at 14:49
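To make option 2 concrete, one possible shape (my assumption only; three implied decimal places are used for illustration, and the cast could equally be done in .NET rather than in SQL):

```sql
-- Hypothetical option-2 table: numbers kept as BIGINT in the smallest unit,
-- with format/unit tags telling the reader how to interpret them.
CREATE TABLE MeasurementValue (
    PropertyId INT          PRIMARY KEY,
    RawValue   BIGINT       NOT NULL,     -- e.g. thousandths of the unit
    Format     NVARCHAR(20) NOT NULL,     -- e.g. N'decimal(19,3)'
    Unit       NVARCHAR(20) NULL
);

-- Reading back in SQL: scale the raw integer into a DECIMAL on the way out.
SELECT PropertyId,
       CAST(RawValue AS DECIMAL(19,3)) / 1000 AS Value,
       Unit
FROM MeasurementValue;
```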