
I am creating schedules for our engineers to analyze. The schedules are downloaded each day, and the analysis is done on local computers.

My dilemma is whether to store each schedule in the database as table rows or as a single NVARCHAR(MAX) value.

Here are the requirements:

  1. The schedules are generated each day. Each schedule is accurate to 1 second, so a schedule contains at most 86,400 records.
  2. Depending on the settings, the system can generate up to 100 schedules per engineer per day (we have about 10 engineers).
  3. The schedule contains the following fields: INT | INT | INT | INT | NVARCHAR(1024) | NVARCHAR(64) | BIT | BIT | DATETIME | DATETIME (In summary: 4x INTs, 2x NVARCHARs, 2x BITs, and 2x DATETIMEs)
  4. The schedule is rarely going to be updated, but it can be updated. The updatable fields are: 2x BITs and 1x DATETIME.

Now, looking at the common-case scenario:

The system will generate about 1,296,000 records per day.

This is the calculation for the common case:
- 10 seconds accuracy per schedule = 8,640 rows
- 5 engineers run the scheduler each day
- Each engineer generates about 30 schedules

So total is: 8,640 * 5 * 30 = 1,296,000 records

If I store each schedule as a comma-delimited NVARCHAR(MAX) value, the number of records is reduced to only 150 per day.

Here is the calculation:
- 10 seconds accuracy per schedule = 8,640 rows --> stored as NVARCHAR (becomes 1 record)
- 5 engineers run the scheduler each day
- Each engineer generates about 30 schedules

So total is: 5 * 30 = 150 records

These are the requirements for how the schedules are used:

  1. The generated schedules can be viewed on the website.
  2. The schedules are downloaded by the application each day for analysis.
  3. The 2x BIT fields can be updated once the analysis is completed. They can be updated by the application (after it finishes analyzing the schedule) or manually by the engineer on the website.
  4. All generated schedules must be stored for at least 3 months for auditing purposes.
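
To make the volumes concrete, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures from this question; the 90-day retention window is an assumption standing in for "at least 3 months", and the function name is made up for illustration.

```python
# Back-of-the-envelope volumes, using only the figures given in the question.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
RETENTION_DAYS = 90             # assumption: "3 months" approximated as 90 days


def rows_per_day(resolution_s: int, engineers: int, schedules_each: int) -> int:
    """Rows generated per day when every schedule point is its own table row."""
    rows_per_schedule = SECONDS_PER_DAY // resolution_s
    return rows_per_schedule * engineers * schedules_each


# Common case: 10-second resolution, 5 engineers, 30 schedules each.
common_rows = rows_per_day(resolution_s=10, engineers=5, schedules_each=30)
# Worst case: 1-second resolution, 10 engineers, 100 schedules each.
worst_rows = rows_per_day(resolution_s=1, engineers=10, schedules_each=100)
# One NVARCHAR(MAX) value per schedule in the common case.
common_blobs = 5 * 30

for label, per_day in [
    ("rows, common case", common_rows),
    ("rows, worst case", worst_rows),
    ("NVARCHAR(MAX) values, common case", common_blobs),
]:
    print(f"{label}: {per_day:,} per day, {per_day * RETENTION_DAYS:,} retained")
```

The row layout is on the order of a hundred million rows over the retention window in the common case, which is the number the two storage options should be weighed against.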

What is your recommendation: store the schedules as table rows or as NVARCHAR(MAX)?

gotqn
Sam

1 Answer


Are there any benefits to storing the data in one column other than the lower row count? If not, in my opinion you are safe to store the data in a normalized manner.


I have used both techniques, because of different requirements. Storing the data in VARBINARY(MAX) or NVARCHAR(MAX) does lead to several difficulties:

  • you cannot index or search by individual fields
  • to perform an update, the data must be normalized (split back into rows), modified, and then rebuilt as a string/binary again
  • to perform reporting, the data must be normalized again
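
The update difficulty can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the table and column names (`schedule_row`, `schedule_blob`, `analyzed`, and so on) are hypothetical, not taken from the question. The same single flag is flipped in both layouts:

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedule_row (            -- hypothetical normalized layout
        schedule_id   INTEGER NOT NULL,
        second_of_day INTEGER NOT NULL,
        note          TEXT,
        analyzed      INTEGER NOT NULL DEFAULT 0,  -- stands in for a BIT field
        PRIMARY KEY (schedule_id, second_of_day)
    );
    CREATE TABLE schedule_blob (           -- hypothetical one-string-per-schedule layout
        schedule_id INTEGER PRIMARY KEY,
        payload     TEXT NOT NULL
    );
""")

# The same five schedule points stored both ways.
rows = [(1, s, f"task {s}", 0) for s in range(0, 50, 10)]
conn.executemany("INSERT INTO schedule_row VALUES (?, ?, ?, ?)", rows)
payload = "\n".join(",".join(str(v) for v in r[1:]) for r in rows)
conn.execute("INSERT INTO schedule_blob VALUES (1, ?)", (payload,))

# Row layout: flipping one flag is a single keyed UPDATE.
conn.execute(
    "UPDATE schedule_row SET analyzed = 1 "
    "WHERE schedule_id = 1 AND second_of_day = 20"
)

# String layout: read the whole value, split it, modify it, rebuild it,
# and write the whole value back.
(text,) = conn.execute(
    "SELECT payload FROM schedule_blob WHERE schedule_id = 1"
).fetchone()
records = [line.split(",") for line in text.split("\n")]
for rec in records:
    if rec[0] == "20":   # locate the record by its second_of_day field
        rec[2] = "1"     # set the analyzed flag
new_payload = "\n".join(",".join(rec) for rec in records)
conn.execute(
    "UPDATE schedule_blob SET payload = ? WHERE schedule_id = 1", (new_payload,)
)

row_flag = conn.execute(
    "SELECT analyzed FROM schedule_row "
    "WHERE schedule_id = 1 AND second_of_day = 20"
).fetchone()[0]
```

In the row layout the database can find the record through the primary key; in the string layout every update rewrites the entire schedule, and nothing inside the string can be indexed.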

So, because of the above, I would advise choosing the table format. Also, if you feel that exporting the data in some serialized form is better, you can always implement a SQL CLR string concatenation function, or use the built-in one (STRING_AGG) if you are on SQL Server 2017 or later.

Also, it is better to use separators like CHAR(31) and CHAR(30) for columns and rows. This is cleaner than using tabs, new lines, commas, or semicolons, because the input data is unlikely to contain these characters and break your format.
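
As a rough illustration of the separator idea, here is a minimal Python sketch (not T-SQL) in which `chr(31)` and `chr(30)` play the role of CHAR(31) and CHAR(30). The `serialize` and `deserialize` helpers and the sample rows are made up for the example, and the sketch assumes the field values never contain the two control characters themselves.

```python
UNIT_SEP = chr(31)    # ASCII unit separator, like CHAR(31): between columns
RECORD_SEP = chr(30)  # ASCII record separator, like CHAR(30): between rows


def serialize(rows):
    """Join the columns of each row with UNIT_SEP and the rows with RECORD_SEP."""
    return RECORD_SEP.join(UNIT_SEP.join(str(v) for v in row) for row in rows)


def deserialize(text):
    """Split a serialized string back into a list of rows of string fields."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


# Free-text fields containing commas, semicolons, tabs, and new lines.
rows = [
    [1, "check pump, then valve; log result", 0],
    [2, "line one\nline two\twith tab", 1],
]

round_trip = deserialize(serialize(rows))

# For comparison: a naive comma-delimited version of the first row splits
# the text field in two, so three columns come back as four.
naive = ",".join(str(v) for v in rows[0]).split(",")

print(len(round_trip[0]), len(naive))
```

Every value comes back as a string, so the reader still has to convert the INT, BIT, and DATETIME columns to their proper types.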

gotqn