0

I need to save the result from several measurements generated in C#.
The results are big Lists of data of different types:

List<double, double> [20000 records]
List<string, double, double> [20000 records]
List<int, double, double> [20000 records]

My first thought was to store these lists in XML format into a table (xml field). So, for each List I generate a XML file and insert it into the table.

<MeasurementList>
  <item>
    <left_measurement>0.1264</left_measurement>
    <right_measurement>6.500</right_measurement>
  </item>
  <item>
    <left_measurement>0.2314</left_measurement>
    <right_measurement>6.968</right_measurement>
  </item>
  <item>
    <left_measurement>0.2365</left_measurement>
    <right_measurement>7.598</right_measurement>
  </item>
  ...
</MeasurementList> 

This will occur 3000 times a day, so I need 9000 records each day.

I know that the disk space is an issue, but my main concern is:
Is this is the best aproach or there are any other solutions

nunopacheco
  • 71
  • 3
  • 11
  • You can try to save the data as `Json` instead of `XML`. You can save significant disk space. – Serkan Arslan Nov 16 '17 at 10:49
  • Do you have a reason not to store these values in physical tables and create the XML from `T-SQL` when you need it. XML is - generally spoken - not meant to store big amount of data... – Shnugo Nov 16 '17 at 11:35
  • I think because the size of the table would be huge. In my example I have 20.000 results x 3 Lists = 60.000 lines x 3000 a day, I would get 180 milion lines a day!!! – nunopacheco Nov 16 '17 at 12:14
  • @sarslan I never thought about json... – nunopacheco Nov 16 '17 at 12:15

1 Answers1

2

As your own answer was deleted as a only-link-answer I'll place my comments as answer (and delete them above).

Stuffing millions of values into millions of rows will still need less place than to store the same amount of data nested in about four times the characters you'd need for the pure data.

At least I'd use <lm> instead of <left_measurement>. The shortest in (character) space was

<L v="0.2314"/><R v="6.968"/>
or 
<L>0.2314</L><R>6.968</R>
or 
<v l="0.2314" r="6.968">

The last option would reflect your List<double, double> [20000 records] best...

The main question is: What are you going to do with this after some days? You can store millions of values somehow, but will you ever have to read them again? If not: Why do you store it at all? If yes (and that's what I assume): Don't put this in a string based container! You will get into sever performance troubles...

About performance and disc space:

  • a FLOAT will take 4 bytes, while 0.2314 consumes 6 characters (12 bytes in NVARCHAR!) and you do not need extra space for the meta data!
  • SQL Server will store an XML as a hidden hierachy table with NVARCHAR strings (2 bytes per character!)
  • JSON is still stored as the string you see (If plain latin you can use VARCHAR with 1 byte per character).
  • XML is much faster in queries than JSON (no structure parsing needed)...
  • Both (XML and JSON) store their values string based. Any attempt to read these values will need a type conversion ("0.1234" is not a number, but a string, that looks like a number!) This is not very expensive with a few values, but this will tear down your system with billions of data...

You might read When can I save JSON or XML data in an SQL Table

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • The arrays of data are only for archive. I dont need to query them all the time. I am going to run some tests and take into account your guidelines! – nunopacheco Nov 21 '17 at 17:04