
My situation is kind of hard to explain, but I will try.

I have, say, 50 meters, each of which outputs a value every minute. I currently store the readings in a single table, so every minute 50 rows (one per meter) are added to it.

50 rows like this every minute:

id, datetime, meternumber, metervalue

Then I have another table that has:

id, meternumber, metername

So that I can join the meternames for each meternumber. Basic stuff.
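Roughly, the current layout looks something like this as DDL (the meters table name appears in the query further down; the meter_names table name and the column types here are just assumptions to make the sketch concrete):

CREATE TABLE meters (
    id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    `datetime`  DATETIME NOT NULL,
    meternumber INT NOT NULL,
    metervalue  DOUBLE NULL
);

CREATE TABLE meter_names (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    meternumber INT NOT NULL,
    metername   VARCHAR(64) NOT NULL
);

-- joining the names onto the readings
SELECT m.`datetime`, n.metername, m.metervalue
FROM meters m
JOIN meter_names n ON n.meternumber = m.meternumber;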

However, this is quickly becoming a problem, because 50 rows every minute adds up to a huge number of rows. I also have to do some averaging and summing of the meter values, and that is neither easy nor efficient: SUM and AVG do not work well here, because I effectively have to aggregate every 50th row, since I want to sum/average the metervalue separately for each meter.

So I was thinking, should I instead make 50 columns like so:

id, datetime, meter_1_value, meter_2_value, ..., meter_n_value

That would basically divide the number of rows by 50 and make summing a lot easier, because then I can just sum each column over the rows to get:

id, datetime, meter_1_value_summed, meter_2_value_summed, ..., meter_n_value_summed

So is this a good idea to basically turn those 50 rows into columns of the table? I also tried doing this at query time by first building a temporary table that pivots the rows into columns and then querying that table, but as there are already around 15 million rows, it is getting slower and slower.

Indexing, partitioning and increasing the server specs did not do much, so I am starting to run out of ideas.

I know you guys want to see actual table data for some reason, so here is an example:

DateTime TagName Value
-------- ------- -----
2016-06-02 16:24:51 meter_1_name 66.232818603515625
2016-06-02 16:24:51 meter_2_name 42.3612060546875
2016-06-02 16:24:51 meter_3_name 25.111988067626953
2016-06-02 16:24:51 meter_4_name 4.296875
2016-06-02 16:24:51 meter_5_name NULL
2016-06-02 16:24:51 meter_6_name 3.5083911418914795
2016-06-02 16:24:51 meter_7_name 46.137149810791016
2016-06-02 16:24:51 meter_8_name 71.419265747070312
2016-06-02 16:24:51 meter_9_name 68.337669372558594
2016-06-02 16:24:51 meter_10_name 3.1090855598449707
2016-06-02 16:24:51 meter_11_name 3.0222799777984619
2016-06-02 16:24:51 meter_12_name 2.3900461196899414
2016-06-02 16:24:51 meter_13_name 44.856769561767578
2016-06-02 16:24:51 meter_14_name 64.431419372558594
2016-06-02 16:24:51 meter_15_name 34.657115936279297
2016-06-02 16:24:52 meter_1_name 66.232818603515625
2016-06-02 16:24:52 meter_2_name 42.3612060546875
2016-06-02 16:24:52 meter_3_name 25.111988067626953
2016-06-02 16:24:52 meter_4_name 4.296875
2016-06-02 16:24:52 meter_5_name NULL
2016-06-02 16:24:52 meter_6_name 3.5083911418914795
2016-06-02 16:24:52 meter_7_name 46.137149810791016
2016-06-02 16:24:52 meter_8_name 71.419265747070312
2016-06-02 16:24:52 meter_9_name 68.337669372558594
2016-06-02 16:24:52 meter_10_name 3.1090855598449707
2016-06-02 16:24:52 meter_11_name 3.0222799777984619
2016-06-02 16:24:52 meter_12_name 2.3900461196899414
2016-06-02 16:24:52 meter_13_name 44.856769561767578
2016-06-02 16:24:52 meter_14_name 64.431419372558594
2016-06-02 16:24:52 meter_15_name 34.657115936279297
2016-06-02 16:24:53 meter_1_name 66.232818603515625
2016-06-02 16:24:53 meter_2_name 42.3612060546875
2016-06-02 16:24:53 meter_3_name 25.111988067626953
2016-06-02 16:24:53 meter_4_name 4.296875
2016-06-02 16:24:53 meter_5_name NULL
2016-06-02 16:24:53 meter_6_name 3.5083911418914795
2016-06-02 16:24:53 meter_7_name 46.137149810791016
2016-06-02 16:24:53 meter_8_name 71.419265747070312
2016-06-02 16:24:53 meter_9_name 68.337669372558594
2016-06-02 16:24:53 meter_10_name 3.1090855598449707
2016-06-02 16:24:53 meter_11_name 3.0222799777984619
2016-06-02 16:24:53 meter_12_name 2.3900461196899414
2016-06-02 16:24:53 meter_13_name 44.856769561767578
2016-06-02 16:24:53 meter_14_name 64.431419372558594
2016-06-02 16:24:53 meter_15_name 34.657115936279297

And I was thinking of turning that into this:

DateTime meter_1_value meter_2_value meter_3_value meter_4_value
-------- ------------- ------------- ------------- -------------
2016-06-02 16:24:51 66.2328186035 42.36146875 21.111986762693 5.29687
2016-06-02 16:24:52 70.2328186035 43.36146875 22.111988062695 2.29685
2016-06-02 16:24:53 80.2328186035 40.36120465 23.111988762653 8.29675
2016-06-02 16:24:54 90.2328186035 49.36120685 24.111986762693 5.29875

So as you can see, there would be a lot fewer rows, and summing/averaging can be done far more easily this way. Figuring out which value belongs to which meter will not be a problem in this case.

Edit: The row->column query is hacky and looks like this:

DROP VIEW IF EXISTS v_temp;

CREATE OR REPLACE VIEW v_temp AS
(
    SELECT m.`datetime`,

    MAX(IF(meternumber = 1, metervalue, NULL)) AS meter1,
    MAX(IF(meternumber = 2, metervalue, NULL)) AS meter2,
    MAX(IF(meternumber = 3, metervalue, NULL)) AS meter3

    FROM meters m

    WHERE m.`datetime` >= CAST('2016-05-09 00:00:00' AS DATETIME)
    AND m.`datetime` <= CAST('2016-05-11 23:59:00' AS DATETIME)

    GROUP BY m.`datetime`
);

SELECT `datetime`,

ROUND(AVG(meter1), 0) AS meter1_avg,
ROUND(AVG(meter2), 0) AS meter2_avg,
ROUND(AVG(meter3), 0) AS meter3_avg

FROM v_temp

GROUP BY DATE(`datetime`), HOUR(`datetime`), MINUTE(`datetime`)
ORDER BY `datetime` ASC
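
For comparison, here is a minimal sketch that skips the pivot entirely and aggregates per meter and per hour straight from the normalized table (it assumes the same meters table and columns as above; the hour bucket format is just one option):

SELECT meternumber,
       DATE_FORMAT(`datetime`, '%Y-%m-%d %H:00:00') AS hour_bucket,
       SUM(metervalue)           AS sum_value,
       ROUND(AVG(metervalue), 0) AS avg_value
FROM meters
WHERE `datetime` >= '2016-05-09 00:00:00'
  AND `datetime` <  '2016-05-12 00:00:00'
GROUP BY meternumber, hour_bucket      -- one result row per meter per hour
ORDER BY meternumber, hour_bucket;

NULL readings (like meter_5 above) are simply ignored by SUM and AVG here.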
Swiffy
  • I'd probably keep the current design. – jarlh Jun 02 '16 at 13:47
  • In my opinion, whether it is better to store the data as more rows (i.e. in normalized form) depends on the design. http://stackoverflow.com/questions/9774715/mysql-multiple-tables-or-one-table-with-many-columns – Hamza Zafeer Jun 02 '16 at 13:53
  • 27 million rows *per year* doesn't sound like an unreasonable amount of data. How many years is this going to be running? Dividing that by 50 is a *tiny* linear change to the growth, it will barely make a difference compared to the added complexity in the system. I'd focus more on the performance tuning than on a single-order-of-magnitude data reduction. – David Jun 02 '16 at 13:54
  • @David Then again, 27 million as opposed to 27 million / 50 = 540k per year? I could run that for 27 years without problems then. Adding new meters is not a problem, I think; I can just write some tool to add a new column to the table or something. It's the summing and averaging that becomes a problem, really. It is quite hard to do that if I have each meter as a row: I cannot group by meter, because I have to group by datetime to get, for example, an hourly average. – Swiffy Jun 02 '16 at 14:14
  • The whole point of a relational database is to store data in [normalized](https://en.wikipedia.org/wiki/Database_normalization#Normal_forms) form. By pivoting rows into columns, you're de-normalizing your layouts. – Mr. Llama Jun 02 '16 at 14:30
  • @Mr.Llama Yeah, I figured something like that, didn't know there was a word for it though! I mean, the row solution that I currently have is how I would generally do everything, but in this case it seems to make querying the table very hard and slow. I have indexes, partitioning and everything, but the queries can take up to a minute or two for some reason. I am also monitoring the CPU and RAM of the server while querying, but nothing suspicious is going on there either. – Swiffy Jun 02 '16 at 14:40

3 Answers


So is this a good idea to basically turn those 50 rows into columns of the table?

In your position I would keep the existing structure and add a summary table that maintains the number of records for each meter and also the sum. I would not bother to keep an average because that can be calculated quite easily from the count and the sum.

That would basically divide the number of rows by 50 and make summing a lot easier, because then I can just sum each column over the rows to get:

id, datetime, meter_1_value_summed, meter_2_value_summed, ..., meter_n_value_summed

So is this a good idea to basically turn those 50 rows into columns of the table?

I believe that this would not give you a definite advantage because you would be doing a full table scan and calculating for all 50 columns. That could turn out to be much slower than the current summation.

Using an index as suggested by Gordon will help you get the sum and average for a single meter, but if you need to sum and average all the meters, you would still be reading the full table. Slow.

The summary table.

My proposed summary table would be something like this:

meter_number, num_records, summation.

You would update this table with the aid of a trigger, so maintaining it is a trivial addition on each insert. Retrieving the sum and average is then a trivial query that reads only 50 records, with no calculation beyond summation/num_records.
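
A minimal sketch of what that could look like (the table, column and trigger names are illustrative, and it assumes the readings live in the meters table from the question):

CREATE TABLE meter_summary (
    meter_number INT PRIMARY KEY,
    num_records  BIGINT NOT NULL DEFAULT 0,
    summation    DOUBLE NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER trg_meters_after_insert
AFTER INSERT ON meters
FOR EACH ROW
BEGIN
    -- skip NULL readings (like meter_5 in the sample) so they do not skew the average
    IF NEW.metervalue IS NOT NULL THEN
        INSERT INTO meter_summary (meter_number, num_records, summation)
        VALUES (NEW.meternumber, 1, NEW.metervalue)
        ON DUPLICATE KEY UPDATE
            num_records = num_records + 1,
            summation   = summation + NEW.metervalue;
    END IF;
END//
DELIMITER ;

-- sum and average per meter: only ~50 rows to read
SELECT meter_number, summation, summation / num_records AS average
FROM meter_summary;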

e4c5
  • Well this is a very good looking solution indeed! So when 50 new rows get added, I would query the summary table and just increase the num_records by 1 for each meter and add the new meter value to the summation for each meter? And when I want the average for each meter, that is just `summation/num_records`, right? – Swiffy Jun 02 '16 at 14:50
  • yes, that's what it is. Turns out that you might not even need a trigger if you are in full control of how the data is inserted. – e4c5 Jun 02 '16 at 14:54
  • Perfect! I would probably still prefer a trigger, so I don't have to care about this no matter where the insertion might happen. Out of sight, out of mind :) I think this one solves the problem and is a great thing to remember in the future - it is a bit more advanced than normal database usage. – Swiffy Jun 02 '16 at 14:57

This can be a tricky design issue. The current design has certain advantages:

  • It is easy to insert or delete new meters.
  • If one of the meters isn't available (for some reason), a row is simply missing from the database.
  • Aggregation for a select group of meters is pretty easy.

Your proposed design involves denormalizing the data. This also has some advantages:

  • The number of rows and size of the data is typically smaller.
  • It is easier to feed the data to external analysis tools.

15 million rows is not particularly large. And, a load of less than one row per second is not a large load on the database. You should be able to get the current version to work -- using appropriate indexes and perhaps partitioning.

In particular, summing the values for a particular meter should be fast, so a query such as:

select sum(metervalue), avg(metervalue)
from meters
where meternumber = 1;

should be quite fast with an index on (meternumber, datetime, metervalue).
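
For reference, a sketch of adding that index (the index name is just a placeholder):

CREATE INDEX idx_meter_datetime_value
    ON meters (meternumber, `datetime`, metervalue);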

Gordon Linoff

If converting the meters to the columnar format works better for you and gives you the results you need, then by all means go for it! Tables with a couple hundred columns are no more of a problem for any RDBMS than tables with millions of rows; MySQL is built to scale in either direction. The key is to use whatever approach gives you the best results with the least maintenance, and the columnar approach looks like a good solution for this application.
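
If you do go that route, a minimal sketch of the wide table might look like the following (the table name, the DOUBLE type and the index are assumptions, not something given in the question):

CREATE TABLE meter_readings_wide (
    id             BIGINT AUTO_INCREMENT PRIMARY KEY,
    reading_time   DATETIME NOT NULL,
    meter_1_value  DOUBLE NULL,
    meter_2_value  DOUBLE NULL,
    -- ... one nullable column per meter ...
    meter_50_value DOUBLE NULL,
    KEY idx_reading_time (reading_time)
);

-- hourly averages then become a plain GROUP BY over one table
SELECT DATE_FORMAT(reading_time, '%Y-%m-%d %H:00:00') AS hour_bucket,
       AVG(meter_1_value) AS meter_1_avg,
       AVG(meter_2_value) AS meter_2_avg
FROM meter_readings_wide
GROUP BY hour_bucket;

The trade-off, as the comments and the other answers point out, is that adding a new meter then means an ALTER TABLE instead of just inserting rows.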

Michael Sheaver