I have a MySQL database holding memory measurements and timestamps: pretty simple data, such as memory used and total memory available in the system. Now I would like to create a MySQL VIEW that performs some simple calculations on this data to achieve some level of data smoothing (a rolling-window average).

The initial table looks like this:

id |date                     |mem_used    |mem_total
1  |2012-03-16 23:29:05      |467         |1024
2  |2012-03-16 23:30:05      |432         |1024
3  |2012-03-16 23:31:05      |490         |1024
4  |2012-03-16 23:33:05      |501         |1024
5  |2012-03-16 23:35:05      |396         |1024
6  |2012-03-16 23:39:05      |404         |1536
7  |2012-03-16 23:43:05      |801         |1536

The created VIEW should look like the following one:

id |date                     |mem_used    |mem_total    |mem_5_min_avg    |mem_rate_usage
1  |2012-03-16 23:29:05      |467         |1024         |473              |0.46191406
2  |2012-03-16 23:30:05      |432         |1024         |455              |0.44433594
3  |2012-03-16 23:31:05      |490         |1024         |463              |0.45214844
4  |2012-03-16 23:33:05      |501         |1024         |449              |0.43847656
5  |2012-03-16 23:35:05      |396         |1024         |396              |0.38671875
6  |2012-03-16 23:39:05      |404         |1536         |603              |0.39257813
7  |2012-03-16 23:43:05      |801         |1536         |801              |0.52148438

Requirements:

The first 4 columns are the same, but the new column mem_5_min_avg should contain the average memory used over the following 5 minutes, counting only rows where mem_total is the same (mem_total changes over time).

So the values of mem_5_min_avg should be calculated as follows:

  • 1st row of the mem_5_min_avg column: (467+432+490+501)/4 = 1890/4 = 472.5, rounded up to 473 <- We average 4 rows here because 2012-03-16 23:29:05 plus 5 minutes is 2012-03-16 23:34:05, and rows 1-4 all fall within that window with the same mem_total.
  • 2nd row of the mem_5_min_avg column: (432+490+501+396)/4 = 1819/4 = 454.75, rounded up to 455
  • 3rd row of the mem_5_min_avg column: (490+501+396)/3 = 1387/3 = 462.33, rounded up to 463
  • 4th row of the mem_5_min_avg column: (501+396)/2 = 897/2 = 448.5, rounded up to 449
  • 5th row of the mem_5_min_avg column: 396 <- We do not average in any further rows here because, even though the next measurement is within 5 minutes, mem_total has changed.
  • 6th row of the mem_5_min_avg column: (404+801)/2 = 1205/2 = 602.5, rounded up to 603
  • 7th row of the mem_5_min_avg column: 801

After mem_5_min_avg is calculated, I need a mem_rate_usage column, which shows a simple usage rate, i.e. what fraction of the total memory is in use.

mem_rate_usage = mem_5_min_avg / mem_total

For example, the 3rd row of mem_rate_usage should be calculated as 463/1024 = 0.45214844, and the last row as 801/1536 = 0.52148438.

I don't have a clue how to approach this. I have tried the AVG function in combination with GROUP BY, but I don't actually want to group anything here: I want the view to keep the same number of rows as the table, with the smoothed values and rates added as extra columns.
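
What I tried was roughly along these lines (mytable is a placeholder table name), which collapses everything into one row per distinct mem_total instead of keeping one row per measurement:

SELECT mem_total, AVG(mem_used) AS mem_used_avg
FROM mytable
GROUP BY mem_total;  -- one row per distinct mem_total, not one row per measurement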

Vangelis Tasoulas
  • You really should consider using Round Robin DB for this (a.k.a rrdtools http://oss.oetiker.ch/rrdtool/) – Itay Moav -Malimovka Mar 17 '12 at 01:19
  • This question may help: http://stackoverflow.com/questions/664700/calculate-a-running-total-in-mysql. On second thought maybe not... – Simon Mar 17 '12 at 01:50

2 Answers

UPDATE 2:

I improved the query further, but it is still slow. I realized that TIMESTAMPDIFF is much slower than a direct comparison of UNIX_TIMESTAMP values, so changing the query from UPDATE 1 as shown below gives an almost 20% speed improvement.

Increasing the innodb_buffer_pool_size option in my.cnf also helped to speed things up a lot (see the example entry after the query below).

SELECT  `date` ,  `mem_used` ,  `mem_total` , `mem_5_min_avg` , 
(`mem_5_min_avg` / `mem_total`) AS mem_usage_rate
FROM (
   SELECT *, (
      SELECT CEILING( AVG( mem_used ) )
      FROM `data` AS t2
      -- only rows within the next 5 minutes (300 seconds) of the current row
      WHERE UNIX_TIMESTAMP(t2.date) - UNIX_TIMESTAMP(t1.date) <=300
      AND t2.date >= t1.date
      AND t1.mem_total = t2.mem_total
      AND t1.host_id = t2.host_id
   ) AS mem_5_min_avg
   FROM `data` AS t1
) AS t1
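
For reference, the buffer-pool change mentioned above is a single entry in my.cnf; the 1G below is only an example value and should be sized according to the RAM actually available on the database server:

[mysqld]
# example only - size this to your available RAM
innodb_buffer_pool_size = 1G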

UPDATE 1: I improved the query so that it runs about twice as fast, but it is still very slow for my big table.

SELECT  `date` ,  `mem_used` ,  `mem_total` , `mem_5_min_avg` , 
(`mem_5_min_avg` / `mem_total`) AS mem_usage_rate
FROM (
   SELECT *, (
      SELECT CEILING( AVG( mem_used ) )
      FROM `data` AS t2
      WHERE TIMESTAMPDIFF(
      MINUTE , t1.date, t2.date ) <=5
      AND t2.date >= t1.date
      AND t1.mem_total = t2.mem_total
   ) AS mem_5_min_avg
   FROM `data` AS t1
) AS t1

INITIAL POST

I asked the same question on ubuntuforums and TeoBigusGeekus gave an answer that works exactly as it should, but it is very slow on the big table with more than 100,000 rows that I have. It takes 7.5 seconds to execute if I limit the query to 30 rows and more than 20 seconds if I limit it to 100, so I guess it would take forever for 100,000 rows. Anyway, for anyone interested in a solution, here it is:

-- Note: the 5-minute average is computed twice here (once per output column);
-- the queries in the updates above compute it only once in a derived table.
SELECT  `date` ,  `mem_used` ,  `mem_total` , (
   SELECT CEILING( AVG( mem_used ) )
   FROM mytable AS t2
   WHERE TIMESTAMPDIFF(
   MINUTE , t1.date, t2.date ) <=5
   AND t2.date >= t1.date
   AND t1.mem_total = t2.mem_total
) AS mem_5_min_avg, (
   SELECT CEILING( AVG( mem_used ) ) / mem_total
   FROM mytable AS t3
   WHERE TIMESTAMPDIFF(
   MINUTE , t1.date, t3.date ) <=5
   AND t3.date >= t1.date
   AND t1.mem_total = t3.mem_total
) AS mem_rate_usage
FROM mytable AS t1
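
For completeness: on MySQL 8.0 and later (window functions did not exist when this was written), the same rolling average can in principle be expressed without correlated subqueries. This is an untested sketch, assuming the table is called mytable and using the same CEILING rounding as above:

SELECT
    id, `date`, mem_used, mem_total,
    CEILING(AVG(mem_used) OVER w)             AS mem_5_min_avg,
    CEILING(AVG(mem_used) OVER w) / mem_total AS mem_rate_usage
FROM mytable
WINDOW w AS (
    -- rows with the same mem_total, looking 5 minutes forward from each row,
    -- matching the semantics of the correlated subqueries above
    PARTITION BY mem_total
    ORDER BY `date`
    RANGE BETWEEN CURRENT ROW AND INTERVAL 5 MINUTE FOLLOWING
);

Wrapping such a query in CREATE VIEW ... AS should then give the view described in the question.
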
Vangelis Tasoulas

SELECT
    rrd1.id,
    rrd1.date,
    rrd1.mem_used,
    rrd1.mem_total,
    (
        SELECT
            CEILING(AVG(rrd2.mem_used))
        FROM
            rrd rrd2
        WHERE
            rrd2.date >= rrd1.date AND
            rrd2.date <= AddTime(rrd1.date, '00:05')
    ) AS mem_5_min_avg
FROM
    rrd rrd1
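
Note that, as written, this query does not restrict the average to rows with the same mem_total and does not produce the mem_rate_usage column asked for in the question. An untested sketch of how it could be extended, reusing the derived-table idea from the first answer:

SELECT id, `date`, mem_used, mem_total, mem_5_min_avg,
       mem_5_min_avg / mem_total AS mem_rate_usage
FROM (
    SELECT
        rrd1.id, rrd1.date, rrd1.mem_used, rrd1.mem_total,
        (
            SELECT CEILING(AVG(rrd2.mem_used))
            FROM rrd rrd2
            WHERE rrd2.date >= rrd1.date
              AND rrd2.date <= ADDTIME(rrd1.date, '00:05')
              AND rrd2.mem_total = rrd1.mem_total  -- only average while mem_total is unchanged
        ) AS mem_5_min_avg
    FROM rrd rrd1
) AS t;
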
mojiro