1

I have a few tables that have millions of records where a sensor was sending multiple 0 and 1 values and this data was logged to the table even though we only needed it to keep the very first 1 or 0 per each 1 to 0 or 0 to 1 change.

Adjustments have been made so we only now get the 1 and 0 values on each change and not every one second or whatever but I need to cleanup the unnecessary records from the tables.

enter image description here

I've done some research and testing and I'm having trouble figuring out what method to use here to delete the records not needed. I was trying to figure out how to retain the previous value record using variables and also created row numbers but it's not working as I need it to.

I created an SQLFiddle here and tried some logic per the example post MySQL - How To Select Rows Depending on Value in Previous Row (Remove Duplicates in Each Sequence). I keep getting back no results from this and when I tried running it on a large local MySQL table, and I got an error wto I have to increase the MySQL Workbench read query timeout to 600 or it lost connection.

I also found the "MySql - How get value in previous row and value in next row?" post and tried some variations of it and also "How to get next/previous record in MySQL?" and I've come up with total failure getting the expected results.

The Data

The data in the tables has a TimeStr column and a Value column just as in the screen shot and on the SQLFiddle link I posted with a small sample of the data.

Each record will never have the same TimeStr value but I really only need to keep the very first record time wise when the sensor either turned ON or OFF if that clarifies.

I'm not sure if the records will need an incremental row number added to get the expected results since it only has the TimeStr and the Value records otherwise.

My Question

Can anyone help me determine a method that I can use on a couple large tables to delete the records from a table where there are subsequent and duplicate Value values so the tables only has the very first 1 or 0 records where those actually change from a 1 to 0 or 0 to 1?

I will accept an answer that also results in just the records needed—but any that perform fast would be even more greatly appreciated.

  • I can easily put those into a temp table, drop the original table, and then create and insert the needed records only into the original table.

Expected Results

|              TimeStr | Value |
|----------------------|-------|
| 2018-02-13T00:00:00Z |     0 |
| 2018-02-13T00:00:17Z |     1 |
| 2018-02-13T00:00:24Z |     0 |
| 2018-02-13T00:00:28Z |     1 |
Bitcoin Murderous Maniac
  • 1,209
  • 1
  • 14
  • 27

3 Answers3

1

Try this :

    SET @rownum = 0; 
SET @rownum_x = 0; 
SELECT b.rownum, b.TimeStr, b.Value
FROM 
(
    SELECT @rownum := @rownum+1 as rownum, TimeStr, Value
    FROM sensor
    ORDER BY TimeStr
) b
LEFT JOIN (
    SELECT @rownum_x := @rownum_x+1 as rownum_x, TimeStr as TimeStr_x, Value as Value_x
    FROM sensor
    ORDER BY TimeStr
) x ON b.rownum = x.rownum_x + 1
where b.Value <> x.Value_x or x.Value_x is null
order by b.TimeStr

The result I got is enter image description here

4givN
  • 2,936
  • 2
  • 22
  • 51
  • I use the sqlfiddle too without any result (tried with some simple queries). Please add the schema in your local DB, the execute the query to see. 'cause in my case, it works – 4givN Feb 15 '18 at 08:48
1

You want the first record for each value when it appears. This suggests variables. Here is one way that only involves sorting and no joining:

select t.*
from (select t.*,
             (case when value = @prev_value then value
                   when (@save_prev := @prev_value) = NULL then NULL 
                   when (@prev_value := value) = NULL then NULL
                   else @save_prev
              end) as prev_value
      from (select t.*
            from sensor t
            order by timestr
           ) t cross join
           (select @prev_value := -1) params
     ) t
where prev_value <> value;

Notes:

  • The subquery for ordering only seems to be needed since MySQL 5.7.
  • The case is just a way to introduce serialized code. When using a variable it should only be used on one expression.
  • This only requires one sort -- and if you have an index, that doesn't even need to be a sort.

Here is a SQL Fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1
Select t.timestr, t.value from (
 SELECT s.*,  @pv x1, (@pv := s.value) x2
 FROM sensor S, (select @pv := -1) x
 ORDER BY TimeStr ) t
where t.x1 != t.x2 

See http://sqlfiddle.com/#!9/8d0774/122

Gerard H. Pille
  • 2,528
  • 1
  • 13
  • 17