3

How can I delete duplicate data based on the common value (Start and End)

(Time is unique key)

My table is:

Time Data
10:24:11 Start
10:24:12 Result
10:24:13 Result
10:24:14 End
10:24:15 Start
10:24:16 Result
10:24:17 End

I want to get Data: Result in between Start and End that is with the MAX(TIME) when duplication does occur. as such

The result that I want:

Time Data
10:24:11 Start
10:24:13 Result
10:24:14 End
10:24:15 Start
10:24:16 Result
10:24:17 End

I have tried rearranging the data, but couldn't seems to get the result that I want, Could someone give their advice on this case?


Update

I ended up not using either of the the approach suggested by @fredt and @airliquide as my version of HSQLDB doesn't support the function.

so what I did was, adding sequence and making Start = 1, Result = 2, and End = 3.

Sequence Time Data Indicator
1 10:24:11 Start 1
2 10:24:12 Result 2
3 10:24:13 Result 2
4 10:24:14 End 3
5 10:24:15 Start 1
6 10:24:16 Result 2
7 10:24:17 End 3

Thereon, I make use of the indicator and sequence to get only latest Result. Such that if previous row is 2 (which is result), remove it.

The guide that I follow: From: Is there a way to access the "previous row" value in a SELECT statement?

select t1.value - t2.value from table t1, table t2 
where t1.primaryKey = t2.primaryKey - 1
Minorsee
  • 67
  • 10

2 Answers2

0

Hi a first approach will be to use a lead function as folow

 select hour,status from (select *,lead(status,1) over ( order by hour) as lead
from newtable)compare 
where compare.lead <> status
OR lead is null

Give me what's expected using a postgres engine.

airliquide
  • 520
  • 7
  • 16
0

You can do this sort of thing with SQL procedures.

-- create the table with only two columns
CREATE TABLE actions (attime TIME UNIQUE, data VARCHAR(10));

-- drop the procedure if it exists
DROP PROCEDURE del_duplicates IF EXISTS;

create procedure del_duplicates()  MODIFIES SQL DATA begin atomic
DECLARE last_time time(0) default null;
 for_loop:
 -- loop over the rows in order
 FOR SELECT * FROM actions ORDER BY attime DO

  -- each time 'Start' is found, clear the last_time variable
  IF data = 'Start' THEN
   SET last_time = NULL; 
   ITERATE for_loop;
  END IF;

  -- each time 'Result' is found, delete the row with previous time
  -- if last_time is null, no row is actually deleted 
  IF data = 'Result' THEN
   DELETE FROM actions WHERE attime = last_time;
   -- then store the latest time
   SET last_time = attime; 
   ITERATE for_loop;
  END IF;
        
END FOR;        
END

Your data must all belong to a single day, otherwise there will be strange overlaps that cannot be distinguished. It is better to use TIMESTAMP instead of TIME.

fredt
  • 24,044
  • 3
  • 40
  • 61