
I have a bunch of GPS tracks recorded in a table. Due to a bug in the GPS logging code, the fractional part of each timestamp is incorrect (and repeated across records). I would like to create a query that finds the first and last records with the same timestamp and determines the number of records between each rollover to the next whole second. From that count I can work out the time delta between consecutive records, and use it to interpolate the missing/incorrect fractional-second part of each timestamp.

Existing Table

| id | timestamp                  |
|----|----------------------------|
| 1  | 2020-09-06 15:08:21.128344 |
| 2  | 2020-09-06 15:08:21.128344 |
| 3  | 2020-09-06 15:08:21.128344 |
| 4  | 2020-09-06 15:08:22.128344 |

Desired Output

| id | timestamp                  |
|----|----------------------------|
| 1  | 2020-09-06 15:08:21.0      |
| 2  | 2020-09-06 15:08:21.25     |
| 3  | 2020-09-06 15:08:21.5      |
| 4  | 2020-09-06 15:08:22.0      |

After reading "Select first row in each GROUP BY group", I figured out how to get the first row in each group of identical timestamps:

WITH A AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY id) AS rn, *
    FROM gps_points
) SELECT * FROM A WHERE rn=1 ORDER BY id;
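To try the query above without a database server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory copy of the sample rows. The table name gps_points comes from the question; the column types are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the "Existing Table" in the question.
ROWS = [
    (1, "2020-09-06 15:08:21.128344"),
    (2, "2020-09-06 15:08:21.128344"),
    (3, "2020-09-06 15:08:21.128344"),
    (4, "2020-09-06 15:08:22.128344"),
]

FIRST_ROWS = """
WITH A AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY id) AS rn, *
    FROM gps_points
)
SELECT id, timestamp FROM A WHERE rn = 1 ORDER BY id
"""

def first_rows(conn):
    """Return (id, timestamp) for the first row of each group of identical timestamps."""
    return conn.execute(FIRST_ROWS).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gps_points (id INTEGER PRIMARY KEY, timestamp TEXT)")
conn.executemany("INSERT INTO gps_points VALUES (?, ?)", ROWS)
print(first_rows(conn))
# [(1, '2020-09-06 15:08:21.128344'), (4, '2020-09-06 15:08:22.128344')]
```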

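I am hoping to then generate each new timestamp as base + (rn - 1) * delta, where base is the timestamp of first_row rounded to a whole second, datetime(round(julianday(first_row.timestamp) * 86400) / 86400), first_row is the first row of each group of identical timestamps, and delta is 1/(n + 1) seconds for a group of n repeated timestamps (1/(3 + 1) = 0.25 s in the example above).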

This can be explained by this table.

| id | rn - 1 | timestamp                              |
|----|--------|----------------------------------------|
| 1  | 0      | 2020-09-06 15:08:21.0 + 0 * delta      |
| 2  | 1      | 2020-09-06 15:08:21.0 + 1 * delta      |
| 3  | 2      | 2020-09-06 15:08:21.0 + 2 * delta      |
| 4  | 0      | 2020-09-06 15:08:22.0 + 0 * delta      |
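A sketch of that formula as a single SQLite query, run from Python's sqlite3 module on the sample rows. It is one possible reading of the plan above, not a definitive solution, and it makes two assumptions: instead of round(julianday(...) * 86400) / 86400 it truncates to the whole second with datetime() (which matches the desired output even when a stored fraction is .5 or more), and it formats with strftime's %f, so results have millisecond precision.

```python
import sqlite3

ROWS = [
    (1, "2020-09-06 15:08:21.128344"),
    (2, "2020-09-06 15:08:21.128344"),
    (3, "2020-09-06 15:08:21.128344"),
    (4, "2020-09-06 15:08:22.128344"),
]

# new_timestamp = base + (rn - 1) * delta, with delta = 1 / (n + 1) seconds
INTERPOLATE = """
WITH g AS (
    SELECT id, timestamp,
           ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY id) AS rn,
           COUNT(*) OVER (PARTITION BY timestamp) AS n
    FROM gps_points
)
SELECT id,
       strftime('%Y-%m-%d %H:%M:%f',
                julianday(datetime(timestamp))        -- base, truncated to whole seconds
                + (rn - 1) * 1.0 / (n + 1) / 86400.0  -- offset in seconds, converted to days
       ) AS new_timestamp
FROM g
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gps_points (id INTEGER PRIMARY KEY, timestamp TEXT)")
conn.executemany("INSERT INTO gps_points VALUES (?, ?)", ROWS)
result = conn.execute(INTERPOLATE).fetchall()
for row in result:
    print(row)
```

On the sample rows this should yield 21.000, 21.250, 21.500 and 22.000 for ids 1 to 4, i.e. the desired output with three decimal places.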

I am having trouble integrating the above query into a larger query that does what I want. Perhaps I should just write some Python code to do this, but I'd like to improve my SQL knowledge and do it in one fell swoop.
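If plain Python turns out to be simpler, a minimal sketch could look like the following. The function name interpolate and its extra parameter are hypothetical; extra=1 reproduces the 1/(n + 1) spacing described above, extra=0 gives 1/n. It assumes the rows arrive sorted by id with duplicates contiguous, because itertools.groupby only groups consecutive equal keys (unlike PARTITION BY).

```python
from datetime import datetime, timedelta
from itertools import groupby

def interpolate(rows, extra=1):
    """rows: (id, timestamp_text) tuples sorted by id.

    Each run of n identical timestamps is spread as base + i / (n + extra) seconds,
    where base is the timestamp truncated to the whole second.
    """
    result = []
    for ts, group in groupby(rows, key=lambda row: row[1]):
        ids = [row_id for row_id, _ in group]
        base = datetime.fromisoformat(ts).replace(microsecond=0)
        n = len(ids)
        for i, row_id in enumerate(ids):
            new_ts = base + timedelta(seconds=i / (n + extra))
            # timespec="milliseconds" truncates microseconds to three decimals
            result.append((row_id, new_ts.isoformat(sep=" ", timespec="milliseconds")))
    return result

ROWS = [
    (1, "2020-09-06 15:08:21.128344"),
    (2, "2020-09-06 15:08:21.128344"),
    (3, "2020-09-06 15:08:21.128344"),
    (4, "2020-09-06 15:08:22.128344"),
]
for row in interpolate(ROWS):
    print(row)
```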

forpas
Isaac
  • So you always want to add 0.25 seconds per duplicate? What if there were 10 records with the same timestamp, what would the result look like? – GMB Sep 24 '20 at 22:13
  • In that case, it would be 1/11 seconds, in case I'm having an off-by-one mistake in my logic. In my actual data, almost every cluster is 25 records long. – Isaac Sep 24 '20 at 22:16
  • Why is the timestamp for id = 3 in the expected output ....22.75? And for id = 4 it is 22.0? – forpas Sep 24 '20 at 22:22
  • My mistake. Fixed that. – Isaac Sep 24 '20 at 22:27
  • Now why 15:08:22.128344 becomes 2020-09-06 15:08:23.0? There is only 1 15:08:22.128344. Why does it increase to 2020-09-06 15:08:23.0? – forpas Sep 24 '20 at 22:28
  • I think the examples are finally fixed. Sorry for the confusion. – Isaac Sep 24 '20 at 22:33

2 Answers


With ROW_NUMBER() and COUNT() window functions:

WITH cte AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY ID) rn,
    COUNT(*) OVER (PARTITION BY timestamp) counter
  FROM tablename  
)
SELECT id,
       DATETIME(timestamp) || '.' || 
       SUBSTR('00' || ((rn - 1) * (100 / (counter + 1))), -2) timestamp
FROM cte

See the demo.
Results:

> id | timestamp             
> -: | :---------------------
>  1 | 2020-09-06 15:08:21.00
>  2 | 2020-09-06 15:08:21.25
>  3 | 2020-09-06 15:08:21.50
>  4 | 2020-09-06 15:08:22.00

I used counter + 1 as the divisor to reproduce your expected results, but I believe the proper way is without that +1, so that the n records sharing a second are spaced 1/n seconds apart and fill the whole second:

WITH cte AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY ID) rn,
    COUNT(*) OVER (PARTITION BY timestamp) counter
  FROM tablename  
)
SELECT id,
       DATETIME(timestamp) || '.' || 
       SUBSTR('00' || ((rn - 1) * (100 / counter)), -2) timestamp
FROM cte 

See the demo.
Results:

> id | timestamp             
> -: | :---------------------
>  1 | 2020-09-06 15:08:21.00
>  2 | 2020-09-06 15:08:21.33
>  3 | 2020-09-06 15:08:21.66
>  4 | 2020-09-06 15:08:22.00
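Both queries compute 100 / counter (or 100 / (counter + 1)) before multiplying, and SQLite divides two integers with integer division, so the step is truncated to whole hundredths. A quick check, assuming a hypothetical cluster of 25 identical timestamps (the question mentions most real clusters have 25 records) in the same tablename table; the fractions helper is illustrative only.

```python
import sqlite3

def fractions(conn, expr):
    """Evaluate a fraction expression built from rn and counter for every row, ordered by id.

    expr is pasted into the SQL text, so only use trusted expressions.
    """
    query = f"""
        WITH cte AS (
          SELECT *,
            ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY id) rn,
            COUNT(*) OVER (PARTITION BY timestamp) counter
          FROM tablename
        )
        SELECT {expr} FROM cte ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, timestamp TEXT)")
# Hypothetical cluster: 25 records sharing one timestamp.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(i, "2020-09-06 15:08:21.128344") for i in range(1, 26)],
)

divide_first = fractions(conn, "(rn - 1) * (100 / (counter + 1))")  # first query: step 100 / 26 = 3
multiply_first = fractions(conn, "(rn - 1) * 100 / (counter + 1)")  # reordered: last row 2400 / 26 = 92
no_plus_one = fractions(conn, "(rn - 1) * (100 / counter)")         # second query: step 100 / 25 = 4
print(divide_first[-1], multiply_first[-1], no_plus_one[-1])  # 72 92 96
```

With counter + 1, the last of 25 records lands at .72 instead of about .92, leaving a gap before the next second; multiplying before dividing avoids most of that loss. Without the +1, a 25-record cluster happens to divide evenly.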
forpas

Here is one option:

select t.*,
    substr(timestamp, 1, 20) || (
        1000 * (row_number() over(partition by timestamp order by id) - 1) 
        / (count(*) over(partition by timestamp))
    ) new_timestamp
from mytable t

The idea is to distribute the records that share a timestamp evenly within that second. If there is only one record, its fractional seconds become 0.

This gives you up to three decimal places (milliseconds). You can change the multiplier from 1000 to something else if you want a different precision.

I am speculating that you want to extend the logic to records that belong to the same second, rather than only to perfect timestamp duplicates. That's just a slight change in the partition by clauses:

select t.*,
    substr(timestamp, 1, 20) || (
        1000 * (row_number() over(partition by datetime(timestamp) order by id) - 1) 
        / (count(*) over(partition by datetime(timestamp)))
    ) new_timestamp
from mytable t

Demo on DB Fiddle:

id | timestamp                  | new_timestamp          
-: | :------------------------- | :----------------------
 1 | 2020-09-06 15:08:21.128344 | 2020-09-06 15:08:21.0  
 2 | 2020-09-06 15:08:21.128344 | 2020-09-06 15:08:21.333
 3 | 2020-09-06 15:08:21.128344 | 2020-09-06 15:08:21.666
 4 | 2020-09-06 15:08:22.128344 | 2020-09-06 15:08:22.0  
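To see why partitioning by datetime(timestamp) can matter, here is a sketch with one hypothetical extra row (id 3) that falls in the same second as ids 1 and 2 but has a different stored fraction. With PARTITION BY timestamp, id 3 forms its own group and ends up with the same new timestamp as id 1; partitioning by the whole second spreads all three rows. The new_timestamps helper is illustrative only.

```python
import sqlite3

def new_timestamps(conn, partition_expr):
    """Run the answer's expression with the given PARTITION BY expression (trusted input only)."""
    query = f"""
        SELECT id,
               substr(timestamp, 1, 20) || (
                   1000 * (row_number() OVER (PARTITION BY {partition_expr} ORDER BY id) - 1)
                   / (count(*) OVER (PARTITION BY {partition_expr}))
               ) AS new_timestamp
        FROM mytable
        ORDER BY id
    """
    return [row[1] for row in conn.execute(query)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, timestamp TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    (1, "2020-09-06 15:08:21.128344"),
    (2, "2020-09-06 15:08:21.128344"),
    (3, "2020-09-06 15:08:21.500000"),  # hypothetical: same second, different fraction
    (4, "2020-09-06 15:08:22.128344"),
])

exact = new_timestamps(conn, "timestamp")                 # ids 1 and 3 both become ...21.0
per_second = new_timestamps(conn, "datetime(timestamp)")  # ...21.0, ...21.333, ...21.666, ...22.0
print(exact)
print(per_second)
```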
GMB
  • This almost works, but is missing zero padding in the decimal part. I can't figure out how to paste a large code block in the comment box. – Isaac Sep 24 '20 at 22:39
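One way to add the zero padding the comment asks for is to wrap the fraction in SQLite's printf() (also available as format() since SQLite 3.38). A minimal sketch, assuming a hypothetical cluster of 25 records in one second followed by one record in the next; without padding, the second record would come out as 21.40 instead of 21.040.

```python
import sqlite3

PADDED_QUERY = """
SELECT t.id,
       substr(timestamp, 1, 20) || printf('%03d',
           1000 * (row_number() OVER (PARTITION BY datetime(timestamp) ORDER BY id) - 1)
           / (count(*) OVER (PARTITION BY datetime(timestamp)))
       ) AS new_timestamp
FROM mytable t
ORDER BY t.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, timestamp TEXT)")
# Hypothetical data: 25 records in one second, then one record in the next second.
rows = [(i, "2020-09-06 15:08:21.128344") for i in range(1, 26)]
rows.append((26, "2020-09-06 15:08:22.128344"))
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

result = dict(conn.execute(PADDED_QUERY).fetchall())  # id -> new_timestamp
print(result[1], result[2], result[25], result[26])
```

'%03d' always emits three digits, so every new_timestamp has exactly millisecond precision; use '%02d' with a multiplier of 100 for hundredths.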