Telcos collect phone data based on impulses, be it cell phone towers or a change of the net for example. Thereby, unwanted duplicates are built and the task is to get them out, if the only point of difference might be a slight time difference or the presence of a cell tower. An example is given here:
This is an example of the cellphone data collected, date, Phone 1, and Phone2 are he same. What is different is sometimes the time and sometimes the presence of 'from_Tower' data.
SELECT '4/04/2020' as [Date], '7:03:46' as Time, '6123456789' as Phone1, '6987654321'as Phone2, '1' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:03:46' as Time, '6123456789' as Phone1, '6987654321'as Phone2, '' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:03:48' as Time, '6123456789' as Phone1, '6987654321'as Phone2, '5' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:03:48' as Time, '6123456789' as Phone1, '6987654321'as Phone2, ''as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:03:48' as Time, '6123456789' as Phone1, '6987654321'as Phone2, '.'as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:34:33' as Time, '6123456789' as Phone1, '6987654321'as Phone2,'' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:34:34' as Time, '6123456789' as Phone1, '6987654321'as Phone2,'' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:37:34' as Time, '6123456789' as Phone1, '6987654321'as Phone2,'' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:37:34' as Time, '6123456789' as Phone1, '6987654321'as Phone2 ,'-' as From_Tower
How can I de-dup the data, that only 4 lines remain like:
SELECT '4/04/2020' as [Date], '7:03:46' as Time, '6123456789' as Phone1, '6987654321'as Phone2, '1' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:03:48' as Time, '6123456789' as Phone1, '6987654321'as Phone2, '5' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:34:34' as Time, '6123456789' as Phone1, '6987654321'as Phone2,'' as From_Tower UNION
SELECT '4/04/2020' as [Date], '7:37:34' as Time, '6123456789' as Phone1, '6987654321'as Phone2 ,'-' as From_Tower
I could use a max(time)... where from_Tower is not null;
How do I use the max function when there are hundreds of Phone1 numbers with duplicates.
Would it be useful to use arrays for the time, and how could that be done?
Is there any other, better way to do this in a clean way?
There are inspirational thoughts in the section below, which made me think, the best way to solve this might be a LAG function for the time, and whenever time is more than 2 seconds, the data is skipped. So I created this code:
[CTE_dedup] as(
SELECT [Date],[Time], [Phone1], [Phone2], [From_Tower],
LAG([To_A])
OVER (PARTITION BY [Time] ORDER BY [Date]) AS Previous_count,
[Time] - LAG([Time])
OVER (PARTITION BY [Time]ORDER BY [Date] ) AS Timedifference
FROM [CTE_set])
select * from [CTE_dedup]
WHERE Timedifference >2
But I get this error:
Operand data type time is invalid for subtract operator.
time is imported as:
time (time(7), null
why can I not subtract the time fields from each other?