For a recordset containing two pertinent columns, [DateService] (date) and [ActivityType] (nvarchar), I am trying to find a more efficient, ideally pure-SQL Server-based solution to finding and deleting records that are "duplicate" records of the same ActivityType within a certain date range. For example, the code should find the first ActivityType = 'X' then delete all records of the same ActivityType where DateService is < 90 days from the previous record, then find the next record of the same ActivityType after the first and delete all similar "duplicates" where DateService is < 90 days from that record and so on until the last record of ActivityType = 'X' is reached.
I've written procedural VBA code in the Access frontend for the database like so (I've omitted a lot of the surrounding code since this is not really a VBA question; dateRun is of type Date and pruned is of type Int, feeApp is the DAO.Recordset):
With feeApp
.MoveFirst
.FindFirst "[ActivityType] = 'IME-5'"
If Not .NoMatch Then
dateRun = ![DateService]
.FindNext "[ActivityType] = 'IME-5'"
If Not .NoMatch Then
Do While Not .NoMatch
If ![DateService] <= dateRun + 30 Then
dateRun = ![DateService]
.Delete
pruned = pruned + 1
Else
dateRun = ![DateService]
End If
.MovePrevious
.FindNext "[ActivityType] = 'IME-5'"
Loop
End If
End If
This code works just fine, it's just fairly slow since it crawls through the recordset one at a time, and in my dbase this code block runs half a dozen more times for different [ActivityType]'s.
Could anyone provide a suggestion of how to approach this in SQL Server? Off the top of my head I suppose I could convert part of these code blocks into separate simple DELETE commands such as,
DELETE * FROM tblFeeApp WHERE [ActivityType] = 'X' AND [DateService] >= #" & dateRun "# AND [DateService] <= #" & dateRun + 90 & "#"
but I would still have to run the Find operations in VBA until I get a NoMatch for each [ActivityType] so I wouldn't think it would be more efficient. I am wondering if there isn't a purely SQL solution, or perhaps a single SQL command for each [ActivityType] which I imagine would be still be orders of magnitude more efficient.
If anyone has any suggestions it would be greatly appreciated and thank you in advance!
EDIT
Thanks to @Ben_Osborne I've almost got a working solution I think. What I have so far in the form of a stored procedure is:
ALTER PROCEDURE [dbo].[procPruneFeeApp](@WCB nvarchar(255),@feeItem nvarchar(255), @daysApart Integer = 90)
AS
DELETE f1
from
dbo.tblFeeApp f1
join dbo.tblFeeApp f2 on
f1.ActivityType = f2.ActivityType
and DATEDIFF(d, f1.DateService, f2.DateService) between 0 and ABS(@daysApart - 1)
and f1.Id < f2.Id
where
f1.[WCB] = @WCB
and f2.[WCB] = @WCB
and f1.[ActivityType] = @feeItem
and f2.[ActivityType] = @feeItem
;
I am calling the proc with this VBA function:
Function pruneFeeApp(WCB As String, feeItem As String, Optional daysApart As Integer = 90) As Integer
If Not isNada(WCB) Then
If Not isNada(feeItem) Then
WCB = Replace("'" & WCB & "'", "''", "'")
feeItem = Replace("'" & feeItem & "'", "''", "'")
Dim qdef As DAO.QueryDef
Set qdef = CurrentDb.CreateQueryDef("")
qdef.ReturnsRecords = False
qdef.Connect = "ODBC;" & getSQLstring
qdef.SQL = "EXEC dbo.procPruneFeeApp @WCB = " & WCB & ", @feeItem = " & feeItem & ", @daysApart = " & daysApart
qdef.Execute dbFailOnError
pruneFeeApp = qdef.RecordsAffected
End If
End If
pruneFeeApp = isNadaZ(pruneFeeApp, 0)
End Function
(Probably this is obvious but isNada and isNadaZ are custom functions that test for null, blank, 0, and other custom "invalid" values; getSQLstring fetches a hard coded SQL connection string; I add the Replace functions for the input variables in the function because for some reason my forms will kick out strings with '' around them already.)
The procedure seems to execute just fine but it's not affecting any records so far. I'm hoping maybe it's a problem with the VBA function and not the stored procedure.
EDIT
A sample of actual data from the table (with unnecessary columns omitted):
Id WCB DateService ActivityType
1961 G0793728 6/23/2014 IME-5
1962 G0793728 6/26/2015 IME-5
1963 G0793728 8/6/2015 IME-5
1964 G0793728 6/4/2014 C-240
1965 G0793728 7/1/2014 C-240
1966 G0793728 2/25/2014 RFA-1LC
1967 G0793728 3/28/2014 RFA-1LC
1968 G0793728 3/31/2014 RFA-1LC
EDIT
@Ben_Osborne's answer below works! Just needed some tweaks to the VBA procedure call using ADODB and it works like a charm. Very much appreciated for all the help!