1

For a recordset containing two pertinent columns, [DateService] (date) and [ActivityType] (nvarchar), I am trying to find a more efficient, ideally pure-SQL Server-based solution to finding and deleting records that are "duplicate" records of the same ActivityType within a certain date range. For example, the code should find the first ActivityType = 'X' then delete all records of the same ActivityType where DateService is < 90 days from the previous record, then find the next record of the same ActivityType after the first and delete all similar "duplicates" where DateService is < 90 days from that record and so on until the last record of ActivityType = 'X' is reached.

I've written procedural VBA code in the Access frontend for the database like so (I've omitted a lot of the surrounding code since this is not really a VBA question; dateRun is of type Date and pruned is of type Int, feeApp is the DAO.Recordset):

With feeApp
.MoveFirst
.FindFirst "[ActivityType] = 'IME-5'"
If Not .NoMatch Then
    dateRun = ![DateService]
    .FindNext "[ActivityType] = 'IME-5'"
    If Not .NoMatch Then
        Do While Not .NoMatch
            If ![DateService] <= dateRun + 30 Then
                dateRun = ![DateService]
                .Delete
                pruned = pruned + 1
            Else
                dateRun = ![DateService]
            End If
            .MovePrevious
            .FindNext "[ActivityType] = 'IME-5'"
        Loop
    End If
End If

This code works just fine, it's just fairly slow since it crawls through the recordset one at a time, and in my dbase this code block runs half a dozen more times for different [ActivityType]'s.

Could anyone provide a suggestion of how to approach this in SQL Server? Off the top of my head I suppose I could convert part of these code blocks into separate simple DELETE commands such as,

DELETE * FROM tblFeeApp WHERE [ActivityType] = 'X' AND [DateService] >= #" & dateRun "# AND [DateService] <= #" & dateRun + 90 & "#"

but I would still have to run the Find operations in VBA until I get a NoMatch for each [ActivityType] so I wouldn't think it would be more efficient. I am wondering if there isn't a purely SQL solution, or perhaps a single SQL command for each [ActivityType] which I imagine would be still be orders of magnitude more efficient.

If anyone has any suggestions it would be greatly appreciated and thank you in advance!

EDIT

Thanks to @Ben_Osborne I've almost got a working solution I think. What I have so far in the form of a stored procedure is:

ALTER PROCEDURE [dbo].[procPruneFeeApp](@WCB nvarchar(255),@feeItem nvarchar(255), @daysApart Integer = 90)
AS

DELETE f1
from
    dbo.tblFeeApp f1
    join dbo.tblFeeApp f2 on
        f1.ActivityType = f2.ActivityType
        and DATEDIFF(d, f1.DateService, f2.DateService) between 0 and ABS(@daysApart - 1)
        and f1.Id < f2.Id
where
    f1.[WCB] = @WCB
    and f2.[WCB] =  @WCB
    and f1.[ActivityType] = @feeItem
    and f2.[ActivityType] = @feeItem
;

I am calling the proc with this VBA function:

Function pruneFeeApp(WCB As String, feeItem As String, Optional daysApart As Integer = 90) As Integer

If Not isNada(WCB) Then
    If Not isNada(feeItem) Then
        WCB = Replace("'" & WCB & "'", "''", "'")
        feeItem = Replace("'" & feeItem & "'", "''", "'")

        Dim qdef As DAO.QueryDef
        Set qdef = CurrentDb.CreateQueryDef("")
        qdef.ReturnsRecords = False
        qdef.Connect = "ODBC;" & getSQLstring
        qdef.SQL = "EXEC dbo.procPruneFeeApp @WCB = " & WCB & ", @feeItem = " & feeItem & ", @daysApart = " & daysApart
        qdef.Execute dbFailOnError

        pruneFeeApp = qdef.RecordsAffected
    End If
End If

pruneFeeApp = isNadaZ(pruneFeeApp, 0)

End Function

(Probably this is obvious but isNada and isNadaZ are custom functions that test for null, blank, 0, and other custom "invalid" values; getSQLstring fetches a hard coded SQL connection string; I add the Replace functions for the input variables in the function because for some reason my forms will kick out strings with '' around them already.)

The procedure seems to execute just fine but it's not affecting any records so far. I'm hoping maybe it's a problem with the VBA function and not the stored procedure.

EDIT

A sample of actual data from the table (with unnecessary columns omitted):

Id WCB DateService ActivityType 1961 G0793728 6/23/2014 IME-5 1962 G0793728 6/26/2015 IME-5 1963 G0793728 8/6/2015 IME-5 1964 G0793728 6/4/2014 C-240 1965 G0793728 7/1/2014 C-240 1966 G0793728 2/25/2014 RFA-1LC 1967 G0793728 3/28/2014 RFA-1LC 1968 G0793728 3/31/2014 RFA-1LC

EDIT

@Ben_Osborne's answer below works! Just needed some tweaks to the VBA procedure call using ADODB and it works like a charm. Very much appreciated for all the help!

  • Instead of deleting, why not just return a new recordset where the ActivityType loops through and finds where the next date is >90, and avoid going through the deletions. – geeFlo Nov 16 '16 at 00:41
  • Thanks for the reply @geeFlo, I actually do want the records deleted in this particular case, otherwise what you suggest might be a viable approach. – J. M. French Nov 16 '16 at 22:10

1 Answers1

3

If I'm understanding the requirements correctly, a record should be deleted when there's another record with the same activity type within 90 days.

I could be oversimplying things. But if not, I would think this would work:

    delete f1
    from
        dbo.tblFeeApp f1
        join dbo.tblFeeApp f2 on
            f1.ActivityType = f2.ActivityType
            and DATEDIFF(d, f1.DateofService, f2.DateofService) between 0 and 89 -- using this rather than "< 90" so that negative results are not included
            and f1.Id < f2.Id -- assuming you have an identity field.  Prevents a record from qualifying itself for deletion.  Also prevents multiple records with the same date of service from qualifying each other for deletion
    where
        f1.[WCB] = 'qualifying value' -- the record being deleted
        and f2.[WCB] = 'qualifying value' -- the latter recrd (not being deleted)
    ;

Here's a SQL Script that mocks up some data and then deletes it using this logic. In my tests it always finds records to delete. Your real data will be different, of course, but it's a demonstration / test of the deletion logic:

    set nocount on;

    --mock up some wcb records
    declare @wbc table ([WCB] varchar(8));

    declare @w integer = 0;
    while @w < 10 begin
        insert into @wbc (WCB) values (left(newid(), 8));
        set @w = @w + 1;
    end;

    --select a random WCB record that will be the value that qualifies records for deletion
    declare @qualifyingWcb varchar(8)= (select top 1 wcb from @wbc order by newid());
    select [Qualifying WCB] = @qualifyingWcb;

    -- mock up some activity
    declare @tblFeeApp table (Id integer identity, [ActivityType] varchar(20), [DateofService] date, [WCB] varchar(100));

    declare @x integer = 0;
    while @x <= 10000 begin

        insert into @tblFeeApp (ActivityType, DateofService, WCB) 
        values
        (
            'IME-' + convert(varchar, convert(integer, rand() * 1000)), 
            dateadd(d, -1 * convert(integer, rand() * 200), getdate()),
            (select top 1 wcb from @wbc order by newid())
        );

        set @x = @x + 1;

    end;

    set nocount off;



    --delete f1
    select
        [What] = 'Record That Meet Deletion Criteria -->',
        f1.*,
        [Spacer] = '    ',
        [What] = 'The Record That Qualififes F1 for Deletion -->',
        f2.*
    from
        @tblFeeApp f1
        join @tblFeeApp f2 on
            f1.ActivityType = f2.ActivityType
            and DATEDIFF(d, f1.DateofService, f2.DateofService) between 0 and 89 -- using this rather than "< 90" so that negative results are not included
            and f1.Id < f2.Id -- assuming you have an identity field.  Prevents a record from qualifying itself for deletion.  Also prevents multiple records with the same date of service from qualifying each other for deletion
    where
        f1.[WCB] = @qualifyingWcb -- the record being deleted
        and f2.[WCB] = @qualifyingWcb -- the latter recrd (not being deleted)
    ;
Ben Osborne
  • 1,412
  • 13
  • 20
  • Thanks very much @Ben_Osborne, I will try this out ASAP and see if it works, very much appreciated! Will post back the results. And yes, that's exactly right, don't know why I couldn't have put it more simply like that :). – J. M. French Nov 16 '16 at 22:11
  • How would I limit this to a subset of records? I.e., there's another column, "WCB" (nvarchar) and I would like this to run only with respect to records that have a specified "WCB" value. Would I add a WHERE clause somewhere below dbo.tblFeeApp? Thanks! – J. M. French Nov 16 '16 at 22:41
  • I'm glad you found this helpful. I updated my response to include the WCB thing. – Ben Osborne Nov 16 '16 at 23:45
  • Thanks again @Ben_Osborne for the help; I updated my answer. I'm still not getting any actual deletions but I feel I'm really close, maybe it's a problem with my VBA call. – J. M. French Nov 17 '16 at 16:32
  • Hmmm...I will update my response with a SQL script that you can experiment with in SSMS. – Ben Osborne Nov 17 '16 at 16:55
  • Thanks very much again for the help, somehow I'm still not quite getting it to work. I tried adding your SELECT statement to the procedure (i.e., `SELECT [ActivityType] = @feeItem, f1.*, [ActivityType] = @feeItem, f2.*` after the DELETE f1 and the proc compiles fine but gives an ODBC error when called. Probably some stupid little mistake on my part. – J. M. French Nov 18 '16 at 16:00
  • I posted a sample of actual data from the table in case it's helpful. Thanks again very much for all the time and help. – J. M. French Nov 18 '16 at 16:04
  • You're welcome. This answer might help you execute the proc from VBA: http://stackoverflow.com/questions/24267080/calling-stored-procedure-vba – Ben Osborne Nov 18 '16 at 16:57