1

I have a unique situation not covered by an other article I can find here. I have a workbook of tens of thousands of lines, but they're all essentially like this:

enter image description here

There's a whole lot of "stuff" going on in the workbook and data is constantly added, but the crux of my issue is that I need a piece of code to be able to keep a certain number of the most recent instances of the data (let's say 2) and remove the rest. I don't deal with dates in VBA very often so I wish I could "show my work" thus far, but I truly don't know where to start.

In plain English: Count the number of unique dates in column D. If that number is > 2, THEN delete rows where the date is older than the 2 most recent dates.

Again, I apologize for not having any work to show thus far. I truly have "writers' block" on this one. Any help is appreciated!

UPDATE: With the help in the comments I've written the following to do the first step of finding the 2nd most recent date on my real data sheet (35000+ rows) where the date column is P. I must be doing something wrong because as I track the value of OldMax in the locals window it only returns the most recent date no matter what I put in for the number in Large(DateRange,whatever number). Hmmmmm....

     Sub Remove_Old_Data()

    Dim wks As Worksheet
    Dim OldMax As String
    Dim DateRange As Range
    Dim lrow As Long

    Set wks = ThisWorkbook.Worksheets("X-AotA")
    lrow = wks.Cells(Rows.Count, "P").End(xlUp).Row
    Set DateRange = wks.Range("P2:P" & lrow)

    OldMax = Application.WorksheetFunction.Large(DateRange, 2)

    End Sub
rushjc
  • 25
  • 1
  • 7
  • Can you not just sort the data in ascending order and then simple delete what is not required? – Siddharth Rout Oct 30 '14 at 22:57
  • Definitely, but this particular procedure is in the middle of a big macro that's importing, analyzing, and shuffling things around. So I'd have to end the macro for human intervention and manual deletion -- I think that's what you were proposing. – rushjc Oct 30 '14 at 23:01
  • In that case record a macro to sort the data on Col D and find the bottom two dates and then delete all rows above that :) – Siddharth Rout Oct 30 '14 at 23:03
  • @SiddharthRout, I know, but at first I thought OP's data was daily, so a simple `-1` would do the trick. But looking at his sample data I guess that's not the case. My bad. Instead, he should just use `Large` to get the second highest date. Unless I'm misunderstanding the question. – BobbitWormJoe Oct 30 '14 at 23:14
  • @SiddharthRout - I see where your going and agree. When you're dealing with varying dates though, how do you write the condition? Would it be along the lines of associating each unique date to a "count" then deleting those that are greater than 2? Sounds plausible but I feel like I'd be making it harder than necessary if I did that. I'm sooo bad with VBA dates – rushjc Oct 30 '14 at 23:18
  • Dates are just numbers. That's how excel stores them in the back ground :) Yes, you got the logic right. Once you sort the data in ascending order, go from bottom to up and the 3rd date that you see is the row from where you have to delete till the very top (till row 2) – Siddharth Rout Oct 30 '14 at 23:22
  • Give it a try and if you are stuck, post the code that you tried and we will take it from there :) – Siddharth Rout Oct 30 '14 at 23:22
  • @BobbitWormJoe - Using 'large', would I do something like 'variablename = .Large(D:D,2)' to find the 2nd largest/most recent date in the column then say if the value in D2 (or whatever) is < variablename then delete the row? Am I on the right track for what you're proposing? – rushjc Oct 30 '14 at 23:26
  • @rushjc Yes that's exactly what I had in mind! Actually testing it out myself right now to make sure I'm not speaking nonsense. – BobbitWormJoe Oct 30 '14 at 23:28
  • Yes you can use that as well and then filter out rows which are not equal to those two dates and then simply delete those rows :) – Siddharth Rout Oct 30 '14 at 23:30
  • Excellent. Going to see what I can do combining both of your great ideas then hopefully be able to legitimately "show my work" here :) – rushjc Oct 30 '14 at 23:32
  • [Here](http://stackoverflow.com/questions/11631363/how-to-copy-a-line-in-excel-using-a-specific-word-and-pasting-to-another-excel-s) is an example on how to use Autofilter :) – Siddharth Rout Oct 30 '14 at 23:32
  • @BobbitWormJoe I added some code to the question area. Any idea why I'm only getting the largest/latest date from my `Large` statement? – rushjc Oct 30 '14 at 23:56
  • Ah, discovered that `Large` won't work in this case since there are going to be duplicate dates -- a lot of them. – rushjc Oct 31 '14 at 00:22

2 Answers2

0

I tested the code below and it works. Should be fairly easy to understand, but I basically just loop through all the rows to determine what the 2 most recent dates are and then loop through all the rows again, deleting any rows that do not contain either of those dates.

Sub Remove_Old_Data()

    On Error GoTo 0

    Dim vSheet As Worksheet
    Dim vRange As Range
    Dim vRow As Long
    Dim vRowFirst As Long
    Dim vRowLast As Long
    Dim vCol As Long
    Dim vCurDate As Date
    Dim vTopDate1 As Date
    Dim vTopDate2 As Date

    Set vSheet = ThisWorkbook.Worksheets("X-AotA")
    Set vRange = vSheet.UsedRange

    'Set vCol to column P
    vCol = 17 - vRange.Column

    'Set the rows to scan through
    vRowFirst = 2
    vRowLast = vRange.Rows.Count
    If vRowLast < 4 Then Exit Sub

    'Determine what the biggest 2 dates are
    vTopDate1 = DateValue("1900-01-01")
    vTopDate2 = DateValue("1900-01-01")
    For vRow = vRowFirst To vRowLast
        vCurDate = DateValue("1900-01-01")
        On Error Resume Next
        vCurDate = DateValue(vRange(vRow, vCol).Value)
        On Error GoTo 0

        If vCurDate > vTopDate1 Then
            vTopDate2 = vTopDate1
            vTopDate1 = vCurDate
        ElseIf vCurDate > vTopDate2 And vCurDate <> vTopDate1 Then
            vTopDate2 = vCurDate
        End If
    Next

    'Loop through the rows again and remove any that do not contain the top 2 dates
    vRow = vRowFirst
    Do While vRow <= vRowLast
        vCurDate = DateValue("1900-01-01")
        On Error Resume Next
        vCurDate = DateValue(vRange(vRow, vCol).Value)
        On Error GoTo 0

        If vCurDate <> vTopDate1 And vCurDate <> vTopDate2 Then
            'Remove this row
            vRange.Cells(vRow, 1).EntireRow.Delete
            vRowLast = vRowLast - 1
        Else
            'Continue to the next row
            vRow = vRow + 1
        End If
    Loop

End Sub
neelsg
  • 4,802
  • 5
  • 34
  • 58
0

I ended up using the following because I only used "keep 2 most recent dates" as a simplified example. I'm actually keeping the 12 most recent dates so the other proposed answer would get pretty cumbersome. Here is what I came up with.

Sub Scrub_Old_Data()

Dim iUnique As Long
Dim Wks As Worksheet
Dim LastRow As Long
Dim i As Long
Dim OldDateKeep As Long

OldDateKeep = ThisWorkbook.Worksheets("X-User Input").Range("B11").Value

Set Wks = ThisWorkbook.Worksheets("X-AotA")

'find the last row of data
LastRow = Cells(Cells.Rows.Count, 1).End(xlUp).Row

'make sure the right worksheet is being analyzed
Wks.Select

'check the entire sheet to see if we even have more than 12 unique dates.  If not, do nothing
iUnique = Evaluate("=SUMPRODUCT(1/countif(P2:P" & LastRow & ",P2:P" & LastRow & "))")

If iUnique > OldDateKeep Then

    With Wks
         'sort in descending date order
        .AutoFilter.Sort.SortFields.Clear

        .AutoFilter.Sort.SortFields. _
            Add Key:=Range("P1:P" & LastRow), SortOn:=xlSortOnValues, Order:=xlDescending, _
        DataOption:=xlSortNormal

        With .AutoFilter.Sort
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With

    End With

    i = 2
    Do Until IsEmpty(Cells(i, 16))

        If Evaluate("=SUMPRODUCT(1/countif(P1:P" & i & ",P1:P" & i & "))") - 1 > OldDateKeep Then

            Cells(i, 16).EntireRow.Delete

        Else

            i = i + 1

        End If

    Loop


End If


End Sub
rushjc
  • 25
  • 1
  • 7