2

This is my first question on this site and I'm not a programmer, so please bear with me.

I am trying to create an Excel 2013 macro that will compare values in column A on one workbook ("active workbook") to column A's of other Excel files in a particular directory. Duplicate values (rows) would then be deleted from the active workbook.

I have been working piece by piece trying to figure this out as I am not a programmer. So far I have been able to use conditional formatting to highlight unique values when the two columns are side by side (same worksheet). I used =ISNA(MATCH($A2,$B$2:$B$12,0)).

Then I used a macro to print out the duplicate values to another column (instead of highlighting them.. I am still at this stage comparing two columns within the same worksheet). I did this by using the following macro:

Sub Find_Matches()
Dim CompareRange As Variant, x As Variant, y As Variant
' Set CompareRange equal to the range to which you will
' compare the selection.
Set CompareRange = Range("C1:C12")
' NOTE: If the compare range is located on another workbook
' or worksheet, use the following syntax.
' Set CompareRange = Workbooks("Book2"). _
'   Worksheets("Sheet2").Range("C1:C5")
'
' Loop through each cell in the selection and compare it to
' each cell in CompareRange.
For Each x In Selection
    For Each y In CompareRange
        If x = y Then x.Offset(0, 1) = x
    Next y
Next x
End Sub

Then I tried removing duplicate values from two different worksheets but that didn't work:

Sub ProcessFiles()

Dim Filename, Pathname As String
Dim wb1 As Workbook
Dim wb2 As Workbook

Dim Sheet As Worksheet
Dim PasteStart As Range
Dim Counter As Integer

Set wb1 = ActiveWorkbook
Set PasteStart = [RRimport!A1]

Pathname = ActiveWorkbook.Path & "\For Macro to run\"
Filename = Dir(Pathname & "*.xls")
Do While Filename <> ""
    Set wb2 = Workbooks.Open(Pathname & Filename)
    For Each Sheet In wb2.Sheets
            With Sheet.UsedRange
            .Copy PasteStart
            Set PasteStart = PasteStart.Offset(.Rows.Count)
        End With
    Next Sheet
    wb2.Close
    Filename = Dir()
Loop
End Sub

I've been reading this site for couple of days and also searched through YouTube. I haven't had much success after the first two basic things I did.

Background of the project: Everyday we have a list called "Pending List" which essentially is all of the projects we need to get done. Everyday this list grows. Each project is given a unique identifier (numeric value) that is listed in column A of the active workbook. Every day I create my own files of the projects that are being completed. Instead of manually checking off each project one by one by comparing several files every day, I would like Excel to be able to delete the duplicates (meaning the projects that are both on my pending list and in the other files, and only leave the unique projects that still need to be done. Hope I didn't confuse anyone but if I did, please let me know.

eirikdaude
  • 3,106
  • 6
  • 25
  • 50
Melissa
  • 21
  • 2
  • Welcome to SO, Melissa. Can you tell us what do you mean by "that didn't work"? The code isn't working at all or it gives wrong results? Did you try debugging your code? You can find instructions on the Internet how to debug VBA code. – Egan Wolf Aug 30 '17 at 05:50
  • Hi Egan. Sorry I wasn't clearer. I will try this code again and then update the post. I tried to many codes at that point so I want to make sure I reply with the correct answer. Also I am aware of the debug option, but I'm still learning this language so it's been difficult to understand what is going on. Thanks again! – Melissa Aug 30 '17 at 12:27

1 Answers1

0

The question here:

I am trying to create an Excel 2013 macro that will compare values in column A on one workbook ("active workbook") to column A's of other Excel files in a particular directory. Duplicate values (rows) would then be deleted from the active workbook.

So, let's break this down:

  1. There is a directory from which workbooks need to be opened.
  2. When one of the workbooks is open, you want to check column A (I assume this is on the first worksheet for the sake of the example below) for values you have in column A in your active workbook (the one that will run the macro).
  3. If there's a match, delete the row from the active workbook in which the value was stored.
  4. When done, continue with the next workbook in the directory.

Point 1 and 4: Open some files from a specific directory:

We will need some function to open and close the files. This question has been asked many times on SO, for example here

Also, we're going to need the workbooks to be stored in some variable that we will pass to the comparison in the next step.

Public Sub LoopOverFiles()
'Our variables:
Dim wb1 As Workbook 'To hold the active workbook / the macro workbook
Dim wb2 As Workbook 'To hold the workbook we'll be comparing to later on
Dim scanFolder As String 'To set the folder in which the files will be located
Dim fileNameToOpen As String 'To get the filenames that we will open

Set wb1 = ThisWorkbook
scanFolder = "C:\temp\"
fileNameToOpen = Dir(scanFolder & "*.xlsx")

'And loop over the files:
Do While Len(fileNameToOpen) > 0 'To exit the loop when there's no more xlsx files
    Set wb2 = Workbooks.Open(scanFolder & fileNameToOpen)

    'To do the actual comparison of the 2 workbooks, we call our compare routine.
    DoTheComparison wb1, wb2 'Note we'll be passing the two workbooks as parameters to the compare function

    wb2.Close SaveChanges:=False 'We don't want to leave it open after we're done with it.
    fileNameToOpen = Dir 'To continue with the next file.
Loop
End Sub

Point 2 and 3: Do the comparison and delete some rows

As you can see, the actual comparison will be done by a routine that's called DoTheComparison and that takes 2 workbooks as parameters. Based on the first routine, we know that the workbooks that will be passed are the correct ones (wb1 being the active one, wb2 being the variable one that gets opened during the loop). In this example we'll stick to the first worksheet in wb2.

Public Sub DoTheComparison(wb1 as Workbook, wb2 as Workbook)
'Dim compareFrom as Range - Not needed.
Dim compareTo as Range
Dim compareFromCell as Range
Dim compareToCell as Range

Dim i as Integer 
'EDIT: Since we delete, we need a backwards loop. This can't be done with "for each" so we'll use "for" with step -1. 
'That is why we also don't need the "CompareFrom" range variable anymore.

Set compareTo = wb2.Worksheets(1).Range("A2:A20")

For i = 20 to 2 step -1 
    Set compareFromCell = wb1.Worksheets("RemoveValsFromHere").Range("A" & i) 'We get the cells based on the index. 
    For Each compareToCell in compareTo

        If compareFromCell.Value = compareToCell.Value Then 'Point 3:
            compareFromCell.EntireRow.Delete shift:=xlUp
            Exit For 
            'Note that we need to exit the inner loop: 
            'After a match was found, the "compareFromCell" is deleted after all.
            'Therefore we have to continue with the next compareFromCell, otherwise we'll get an error.
        End If

    Next compareToCell
Next i
End Sub

Note that especially DoTheComparison is written for maximum clarity, not for optimal speed (far from it!). I see in your question you've been looking into comparing variants / arrays, which is indeed a lot faster.

EDIT: I altered the code above since you're facing the "skipping cells" issue due to cell deletion. In short: The index changes, so when moving to the next cell after deletion the index is wrong. The fix is an easy backwards for loop. Also see this question and answer

Rik Sportel
  • 2,661
  • 1
  • 14
  • 24
  • Omg thank you so much. I will run this today and let you know what I get. Again thank you!!!! – Melissa Aug 30 '17 at 14:39
  • I tried the macro and it is leaving two values in the active worklist that it should delete (A2 and A4). Do you know what this means? I updated the the Macro here: Set compareFrom = wb1.Worksheets("Sheet1").Range("A1:A20") – Melissa Aug 30 '17 at 23:51
  • I just realized that when I run the macro two more times, the A2 and A4 values get removed and only leaves the unique values in the active workbook. Seems like I'm getting close but I can't figure out why it's not looking through the whole array. I tried Range(A:A) but Excel froze. – Melissa Aug 31 '17 at 00:56
  • @Melissa you're right my bad. The cell indices change when deleting a row. Common issue, see updated code. Should fix it. – Rik Sportel Aug 31 '17 at 12:57
  • WOW this is awesome! It works!!! I will be stress testing this code over the next week to make sure it isn't missing any projects. THANK YOU SO MUCH!!! – Melissa Sep 01 '17 at 01:09
  • If this was indeed what you were after, consider marking your question answered. – Rik Sportel Sep 01 '17 at 07:32