-2

I'm trying to make comparison between two columns. column A has about 85,000 list and column B has about 270,000. I tried using vlookup and other functions like countif. however, anytime I try any of these functions my excel workbook could take about 45 minutes to an hour to even complete a basic task. please, any suggestion why my excel workbook keeps freezing and slowing down? thanks in advance.

Community
  • 1
  • 1
Bright
  • 15
  • 5
  • Excel isn't faster than that. You need another solution for that amount of data. – idstam Dec 17 '14 at 14:37
  • 1
    Can you provide an example? – RafaSashi Dec 17 '14 at 14:37
  • 1
    Excel is *much* faster than that. Using Power Query/Power Pivot it's able to use column compression and an OLAP analytic engine that allows it to process several millions of rows. The only (somewhat artificial) limit is that the (compressed) file can't get over 2GB in size, to allow uploading to SharePoint and Office Online document libraries. Without specific formula examples though, it's hard to say what causes the delay – Panagiotis Kanavos Dec 17 '14 at 14:38
  • @pnuts..I am trying to match those two columns. this is one of the formulas I used; IF(COUNTIF(B:B,A1),A1,0)..the formulae is working perfectly just that my workbook keeps slowing down – Bright Dec 17 '14 at 14:45
  • @PanagiotisKanavos.this is one of the formulas I used; IF(COUNTIF(B:B,A1),A1,0) and VlOOKUP(A1,B:B,0,1). any of these formulas work perfectly just that the workbook keeps slowing down. I was thinking may be the RAM could be a factor or – Bright Dec 17 '14 at 14:48
  • @Bright What exactly are you trying to accomplish? It is obvious your current solution is too inefficient for your specific needs. What exactly are you doing to the two columns and you are outputting your result to a third column? – CodeCamper Dec 17 '14 at 15:02
  • @CodeCamper..the goal of my comparison is to find the names of vendors in Column A that can also be found in Column B. but my main problem is why my workbook keeps slowing down. for instance, it could take about 40 minutes to an hour when I'm trying to even save it – Bright Dec 17 '14 at 15:07
  • A macro that reads data into arrays and does the functionality and then spits out the results in one go would be massively faster. I am pretty certain your workbook is taking a long time because of the volatile formulas and the large amount of data you are applying them to – Mark Moore Dec 17 '14 at 15:18
  • Just to add to my above comment, I do this type of Duplicate checking with quite a few worksheets along with many other validations, they are very fast. Need more info for a specific solution, but rough outline. Read Column A data into Array Read Column B Data into an Array Sort Arrays Loop arrays looking for Dupes Depending on what you want to, you can store location of dupes and report them, colour their cells etc – Mark Moore Dec 17 '14 at 15:39
  • @MarkMoore. thanks for your help. I'm trying to write a code as u said. but the VB editor keeps telling me "out of memory" when I press enter after the sub statement. – Bright Dec 17 '14 at 15:44
  • I think your out of memory error is related to your performance issues due to the number of validations. If you remove those it should go away. Based on your criteria listed here, I have added a detailed solution below. It is an example that's quite simplistic, but hopefully it will give you some ideas. – Mark Moore Dec 17 '14 at 22:19

1 Answers1

0

The code below is one method of highlighting duplicate data that can be expanded on to make it more complex if required. I have also included the QuickSortArray procedure I use to sort my arrays for this kind of work, the sort proc is not my work, and the credits are in the comments. Code is pretty well documented, so hope it helps

'---------------------------------------------------------------------------------------
' Procedure : DuplicateCheck
' Author    : Mark Moore
' Date      : 17/12/2014
' Purpose   : This procedure is an simple example of duplicate checking multiple columns of data
'             What is checked can simply be 2 columns of data, or can also be as complex as the developer
'             wants to code it i.e.  You could use this same method to duplicate compound values, simply by
'             populating the array with concatenated data.  Likewise the data that is reported can easily be
'             made richer by adding data you may wish to report on.  eg,  Sheet name could be added for each value
'             if you wanted to compare data across sheets.
'---------------------------------------------------------------------------------------
'
Sub DuplicateCheck()
Dim ColAArray, ColBArray, CombinedArray, DupesArray As Variant
Dim ArrayRange As String
Dim FirstRowInArray, LastRowInArray, loopcounter, DataRowNumber, DupesCounter As Long

   'define the range of list A by going from A1 to last populated cell in col A
   ArrayRange = "A1" & ":A" & Sheets("Sheet1").Range("A" & Rows.Count).End(xlUp).Row
   'Assign values to array
   ColAArray = Worksheets("Sheet1").Range(ArrayRange).Value ' read all the values at once from the Excel grid, put into an array

   'define the range of list B by going from B1 to last populated cell in col B
   ArrayRange = "B1" & ":B" & Sheets("Sheet1").Range("B" & Rows.Count).End(xlUp).Row
   'Assign values to array
   ColBArray = Worksheets("Sheet1").Range(ArrayRange).Value ' read all the values at once from the Excel grid, put into an array

   'We are going  to combine the 2 arrays and add cell addresses, so redefine the combined array to size of both arrays together
   ReDim CombinedArray(LBound(ColAArray, 1) To UBound(ColAArray) + UBound(ColBArray, 1), 1 To 2)
   'In theory every value could be duplicated, so size the duplicate array to combined size also
   ReDim DupesArray(LBound(CombinedArray, 1) To UBound(CombinedArray, 1), 1 To 2)

   'This is the row number of the data, in our case its 1 because our data range started at row 1 - change if you change the data range
   DataRowNumber = 1
   'First populate the ColA array into the combined array
   For loopcounter = LBound(ColAArray, 1) To UBound(ColAArray, 1)
       CombinedArray(loopcounter, 1) = ColAArray(loopcounter, 1) 'Actual value from Col A
       CombinedArray(loopcounter, 2) = "A" & DataRowNumber 'Cell address of that value
       DataRowNumber = DataRowNumber + 1 ' Increment the row number
   Next loopcounter

   'reset the data row back to 1
   DataRowNumber = 1
   'Now add the colb array to the combined array, starting from the element after the end of the first array
   For loopcounter = UBound(ColAArray, 1) + 1 To UBound(CombinedArray, 1)
       CombinedArray(loopcounter, 1) = ColBArray(DataRowNumber, 1) 'Actual value from Col A
       CombinedArray(loopcounter, 2) = "B" & DataRowNumber 'Cell address of that value
       DataRowNumber = DataRowNumber + 1 ' Increment the row number
   Next loopcounter

   Call QuickSortArray(CombinedArray, , , 1) 'Sort the array by cell values, so any dupes will be next to each other

   'Using variables for these values just for readability
   FirstRowInArray = LBound(CombinedArray, 1)
   LastRowInArray = UBound(CombinedArray, 1)

   'Initialise the duplicate counter
   DupesCounter = 0

   'The array is sorted, so loop through the array looking for duplicate values next to each other
   For loopcounter = LBound(CombinedArray, 1) To UBound(CombinedArray, 1)
       Select Case loopcounter
           Case LastRowInArray 'Deal with the last row - We cant check the next row as would be out of bound
               If CombinedArray(loopcounter, 1) = CombinedArray(loopcounter - 1, 1) Then
                   DupesCounter = DupesCounter + 1
                   'Add an entry to the error array
                   DupesArray(DupesCounter, 1) = CombinedArray(loopcounter, 1) 'Duplicated value
                   DupesArray(DupesCounter, 2) = CombinedArray(loopcounter, 2) 'Cell address
                   Exit For 'last element so exit
               End If
           'Deal with first - we cant check the preceeding rows
           Case FirstRowInArray
               If CombinedArray(loopcounter, 1) = CombinedArray(loopcounter + 1, 1) Then
                   DupesCounter = DupesCounter + 1
                   'Add an entry to the error array
                   DupesArray(DupesCounter, 1) = CombinedArray(loopcounter, 1) 'Duplicated value
                   DupesArray(DupesCounter, 2) = CombinedArray(loopcounter, 2) 'Cell address
               End If
           Case Else 'Deal with all other rows - we need to check if the preceeding or the following row is a duplicate
               If CombinedArray(loopcounter, 1) = CombinedArray(loopcounter - 1, 1) Or _
               CombinedArray(loopcounter, 1) = CombinedArray(loopcounter + 1, 1) Then
                   DupesCounter = DupesCounter + 1
                   'Add an entry to the error array
                   DupesArray(DupesCounter, 1) = CombinedArray(loopcounter, 1) 'Duplicated value
                   DupesArray(DupesCounter, 2) = CombinedArray(loopcounter, 2) 'Cell address
               End If
       End Select
   Next loopcounter

   'Now process the duplicates.  You could do anythin here with this data, report to text file etc
   'I have just chosen to colour the cells
   If DupesCounter > 0 Then
      For loopcounter = 1 To DupesCounter
        ActiveSheet.Range(DupesArray(loopcounter, 2)).Interior.Color = RGB(255, 255, 0)
      Next loopcounter
      MsgBox DupesCounter & " duplicates were found and highlighted when comparing values between column A and B", vbInformation, "Results of comparing Column A values with Column B"
   Else
      MsgBox "No Duplicated values between column A and B", vbInformation, "Results of comparing Column A values with Column B"
   End If
End Sub

'---------------------------------------------------------------------------------------
' Procedure : QuickSortArray
' Purpose   : This procedure is used to sort multi dimensional arrays - It is primarily used to sort arrays
'             from sheets where the original sheet sort order wants to be left "as is", so the data is read into
'             an array and then sorted.  This code was taken from the internet, with comments/credits as shown below
'             source url:  http://stackoverflow.com/questions/152319/vba-array-sort-function
'               Sort a 2-Dimensional array
'               SampleUsage: sort arrData by the contents of column 3 ' ' QuickSortArray arrData, , , 3
'               Posted by Jim Rech 10/20/98 Excel.Programming
'               Modifications, Nigel Heffernan:
'                   Escape failed comparison with empty variant
'                   Defensive coding: check inputs
' Version   : 06/01/2014 : Mark Moore - Initial Version
'---------------------------------------------------------------------------------------
Public Sub QuickSortArray(ByRef SortArray As Variant, _
    Optional lngMin As Long = -1, _
    Optional lngMax As Long = -1, _
    Optional lngColumn As Long = 0)

Dim i As Long
Dim j As Long
Dim varMid As Variant
Dim arrRowTemp As Variant
Dim lngColTemp As Long

On Error Resume Next

    If IsEmpty(SortArray) Then
        Exit Sub
    End If


    If InStr(TypeName(SortArray), "()") < 1 Then 'IsArray() is somewhat broken: Look for brackets in the type name
        Exit Sub
    End If


    If lngMin = -1 Then
        lngMin = LBound(SortArray, 1)
    End If

    If lngMax = -1 Then
        lngMax = UBound(SortArray, 1)
    End If


    If lngMin >= lngMax Then ' no sorting required
        Exit Sub
    End If

    i = lngMin
    j = lngMax

    varMid = Empty
    varMid = SortArray((lngMin + lngMax) \ 2, lngColumn)


    ' We send 'Empty' and invalid data items to the end of the list:
    If IsObject(varMid) Then ' note that we don't check isObject(SortArray(n)) - varMid might pick up a valid default member or property
        i = lngMax
        j = lngMin
    ElseIf IsEmpty(varMid) Then
        i = lngMax
        j = lngMin
    ElseIf IsNull(varMid) Then
        i = lngMax
        j = lngMin
    ElseIf varMid = "" Then
        i = lngMax
        j = lngMin
    ElseIf VarType(varMid) = vbError Then
        i = lngMax
        j = lngMin
    ElseIf VarType(varMid) > 17 Then
        i = lngMax
        j = lngMin
    End If

    While i <= j
        While SortArray(i, lngColumn) < varMid And i < lngMax
            i = i + 1
        Wend

        While varMid < SortArray(j, lngColumn) And j > lngMin
            j = j - 1
        Wend

        If i <= j Then
             ' Swap the rows
             ReDim arrRowTemp(LBound(SortArray, 2) To UBound(SortArray, 2))
             For lngColTemp = LBound(SortArray, 2) To UBound(SortArray, 2)
                 arrRowTemp(lngColTemp) = SortArray(i, lngColTemp)
                 SortArray(i, lngColTemp) = SortArray(j, lngColTemp)
                 SortArray(j, lngColTemp) = arrRowTemp(lngColTemp)
             Next lngColTemp
             Erase arrRowTemp
             i = i + 1
             j = j - 1
        End If
    Wend

    If (lngMin < j) Then Call QuickSortArray(SortArray, lngMin, j, lngColumn)
    If (i < lngMax) Then Call QuickSortArray(SortArray, i, lngMax, lngColumn)
End Sub
Mark Moore
  • 520
  • 4
  • 13