1

I am working with two large datasets (900K lines each; 8 columns). Power Query isn't handling the merge (system limitation) so I was thinking about incorporating VBA to merge and filter a large portion of data so it's manageable.

Can you please assist?

Criteria:

  • Column P in both - will be used to merge (common between both sets)
  • If Column E = C, true then remove all TRUES from both sets
  • If ABS(C2-D2 X E2)<0.03, true then remove all TRUES from both sets

Show only the remainder data.

I just need a framework and I can make edits accordingly. Many thanks!

Table 1:

P B C D E
ABC 14 51 21 51
ZYX 00 23 78 89
XYZ 55 56 91 55

Table 2:

P F G H
ZYX 11 21 31
XYZ 12 23 32
ABC 13 24 33
Ari Monger
  • 71
  • 5

1 Answers1

0

I make a demo program and test with two table of each 900000 rows. I use a variant array approach. The program takes 18s to run. Quicksort is a must to make the program fast. The Quicksort and FindinArray function both use some kind of log(n) algorithm to be fast.

Dim vArr1 As Variant, vArr2 As Variant
Dim vArr1a As Variant, vArr2a As Variant
Dim last1 As Long, last2 As Long
Dim i As Long, j As Long
Dim t As Double

t = Timer
last1 = 900001 'Or some method to find the last row
last2 = 900001

vArr1 = Range(Worksheets("Sheet1").Cells(2, 1), Worksheets("Sheet1").Cells(last1, 5))
vArr2 = Range(Worksheets("Sheet2").Cells(2, 1), Worksheets("Sheet2").Cells(last2, 5)) 'One more column to store index
vArr1a = Range(Worksheets("Sheet1").Cells(2, 6), Worksheets("Sheet1").Cells(last1, 6))
vArr2a = Range(Worksheets("Sheet2").Cells(2, 6), Worksheets("Sheet2").Cells(last2, 5))

For i = 1 To last1 - 1
    vArr1a(i, 1) = "" 'clear
Next i
For i = 1 To last2 - 1
    vArr2(i, 5) = i 'index
    vArr2a(i, 1) = "" 'clear
Next i
Call Quicksort2(vArr2, 1, 1, last2 - 1)

For i = 1 To last1 - 1
    j = FindinArray2(vArr2, vArr1(i, 1), 1, 0)
        If Abs(vArr1(i, 2) - vArr1(i, 3)) = 0 Then  'Criterion to compare columns in table1
            vArr1a(i, 1) = "abc"
        End If
        
        If j > 0 Then
            If Abs(vArr1(i, 2) - vArr2(j, 2)) = 0 Then 'Criterion to compare columns in table1 with columns in table2
                vArr1a(i, 1) = "false"
                vArr2a(vArr2(j, 5), 1) = "false" 'write back based on index
            End If
        End If
Next i
Range(Worksheets("Sheet1").Cells(2, 6), Worksheets("Sheet1").Cells(last2, 6)) = vArr1a
Range(Worksheets("Sheet2").Cells(2, 5), Worksheets("Sheet2").Cells(last2, 5)) = vArr2a
MsgBox Round(Timer - t, 2)
Abel Wong
  • 191
  • 4
  • this is not working. "Call Quicksort2" is giving a Sub or Function not defined error. – Ari Monger Sep 01 '22 at 09:20
  • You need to get the quicksort function somewhere. e.g. [here](https://stackoverflow.com/questions/152319/vba-array-sort-function), and also the search function. – Abel Wong Oct 10 '22 at 06:21