0

To start, I have little knowledge of VBA and have not tried to write a code for what I want to do as I don't even know where to start.

I currently have two tables. Table1 contains 48000 rows of data and two columns, a unique identifier and a cash amount for each ID. Table2 contains 50000 rows of data and two columns, a unique identifier and a cash amount for each ID. ID numbers are unique to their own table so there are often repeated IDs in the other Table. The goal of this is to combine the two tables by ID number and show the total cash amount for each ID number.

My first attempt involved using the SUMIF function to get me the totals from both tables. Although this worked for the first ID, when I tried to copy the formula to the other cells, my laptop crashed completely forcing a restart.

My second attempt involved using PivotTable wizard to combine the two ranges. However, I discovered that PivotTables cannot handle this many unique values. (Based on the popup that appeared).

My third attempt worked but I found it long and I am hoping there is a better method. I split my tables into two ranges of about 20,000 rows (so now there are 4 tables). I then used the PivotTable wizard to combine these two at a time. First was Table1 and Table3, then Table2 and Table4. Then I had to split the resulting lists again as PivotTable couldn't handle it and repeated this process. The issue with this method is I feel there is a definite possibility of missed or repeated values because of all the splitting.

During all three of these attempts, my computer repeatedly had problems and required restarts.

I don't care if a VBA solution takes a while to run, as long as it works.

I have tried looking at other examples but some I couldn't figure out how to apply them to my situation and others seemed like they weren't working with big enough files to experience some of the problems I am facing.

Thank you and please let me know if you need clarification on anything.

Kyle
  • 15
  • 1
  • 7

4 Answers4

0

I would suggest connecting to the worksheets via an ADO connection and joining the two tables with an SQL statement.

Add a reference to the Microsoft ActiveX Data Objects library (Tools -> References...) — use the latest version which is usually 6.1.

Insert a module into the VBA project and paste the following code:

Sub JoinTables()

Dim connectionString As String
connectionString = _
    "Provider=Microsoft.ACE.OLEDB.12.0;" & _
    "Data Source=""" & ActiveWorkbook.FullName & """;" & _
    "Extended Properties=""Excel 12.0;HDR=Yes"""

'The SQL statement that shapes the resulting data
Dim sql As String
sql = _
    "SELECT t1.ID, t1.Value + IIF(t2.Value IS NULL, 0, t2.Value) AS FinalSum " & _
    "FROM [Sheet1$] AS t1 " & _
    "LEFT JOIN [Sheet2$] AS t2 ON t1.ID = t2.ID " & _
    "UNION SELECT t2.ID, t2.Value " & _
    "FROM [Sheet2$] AS t2 " & _
    "LEFT JOIN [Sheet1$] AS t1 ON t2.ID = t1.ID " & _
    "WHERE t1.ID IS NULL"

Dim rs As New ADODB.Recordset
'All the fun happens here
rs.Open sql, connectionString

'Paste the resulting records into the third sheet of the active workbook
ActiveWorkbook.Sheets(3).Range("A2").CopyFromRecordset rs

Set rs = Nothing

End Sub

Notes:

  • Currently, the recordset is reading data from the current (Excel) workbook. If the data is coming from a database, it may be simpler and more efficient to modify the connection string to connect to the database directly and issue the SQL statement against the database.
  • The code assumes that the first line of each worksheet holds the column labels, e.g. ID and Value. If this is not the case, specify HDR=No in the third line of the connectionString (instead of HDR=Yes), and the fields will be autoassigned names beginning from F1, F2, etc.
  • The results are pasted into the third sheet of the active workbook. This may or may not be appropriate.
  • You don't specify how you want to order the data, but that is simple enough with the addition of an ORDER BY clause to the SQL statement.

Explanation of the SQL statement

We are comparing two tables. For a given ID, there could be three possibilities:
1. the ID appears in both tables,
2. it appears only in the first table, or
3. it appears only in the second table.

We are also working with the assumption that the ID is unique within each table.

The first half of the statement (up to UNION) handles 1 and 2.

SELECT t1.ID, t1.Value + IIF(t2.Value IS NULL, 0, t2.Value) AS FinalSum 
FROM [Sheet1$] AS t1
LEFT JOIN [Sheet2$] AS t2 ON t1.ID = t2.ID

It can be described as follows:

Start with the records in the first table — FROM [Sheet1$] AS t1

Match up each record in the second table to the corresponding record in the first table, based on ID — LEFT JOIN [Sheet2$] AS t2 ON t1.ID = t2.ID

Include all records from the first table, and only matching records in the second table — the LEFT in LEFT JOIN

Return two columns: the ID from the first table, and the combination of the values from the first and second table — SELECT ...

If there is no matching record in the second table, the value will be NULL (not the same as zero). Trying to add a number to NULL will return NULL, which is not what we want. So we have to write this formula — t1.Value + IIF(t2.Value IS NULL, 0, t2.Value):

  • If the value from the second table is null then add 0

  • otherwise add the value from the second table

The second half of the statement handles IDs that appear only in the second table:

UNION 
SELECT t2.ID, t2.Value
FROM [Sheet2$] AS t2
LEFT JOIN [Sheet1$] AS t1 ON t2.ID = t1.ID
WHERE t1.ID IS NULL

Append a second set of results on top of the first set of results — UNION

Start with the records from the second table — FROM [Sheet2$] AS t2

Match up the records from the first table to the records in the second table (note this is reversed from the first half of the query) — LEFT JOIN [Sheet1$] AS t1 ON t2.ID = t1.ID

We only want records that don't have an ID in the first table — WHERE t1.ID IS NULL

Zev Spitz
  • 13,950
  • 6
  • 64
  • 136
0

In the end, I used PivotTable Wizard to combine the ranges in batches of 10,000.

Thank you for your help.

Kyle
  • 15
  • 1
  • 7
-1

If you want a VBA solution that doesn't use pivot tables, you can try to create a dictionary object and use the ID as a key and the cash value as the value. like this. You need to add a reference to Microsoft Scripting Runtime first.

Sub CreateEmployeeSum()
    Dim wb As Workbook
    Set wb = ThisWorkbook
    Dim table1 As Worksheet, _
        table2 As Worksheet, finalTable As Worksheet
    'wasn't sure if you were using sheets of data
    'or actual tables - if they are actual tables,
    'you can loop through those in a similar way, look up
    'on other stackoverflow problems how


    Set table1 = wb.Sheets("Sheet1") 'first sheet of info
    Set table2 = wb.Sheets("Sheet2") 'second sheet of info
    Set finalTable = wb.Sheets("Sheet3") 'destination sheet


    'get the last row of both tables
    Dim lastRowT1 As Long, lastRowT2 As Long
    lastRowT1 = table1.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    lastRowT2 = table2.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row

    'write the info to arrays so faster to loop through
    Dim t1Array As Variant, t2Array As Variant
    t1Array = table1.Range("A1:B" & lastRowT2).Value
    t2Array = table2.Range("A1:B" & lastRowT2).Value

    'create a dictionary that maps IDs to cash value
    Dim idToCashDict As Dictionary
    Set idToCashDict = New Dictionary

    'first loop through info from first sheet
    Dim i As Long
    For i = 1 To UBound(t1Array)
        Dim idNum As String, cashVal As Double
        idNum = CStr(t1Array(i, 1))
        cashVal = CDbl(t1Array(i, 2))
        If idToCashDict.Exists(idNum) Then
            cashVal = cashVal + idToCashDict.Item(idNum)
            idToCashDict.Remove idNum
            idToCashDict.Add idNum, cashVal
        Else
            idToCashDict.Add idNum, cashVal
        End If

    Next i

    'then through second sheet, adding to cash value of
    'ids that have been seen before
    For i = 1 To UBound(t2Array)
        Dim idNum2 As String, cashVal2 As Double
        idNum2 = CStr(t2Array(i, 1))
        cashVal2 = CDbl(t2Array(i, 2))
        If idToCashDict.Exists(idNum2) Then
            cashVal2 = cashVal2 + idToCashDict.Item(idNum2)
            idToCashDict.Remove idNum2
            idToCashDict.Add idNum2, cashVal2
        Else
            idToCashDict.Add idNum2, cashVal2
        End If

    Next i


    'then write the entries from the dictionary to the
    'destination sheet
    Dim finalVal As Double, finalID As String
    i = 1
    For Each finalID In idToCashDict.Keys
        finalVal = idToCashDict.Item(finalID)
        finalTable.Range("A" & i).Value = finalID
        finalTable.Range("B" & i).Value = finalVal
        i = i + 1
    Next finalID


End Sub

If you use actual tables, see answers such as here in order to loop through the rows in a similar way.

Community
  • 1
  • 1
gg2104
  • 39
  • 6
  • You mention that I need to add a reference to Microsoft Scripting Runtime first. Is that included in this code? If not, how do I go about it. – Kyle Oct 24 '16 at 15:16
  • Also when I tried to use this code, it returned a Compile Error: User-defined type not defined for the "idToCashDict As Dictionary" section. Thank you for your help! – Kyle Oct 24 '16 at 15:20
  • @Kyle Yes, you are getting that error because the Microsoft Scripting Runtime reference hasn't been added yet. You can add the reference by following the directions [here](https://msdn.microsoft.com/en-us/library/office/gg264402.aspx) – gg2104 Oct 24 '16 at 16:02
  • @kyle also note that I addeda dim of finalID as string in case Option Explicit is on – gg2104 Oct 24 '16 at 16:06
  • This is far too much work for no good reason. Use an [SQL statement](http://stackoverflow.com/a/40276761/111794). – Zev Spitz Oct 27 '16 at 06:03
  • @ZevSpitz yes, I know that you can use SQL statements - the reason why I provided it this way is because I come across a lot of people who get that Microsoft.ACE.OLEDB.12.0 isn't registered, especially if they don't have Access installed on their computer – gg2104 Oct 27 '16 at 13:23
  • Isn't ACE installed along with Office, even without Access? In any case, the [drivers are readily available](https://www.microsoft.com/en-us/download/details.aspx?id=13255); the only wrinkle is matching the bittedness of the installed Office / Windows to the download. – Zev Spitz Oct 27 '16 at 13:42
  • @ZevSpitz it should be, but for some reason often isn't - see [here](http://stackoverflow.com/questions/6649363/microsoft-ace-oledb-12-0-provider-is-not-registered-on-the-local-machine) ... In this case, the user has to download and install, which people at work often don't have the ability to do due to restricted access from the company – gg2104 Oct 27 '16 at 14:36
  • @ZevSpitz I do agree it is the better answer though, just don't think it works for all people depending on their situation – gg2104 Oct 27 '16 at 14:36
  • The link you mentioned is about working with Excel/Access files in a C#/.NET project , which can be a pain, because Office might not be installed, and even if it is installed, the bit-ness of the project has to match the bit-ness of Office. But the OP appears to have Excel installed, and is willing to use a VBA solution, where this isn't an issue. – Zev Spitz Oct 31 '16 at 19:08
  • @ZevSpitz that's strange, because where I am, this error always gets thrown if the user doesn't have Access installed, which is a huge pain as most people here don't... Good to know that it might be another issue I guess, time to revisit that code... – gg2104 Oct 31 '16 at 21:26
-1

Here's an attempt at getting a sorted and combined table. The general strategy I've employed here is: make copies of existing tables and use them to add values, remove repetitive values, and do the same for the third combined table on sheet 3. Attach the following code to a command button.

Application.ScreenUpdating = False
Dim i As Long, x As Long, n As Long, j As Long
Dim cashtotal As Integer

lastrow1 = Sheet1.Range("A1048575").End(xlUp).Row
astrow2 = Sheet2.Range("A1048575").End(xlUp).Row
cashtotal = 0
x = 1

'''''Routine to make a copy of the existing data.
For i = 1 To lastrow1
    Sheet1.Cells(i, 4) = Sheet1.Cells(i, 1)
    Sheet1.Cells(i, 5) = Sheet1.Cells(i, 2)
Next

'''''On Sheet1- Routine to remove repetitive values
For i = 2 To lastrow1
    If Sheet1.Cells(i, 4) = "" Then GoTo 10
      x = x + 1
      cashtotal = Sheet1.Cells(i, 5)
      Sheet1.Cells(x, 7) = Sheet1.Cells(i, 4)
      Sheet1.Cells(x, 8) = Sheet1.Cells(i, 5)

        For j = i + 1 To lastrow1
           If Sheet1.Cells(j, 4) = Sheet1.Cells(i, 4) Then
             cashtotal = cashtotal + Sheet1.Cells(j, 5)
             Sheet1.Cells(x, 8) = cashtotal
             Sheet1.Cells(j, 4).ClearContents
             Sheet1.Cells(j, 5).ClearContents
           End If
        Next
10
Next
x = 1

'''''On Sheet2 the following routine makes a copy of the existing data
For i = 1 To lastrow2
    Sheet2.Cells(i, 4) = Sheet2.Cells(i, 1)
    Sheet2.Cells(i, 5) = Sheet2.Cells(i, 2)
Next

'''''On sheet2 -  Routine to remove repetitive values
For i = 2 To lastrow2
    If Sheet2.Cells(i, 4) = "" Then GoTo 20
       x = x + 1
       cashtotal = Sheet2.Cells(i, 5)
       Sheet2.Cells(x, 7) = Sheet2.Cells(i, 4)
       Sheet2.Cells(x, 8) = Sheet2.Cells(i, 5)
          For j = i + 1 To lastrow2
            If Sheet2.Cells(j, 4) = Sheet2.Cells(i, 4) Then
              cashtotal = cashtotal + Sheet2.Cells(j, 5)
              Sheet2.Cells(x, 8) = cashtotal
              Sheet2.Cells(j, 4).ClearContents
              Sheet2.Cells(j, 5).ClearContents
            End If
          Next
20
Next
x = 1

'''Transfer modified tables on sheet1 and sheet2 to sheet3 in a combined table
lastrow4 = Sheet1.Range("G1048575").End(xlUp).Row

For i = 1 To lastrow4
    Sheet3.Cells(i, 1) = Sheet1.Cells(i, 7)
    Sheet3.Cells(i, 2) = Sheet1.Cells(i, 8)
Next

lastrow5 = Sheet2.Range("G1048575").End(xlUp).Row
lastrow6 = Sheet3.Range("A1048575").End(xlUp).Row

For i = 2 To lastrow5
    Sheet3.Cells(lastrow6 + i - 1, 1) = Sheet2.Cells(i, 7)
    Sheet3.Cells(lastrow6 + i - 1, 2) = Sheet2.Cells(i, 8)
Next

'''''''Routine to make a copy of the existing table
lastrow7 = Sheet3.Range("A1048575").End(xlUp).Row

For i = 1 To lastrow7
    Sheet3.Cells(i, 4) = Sheet3.Cells(i, 1)
    Sheet3.Cells(i, 5) = Sheet3.Cells(i, 2)
Next

'''''''' Routine to remove repetitive values
For i = 2 To lastrow7
    If Sheet3.Cells(i, 4) = "" Then GoTo 30
      x = x + 1
      cashtotal = Sheet3.Cells(i, 5)
      Sheet3.Cells(x, 7) = Sheet3.Cells(i, 4)
      Sheet3.Cells(x, 8) = Sheet3.Cells(i, 5)
         For j = i + 1 To lastrow7
            If Sheet3.Cells(j, 4) = Sheet3.Cells(i, 4) Then
               cashtotal = cashtotal + Sheet3.Cells(j, 5)
               Sheet3.Cells(x, 8) = cashtotal

               Sheet3.Cells(j, 4).ClearContents
               Sheet3.Cells(j, 5).ClearContents
            End If
        Next
30
Next
Application.ScreenUpdating = True
  • This is far too much work for no good reason. Use an [SQL statement](http://stackoverflow.com/a/40276761/111794). – Zev Spitz Oct 27 '16 at 06:03