0

I'm trying to implement a SQL Group By statement in VBA. I have a simple worksheet with 4 columns (login_year, login_month, login_day, user_name) and I want to produce a count of unique users who logged in over each month.

So the Oracle SQL would be:

SELECT 
login_year, 
login_month, 
COUNT(DISTINCT user_name) 
FROM users
GROUP BY 
login_year, 
login_month;

How would it be best to implement this using VBA?

Many thanks for your help.

Community
  • 1
  • 1
Mikeyjh
  • 1
  • 1

2 Answers2

0

You need to set a reference to the MS Scripting Library to have access to the Dictionary object.

The below code creates a dictionary entry for each year/month. Each entry then holds another dictionary for the names. The count of the child dictionary is the distinct count.

Sub ListDistinctUserCount()

    Dim vaValues As Variant
    Dim dc As Scripting.Dictionary
    Dim dcNames As Scripting.Dictionary
    Dim i As Long
    Dim sAllData As String

    'read range into array
    vaValues = Sheet1.Range("A2:D51").Value
    Set dc = New Scripting.Dictionary

    For i = LBound(vaValues, 1) To UBound(vaValues, 1)
        'concat the year and month for dictionary key
        sAllData = vaValues(i, 1) & "|" & vaValues(i, 2)

        'if it's already added
        If dc.Exists(sAllData) Then
            'add the name if it doesn't exists
            If Not dc.Item(sAllData).Exists(vaValues(i, 4)) Then
                dc.Item(sAllData).Add vaValues(i, 4), vaValues(i, 4)
            End If
        Else
            'otherwise add the year month and a new dict for the names
            Set dcNames = New Scripting.Dictionary
            dcNames.Add vaValues(i, 4), vaValues(i, 4)
            dc.Add sAllData, dcNames
        End If
    Next i

    For i = 0 To dc.Count - 1
        Debug.Print dc.Keys(i), dc.Items(i).Count
    Next i

End Sub

Not very elegant, but I couldn't think of a better way.

Additional Information for VBA Novices

A Variant data type can hold any other data type. It's large and the least efficient, but is necessary in some cases.

The Range.Value property of a multi-cell range returns a two-dimensional, one-based array. You can assign that array to a Variant and loop through it. This is a lot faster than reading individual cells. Reading and writing to the Excel grid are notoriously slow, so it's a common practice to read a whole range into an array and work with the data that way.

VBA has a Collection object for storing data. The Scripting.Dictionary object has a few features that make it superior such as the .Exists property and writing all the keys or items out as an array. The Scripting library is installed with Office by default, but is not included in new projects by default. You have to go to Tools - References ton include it.

Dictionaries are key-item lists, like you find in many other languages. The item can be almost any data type. In this case, the item is another dictionary.

Dick Kusleika
  • 32,673
  • 4
  • 52
  • 73
  • Thanks Dick for your reply. This is interesting, I'll need to do some background reading to understand your logic. VBA novice here :). – Mikeyjh Jul 11 '16 at 19:25
0

While the Jet/ACE SQL Engine (used in MS Office/Windows programs including Access, Excel) dialect does not maintain COUNT(DISTINCT ...) compared to other RDMS, you can use a derived table to first return unique users by month and year and then count them in the outer query.

SQL (embedded as string below)

SELECT u.login_year, u.login_month, COUNT(*) As usercount
FROM 
   (SELECT user_name, login_year, login_month 
    FROM users
    GROUP BY user_name, login_year, login_month) As u
GROUP BY u.login_year, u.login_month;

VBA (ADO Driver/Provider connection)

Sub RunSQL()    
    Dim conn As Object, rst As Object
    Dim strConnection As String, strSQL As String
    Dim i As Integer

    Set conn = CreateObject("ADODB.Connection")
    Set rst = CreateObject("ADODB.Recordset")

    ' CONNECTION STRINGS (TWO VERSIONS)
'    strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
'                      & "DBQ=C:\Path\To\Workbook.xlsm;"
    strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
                       & "Data Source='C:\Path\To\Workbook.xlsm';" _
                       & "Extended Properties=""Excel 8.0;HDR=YES;"";"

    strSQL = " SELECT u.login_year, u.login_month, COUNT(*) As userCount " _
                & " FROM " _
                & "   (SELECT user_name, login_year, login_month " _
                & "    FROM users" _
                & "    GROUP BY user_name, login_year, login_month) As u" _
                & " GROUP BY u.login_year, u.login_month;"                    
    ' OPEN DB CONNECTION
    conn.Open strConnection
    rst.Open strSQL, conn

    ' COLUMN HEADERS
    For i = 1 To rst.Fields.Count
        Worksheets("RESULTS").Cells(1, i) = rst.Fields(i - 1).Name
    Next i        
    ' DATA ROWS
    Worksheets("RESULTS").Range("A2").CopyFromRecordset rst

    rst.Close: conn.Close    
    Set rst = Nothing: Set conn = Nothing    
End Sub
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thanks Parfait this is what I was looking for! Can you recommend a book or blog to give me some background? – Mikeyjh Jul 11 '16 at 19:22
  • Google is your friend. See this Microsoft tutorial on [workbook queries](https://technet.microsoft.com/en-us/library/ee692882.aspx). – Parfait Jul 11 '16 at 22:51