1

I'm trying to make SQL query on multiple Excel worksheets.

I have setup like this (each table is a separate worksheet): enter image description here

Three worksheets - one with condition, second with source data and third with the output.

In the Output I need all records from the Source where value in the ColumnA is present in the Condition next to a positive number.

I was trying to do something like this:

Select * 
From [Source$] 
Where [ColumnA] In (
    Select [Column1] 
    From [Condition$] 
    Where [Column2] > 0
)

First problem occurred because of blank values in the Condition table. It can not handle number comparison if there are blank values in that column - Data type mismatch in criteria expression. For now I dealt with it by doing this ugly thing: Where Column2 <> '' And Column2 <> '0'

But that's not the main question. I have bigger problem and it is has something to do with nesting one Select inside another. Even though inner Select returns single column with values like this:

enter image description here

I'm still getting error Type mismatch in criteria expression. (slightly different error than above).

So I can run inner expression just fine. I can run outer expression with hard coded values also just fine (for example Where ColumnA In "'value1','value2',..."). But when I'm nesting them I'm getting the error. To my SQL knowledge I think this query should work, but for some reason inner query in excel returns data that is not compatible with in operator.

This is my full code (thanks to this SO answer):

Option Explicit
Private Const adCmdText As Long = 1
Private Const adStateOpen As Long = 1

Public Sub DisplayView()
    Dim dbField       As Variant
    Dim fieldCounter  As Long
    Dim dbConnection  As Object
    Dim dbRecordset   As Object
    Dim dbCommand     As Object
    Dim OutputSheet   As Excel.Worksheet

    Set dbConnection = CreateObject("ADODB.Connection")
    Set dbRecordset = CreateObject("ADODB.Recordset")
    Set dbCommand = CreateObject("ADODB.Command")

    Set OutputSheet = ThisWorkbook.Worksheets("Output")

    'Do a quick check to determine the correct connection string
    'if one of these don't work, have a look here --> https://www.connectionstrings.com/excel/
    If Left$(ThisWorkbook.FullName, 4) = "xlsm" Then
        dbConnection.connectionstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & _
        ThisWorkbook.FullName & ";Extended Properties='Excel 12.0 Macro;HDR=YES';"
    Else
        dbConnection.connectionstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & _
        ThisWorkbook.FullName & ";Extended Properties='Excel 12.0;HDR=YES';"
    End If

    'Open the connection and query
    dbConnection.Open
    With dbCommand
        .ActiveConnection = dbConnection
        .CommandType = adCmdText
        .CommandText = "Select * From [Source$] Where [ColumnA] In (Select [Column1] from [Condition$] where [Column1] > 0)"
        Set dbRecordset = .Execute
    End With

    'Clear the Output Sheet
    OutputSheet.Cells.Clear

    'Add Headers to output
    For Each dbField In dbRecordset.Fields
        fieldCounter = fieldCounter + 1
        OutputSheet.Cells(1, fieldCounter).Value2 = dbField.Name
    Next

    'Dump the found records
    OutputSheet.Range("A2").CopyFromRecordset dbRecordset
    If dbConnection.State = adStateOpen Then dbConnection.Close
End Sub


'Run from here
Public Sub ExampleRunner()
    Dim t As Double
    t = Timer
    DisplayView
    Debug.Print "Getting data took: " & Timer - t & " seconds"
End Sub

AnJ
  • 581
  • 1
  • 6
  • 29

1 Answers1

1

Possibly the NumberFormat of Column2 in Condition sheet is set to General or Text. Try re-formatting entire column (not subset range) as Number and then save changes so SQL engine recognizes the data type as number. From there, you can run your SQL query with IN clause or better yet, use INNER JOIN:

SELECT s.[ColumnA], s.[ColumnB], s.[ColumnC], s.[ColumnD] 
FROM [Source$] s
WHERE s.[ColumnA] IN (
    SELECT [Column1] 
    FROM [Condition$] 
    WHERE [Column2] > 0
)

SELECT s.[ColumnA], s.[ColumnB], s.[ColumnC], s.[ColumnD] 
FROM [Source$] s
INNER JOIN [Condition$] c
  ON s.[ColumnA] = c.[Column1]
WHERE c.[Column2] > 0

Additionally, consider the Excel ODBC driver that handles any format (.xls, .xlsx, .xlsm, .xlsb) and avoid use of ADO command object if no parameters are required:

Public Sub DisplayView()
    Dim dbConnection  As Object, dbRecordset As Object
    Dim strSQL        As String
    Dim dbField       As Variant, fieldCounter  As Long

    Set dbConnection = CreateObject("ADODB.Connection")
    Set dbRecordset = CreateObject("ADODB.Recordset")

    ' CONNECTION WITH EXCEL ODBC DRIVER
    dbConnection.Open "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
                          & "DBQ=" & ThisWorkbook.FullName & ";"

    strSQL = "SELECT s.[ColumnA], s.[ColumnB], s.[ColumnC], s.[ColumnD]" _
               & " FROM [Source$] s" _
               & " INNER JOIN [Condition$] c" _
               & "   ON s.[ColumnA] = c.[Column1]" _
               & " WHERE c.[Column2] > 0"

    ' OPEN RECORDSET
    dbRecordset.Open strSQL, dbConnection

    With Worksheets("Output")
        ' HEADERS
        For Each dbField In dbRecordset.Fields
            fieldCounter = fieldCounter + 1
            .Cells(1, fieldCounter).Value = dbField.Name
        Next dbField    
        ' DATA ROWS
        .Range("A2").CopyFromRecordset dbRecordset
    End With

    dbRecordset.Close: dbConnection.Close
    Set dbRecordset = Nothing: Set dbConnection = Nothing
End Sub
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Inner Join did the job. And it is even faster now. Thanks! – AnJ Apr 14 '20 at 20:37
  • And about the part with a comparison - sadly it does not work. – AnJ Apr 14 '20 at 20:39
  • No no. I meant this part: "Possibly the NumberFormat of Column2 in Condition sheet is set to General or Text". The comparison `Where [Column2] > 0` does not work because of empty fields in a column and changing the field type didn't help. – AnJ Apr 14 '20 at 22:06
  • You may have some cells with empty spaces, invisible line breaks, or alpha character that cannot fully convert to number. – Parfait Apr 14 '20 at 22:41
  • Is there any way to deal with it inside VBA? – AnJ Apr 15 '20 at 07:15
  • 1
    Without knowing the nature of the problem, no way to tell. This is a data cleaning issue (also one of the reasons to use a *true* database like Excel's sibling, MS Access that does not allow mixing of types in same column). You can try the [conversion function](https://support.office.com/en-us/article/type-conversion-functions-8ebb0e94-2d43-4975-bb13-87ac8d1a2202) in the SQL query: `WHERE CDbl(Trim(c.[Column2])) > 2`. – Parfait Apr 15 '20 at 14:48
  • After some research I came up with similar solution `Cdbl(IIf(IsNull(c.[Column2]), 0, c.[Column2])) > 0`. But for some reason Inner join now broke because suddenly one of the column is recognized as Double instead of String. This is really frustrating how Excel changes recognized data type... – AnJ Apr 15 '20 at 14:59