1

Problem Description: When Joining 2 Tables using Excel and ADO (ACE OLEDB 12.0) on alphanumeric Strings, ADO does not distinguish between the Keys "a12a" and "A12a" (it treats them as if they would be the same, i.e. case-insensitive). I, however, have alphanumeric keys in my data. The Join will link the data wrongly!

I built a small example in an Excel Workbook to reproduce the behavior. The Excel Workbook contains 3 Sheets:

  1. AlphaNum1
  2. AlphaNum2
  3. Result

AlphaNum1 Sheet contains the following data

Key   Val
a12b    1
A12b    2
a12B    3
A12B    4
e12f    7
E12F    8
1234    9

AlphaNum2 Sheet contains the following data:

Key   Val
a12b    1
A12b    2
a12B    3
A12B    4
c12d    5
C12D    6
1234    9

I Use the following VBA code to connect to ADO and join the tables (LEFT JOIN):

Sub AlphaNumTest()
    Dim oAdoConnection As New ADODB.Connection
    Dim oAdoRecordset As New ADODB.Recordset
    Dim sAdoConnectString As String, sPfad As String
    Dim sQuery As String
    On Error GoTo ExceptionHandling
    sPfad = ThisWorkbook.FullName
    sAdoConnectString = "Provider=Microsoft.ACE.OLEDB.12.0; Extended Properties='Excel 12.0 Xml;HDR=YES;';Data Source=" & sPfad

    oAdoConnection.Open sAdoConnectString
    sQuery = "Select a1.[Key], a2.[Val] from [AlphaNum1$] a1 LEFT JOIN [AlphaNum2$] a2 ON a1.[Key] = a2.[Key]"
    With oAdoRecordset
        .Source = sQuery
        .ActiveConnection = oAdoConnection
        .Open
    End With

    Dim writeRange As Range
    Dim headerRange As Range

    'Set headerRange = ThisWorkbook.Sheets("WriteHere").Range("A1")
    Set writeRange = ThisWorkbook.Sheets("Result").Range("A2")

    ' print the table header from recordset
    For i = 0 To oAdoRecordset.Fields.Count - 1
        ' careful! the recordset is zero-indexed like it should be! Excel table however starts at index one, thus the i+1~
        ThisWorkbook.Sheets("Result").Cells(1, i + 1).Value = oAdoRecordset.Fields(i).Name
        ' set bold
        ThisWorkbook.Sheets("Result").Cells(1, i + 1).Font.Bold = True
    Next i

    ' print the data directly from recordset!
    writeRange.CopyFromRecordset oAdoRecordset


CleanUp:
    On Error Resume Next ' Lazy skip
    oAdoRecordset.Close
    oAdoConnection.Close
    Set oAdoRecordset = Nothing
    Set oAdoConnection = Nothing
    Exit Sub
ExceptionHandling:
    MsgBox "Fehler: " & Err.Description
    Resume CleanUp
End Sub

Note that it does not matter if I use an INNER or a LEFT JOIN; the result is wrong eiter way - in this example here I use a LEFT JOIN to demonstrate the behavior.

The Result output is AlphaNum1.Key and AlphaNum2.Val joined by LEFT JOIN.

The Expected Result (I am joining using "=" not LIKE...) is:

Key   Val
a12b    1
A12b    2
a12B    3
A12B    4
e12f    
E12F    
1234    9

But ADO gives me the Actual Result (it treats the Keys case insensitive...):

Key   Val
a12b    4
a12b    3
a12b    2
a12b    1
A12b    4
A12b    3
A12b    2
A12b    1
a12B    4
a12B    3
a12B    2
a12B    1
A12B    4
A12B    3
A12B    2
A12B    1
e12f    
E12F    
1234    9

Any ideas why ADO behaves like this? Any ideas how/if I can change the behavior?

Erik A
  • 31,639
  • 12
  • 42
  • 67
dakami
  • 319
  • 2
  • 10
  • This has already been asked and answered [here](http://stackoverflow.com/questions/29533346/join-2-tables-case-sensitive-upper-and-lower-case) – leowyn Mar 30 '16 at 20:32
  • Thanks for the reply - but how do I solve the problem using ADO? When I add the collation (using COLLATE latin1_bin) I get an error when opening the RecordSet. Using BINARY I get a syntax error. Is there a different key word in ADO? – dakami Mar 30 '16 at 20:41
  • Hmm, my bad, I didn't read your question closely enough. The problem is not with ADO (ADO is just the go-between), it's with the database engine. From what I can find, the ACE OLEDB in Excel doesn't support COLLATE. You seem to have found your own solution :) – leowyn Mar 30 '16 at 20:56
  • The solution works, but it really brings the performance down... Well, can't help it I guess ;) – dakami Mar 30 '16 at 21:03

3 Answers3

4

I found a workaround for ADO. Seems COLLATE does not exist (see: http://www.utteraccess.com/forum/Collate-Access-t1940463.html).

One can use StrComp and set it to binary compare:

sQuery = "Select a1.[Key], a2.[Val] from [AlphaNum1$] a1 LEFT JOIN [AlphaNum2$] a2 **ON StrComp(a1.[Key], a2.[Key], 0)=0**"

If there is a better solution I am happy for more suggestions :)

gofr1
  • 15,741
  • 11
  • 42
  • 52
dakami
  • 319
  • 2
  • 10
  • That's how to do it. – Gustav Mar 31 '16 at 06:13
  • Is there any way to do this in a manner that does not bring the performance down so much? I have quite a lot of data to process and with StrComp it takes hours now (what took seconds before). – dakami Mar 31 '16 at 07:55
1

You could add a new indexed text field to each table, then fill this with values using the function below, and then join the tables the normal way using these fields:

Public Function StrToByte(ByVal strChars As String) As String

  Dim abytChar()  As Byte
  Dim lngChar     As Long
  Dim strByte     As String

  abytChar() = StrConv(strChars, vbFromUnicode)

  strByte = Space(2 * (1 + UBound(abytChar) - LBound(abytChar)))
  For lngChar = LBound(abytChar) To UBound(abytChar)
    Mid(strByte, 1 + 2 * lngChar) = Hex(abytChar(lngChar))
  Next

  StrToByte = strByte

End Function

It will produce keys like:

StrToByte("a12b") -> 61313262
StrToByte("A12b") -> 41313262
Gustav
  • 53,498
  • 7
  • 29
  • 55
  • Nice idea, thanks. I understand that the function converts the String to Byte but I don't really understand the details. How reliable is it? Is there a possibility that it produces the same result for a different string? – dakami Mar 31 '16 at 11:14
  • No, it converts to a byte array which is writes out as a string using the hex representation of each character. I can't see why you think these shouldn't be unique. – Gustav Mar 31 '16 at 11:26
-1

Just to report that i have reproduced your example, and got the 'Expected Result' at first time, without any modification in your code

I'm using Excel 2007 and ADO 2.8

robertocm
  • 124
  • 6