1

I'm using ADO txt/csv connection string below in vba to get data without me actually opening excel.

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FolderPath & ";Extended Properties="text;HDR=Yes;FMT=Delimited(,)";  

Dim rs As New ADODB.Recordset
query = "Select * from  [" & CStr(FileNameArr(i)) & "]"
rs.Open query, conn, adOpenKeyset, adLockOptimistic

I've successfully opened the connection and recordset, but the recordset only contained two fields which is not correct.(Should be 103). The strangest thing is that if I open file and save it as csv or txt without doing anything, and run vba again, it successfully gets all columns! The file is not actually a txt/csv file, it's a .msr file. But I can open it with Notepad or excel without any problem. Before executing vba, I deleted extension of .msr and change it to .csv, then execute VBA using ADO csv/txt connection string to pull out data, some columns will be missing as above statement.

The file content is not a strictly proper table, it has lesser header columns than that of data body. If I open it in csv, data in the first row only lives in Column A, but data in other rows contains at most 103 columns.

I also tried HDR=No but it didn't work.

I've tried several workarounds:
<1>.Try to save file via ADO:
Because I know manually open file and save it will solve the problem, I tried to add some text in first row then save file when recordset is opening.
I tried rs.field(2)="New Text" then rs.Update,
but an error message occurred at rs.field(2)="New Text" and error message popped up:
run time error 3251 current recordset does not support updating by this ISAM
I thought it's because the header only contains 2 columns, i.e field(0) and field(1), so I can't update something that is originally not existed there. So, I changed to rs.field(1)="New Text", but still same error message.
If I skip to rs.Update, it's ok. But I still got only two columns.
Thus, I tried rs.save Original file full path, adPersistADTG
This gives runtime error 58: file already exists.
If I cahnge to a non-existed file name, same error occurred.

<2>.try specify 3rd and 4th argument of rs.open
By googling, I found some said that if I want to change file content via ADO, i need to use

rs.Open query, conn, adOpenKeyset, adLockOptimistic

Some said rs.Open query should change to table_name
in this thread answered Marcelo Garzzola:
run time error 3251 current recordset does not support updating

rs.Open table_name, conn, adOpenKeyset, adLockOptimistic

But none of this work for me.

<3>.Try to open it in xls, xlsx, xlsm file extension
This approach failed early while opening connection, it showed:
Database error 0x80004005: External table is not in the expected format.
I use a sub-procedure in vba to open connection and recordset to the file

Sub OpenConnection(wb As Workbook, addr As String, FileExtensionName As String, IsUseNumberSequenceAsSheetName As Boolean, FullPathArr As Variant, FolderPath As String, FileNameArr As Variant)  
Dim conn As New ADODB.Connection
Dim sht As Worksheet
For i = 1 To UBound(FileNameArr) - LBound(FileNameArr) + 1
    Dim connStr As String: connStr = ADOstr(CStr(FullPathArr(i)), FolderPath, FileExtensionName)
    conn.Open connStr
    Dim query As String
    query = "Select * from  [" & CStr(FileNameArr(i)) & "]"
    Dim rs As New ADODB.Recordset
    rs.Open query, conn, adOpenKeyset, adLockOptimistic
    Dim asht As Worksheet: Set asht = wb.Worksheets.Add(, After:=ActiveSheet)
    If IsUseNumberSequenceAsSheetName Then
        asht.Name = i
    End If
    ReDim Header(0 To rs.Fields.Count - 1)
    With asht
        For h = 0 To rs.Fields.Count - 1
            Header(h) = rs.Fields(h).Name
        Next
        .Range(addr).Resize(1, rs.Fields.Count) = Header
        .Range(addr).Offset(1, 0).CopyFromRecordset rs
    End With
    rs.Close
    conn.Close
Next
End Sub

Above sub-procedure will call a ADOstr function:

Function ADOstr(FullPath As String, FolderPath As String, FileExtension As String) As String
Select Case True
    Case LCase(FileExtension) = "csv" Or LCase(FileExtension) = "txt"
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FolderPath & ";Persist Security Info=False;Extended Properties=""text;HDR=Yes;IMEX=0;FMT=Delimited(;)"";"
    Case LCase(FileExtension) = "xls"
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & ";Extended Properties=""Excel 8.0;HDR=No;IMEX=0"";"
    Case LCase(FileExtension) = "xlsx" 
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & ";Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1"";"
    Case LCase(FileExtension) = "xlsb"
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & "Extended Properties=""Excel 12.0;HDR=YES"";"
    Case LCase(FileExtension) = "xlsm" 
        ADOstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FullPath & ";Extended Properties=""Excel 12.0 Macro;HDR=YES;IMEX=1"";"
End Select
End Function

I will call OpenConnection sub-procedure in my main sub-procedure and define all parameters it needs, including file extension so that ADOstr function can return corresponding connection strings.

<4>.Try to replicate error in small csv file
I created a small data set in a csv file which contains header row that only A1 has value
and data body that have 5 columns.
It turns out ADO recordset exactly return correct 5 fields! So I guess maybe there's something strange in this .msr file which I can't see.
Due to proprietary concern, I can't upload the file.
But I could say that header row contains >ver MF01 00.00 in A1 cell
After reading from recordset, it becomes: >ver MF01 00#00 in A1 Cell, F2 in B1 Cell

<5> Try to open connection using strings with all kinds of delimiters
I've tried to change FMT=Delimited(;) to FMT=Delimited(,) to FMT=Delimited(@) to FMT=Delimited( ), I still get only two columns.
Google (below link) said in csv/txt format, I need a schema.ini file in the same folder, where stores the delimiters that program will actually use. And I tried, but the same. It still returns only two columns/fields no matter what delimiters I used.
ADODB.Connection: delimiter semicolon does not work for csv text files
But schema.ini file did work well with my small csv file I've created for debugging. It did separate columns by delimiter in schema.ini.

I would be very much appreciate if anyone could guide me
how to save a file via ADO or
how to open csv file without any delimiters such that all data live in ColumnA so recordset could retreive all data
or other workarounds.

  • Never heard of msr file extension. Might want to post in a forum that allows attaching files and provide that file for testing. – June7 Mar 03 '23 at 03:14
  • Sorry, I can't upload a complete file due to proprietary information protection terms. – jou yungcheng Mar 03 '23 at 04:55

2 Answers2

1

First see this, I've encountered a very, very similar issue: Missing column of Dates in CSV file using Schema.ini file

You want a standardized file format like CSV. If you do a SELECT * on a free text file it has no structure or schema.

Try again on a CSV file with these properties set:

Extended Properties="text;HDR=Yes;FMT=Delimited(Tab)";

Note: if the size of the MSR file as a CSV is larger don't worry too much as using the OleDB Driver is one of the fastest for .Net and it can read a 128 MB file in 13.5 seconds: https://stackoverflow.com/a/46418403/495455

Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321
  • 1
    Thanks Jeremy for you reply, I tried your code in ADOstr function and also in schema.ini. but fail at 【rs.Open query, conn, adOpenKeyset, adLockOptimistic】, error messages says: format option is invalid. But when I change format=Delimited(Tab) to format=TabDelimited in schema.ini, it works! Successfully extracting all data in one column! Many Thanks Jeremy. – jou yungcheng Mar 03 '23 at 04:33
  • So it seems that it is the delimiter that has gone wrong. Somehow it delimit raw data into only two columns, and rest of them are missing – jou yungcheng Mar 03 '23 at 04:37
  • Can you [edit] your question and put in a [mcve]? – Jeremy Thompson Mar 03 '23 at 04:38
  • I'll try that. I also want to see exactly which part is causing the error. – jou yungcheng Mar 03 '23 at 04:44
  • I found that if I open .msr file in csv format and save it without doing anything, file size becomes larger. Size grows from 680kb to 1800kb, almost trippled. And this tripple size file will work fine with ADO connection. I don't know if this has any hint... – jou yungcheng Mar 03 '23 at 05:08
  • It explains my answer, MSR is an unstructurred format so the OleDB Driver can't parse it - or all columns of it. The OleDB driver will work fine on a structured file with a consistent delimiter. See my last comment for reassurance about the File Size increase. – Jeremy Thompson Mar 03 '23 at 05:15
  • I see, this is why I can't find any difference between MSR before and after I save it. I use simple formula, A1(unsaved)=A1(saved), and expand to all cells, found out that all formula is TRUE. lol, so difference in data structure is hard to recognize. File size is a hint, though. – jou yungcheng Mar 03 '23 at 07:01
  • A text file is a text file regardless of extension. Don't open, save or or edit in excel it just confuses things. What are you actually trying to do with this text file? I see you are trying to open in VBA and do something in excel but what? You need to open the msr file in notepad (or better stil VS Code) and inspect it and make sur eit has some kind of delimiter where you expect it. – Nick.Mc May 17 '23 at 13:05
0

Try creating a class module and name it "CsvDB" with the code below. It will automatically create the schema.ini file and makes all columns text so you don't lose any data.

'''''''''''''''''Class module'''''''''''''''''''

'From toolbar, enable the following Tools->References->"Microsoft ActiveX Data Objects 6.1 Library"
Option Explicit
Option Compare Text

Private connectionStringHdr As String
Private conn As New ADODB.Connection
Private rs As New ADODB.Recordset

'this is fired when class is created e.g. Dim db As new CsvDB
Private Sub Class_Initialize()
    Dim filePath As String
   
    filePath = ThisWorkbook.Path & "\" 'uses the folder where the current workbook is located
    connectionStringHdr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & filePath & ";Extended Properties='text;HDR=YES;FMT=Delimited;';Mode=Read"
    conn.Open connectionStringHdr                   ' Open the connection
End Sub

'this is fired when class is set to nothing e.g. "set db = nothing"
Private Sub Class_Terminate()
    If rs.state = adStateOpen Then
        rs.Close
    End If

    Set rs = Nothing
   
    If conn.state = adStateOpen Then
        conn.Close
    End If
   
    Set conn = Nothing
End Sub

Public Function Query(sql As String) As ADODB.Recordset
    
    If rs.state = adStateOpen Then
        rs.Close
    End If
    
    '''''''''''''''''''''''''''''''''''''''''''''''''Create Schema'''''''''''''''''''''''''''''''''''''''''''''''''''''
    Dim filename As String, fso As New FileSystemObject, startIndex As Integer, endIndex As Integer, txtStream As TextStream, i As Integer, iniFile As String
    
    iniFile = ThisWorkbook.Path & "\Schema.ini" 'schema filepath
    If fso.FileExists(iniFile) Then 'delete the old schema in case you are querying a new csv
        fso.DeleteFile iniFile, True
    End If
    
    'find the filename from sql query
    startIndex = InStr(sql, "FROM [")
    startIndex = InStr(startIndex, sql, "[") + 1
    endIndex = InStr(startIndex, sql, "]")
    filename = Mid(sql, startIndex, endIndex - startIndex)

    'connection string has HDR=yes. we're selecting a single row, then will put the column names from recordset
    rs.Open "SELECT TOP 1 * FROM [" & filename & "] ", conn                            'populates recordset
    
    'create the schema.ini file
    Set txtStream = fso.CreateTextFile(iniFile, True)
    txtStream.WriteLine ("[" & filename & "]")
    txtStream.WriteLine ("ColNameHeader=True")
    txtStream.WriteLine ("MaxScanRows=0")
    txtStream.WriteLine ("Format=CSVDelimited")
    
    'make all columns data type text
    For i = 0 To rs.Fields.Count - 1
        txtStream.WriteLine ("Col" & i + 1 & "=""" & rs.Fields(i).Name & """ Text") 'passes the column name from recordset into the schema.ini file
    Next
    
    txtStream.Close
    Set txtStream = Nothing
    rs.Close
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    
    'query database
    rs.Open sql, conn                            ' Execute a SQL query
    Set Query = rs
    
    'delete ini file so people aren't wondering why a file appeared in the folder
    fso.DeleteFile iniFile, True
    Set fso = Nothing
End Function

''''''''''''''''''''''''''''''''''''''''''''''''

to use the class, try the following

Sub test()
    Dim csv As New CsvDB, rs As New Recordset, filename As String
    
    Set rs = csv.Query("SELECT * FROM [test.csv]") 'put the csv file in the same folder as the workbook you are running
    
    While Not rs.EOF                             'loops through the recordset until EOF (end of file i.e. loops through all the records/rows in query)
        Debug.Print rs.Fields("ColumnName1").value
        Debug.Print rs.Fields("ColumnName2").value
        rs.MoveNext                              'moves to next record/row
        
    Wend
    
    Set csv = Nothing
End Sub