What would be a way to open a RFC 4180
compliant csv
file in MS Excel
using VBA
?
- (2.6) Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
- (2.7) If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
For example the test.csv
file below:
code,description,comments,start_date,other_date
00333,"a""aa","bbb
ccc, ddd",12/04/2002 5:00:00 PM,1960/12/14
"00444","eee
""fff""
ggg, hhh","""
",6-08-2002 14:00:00,1950-09-01T05:30:05+13:00
And this is what I want to see in the spreadsheet where it gets imported:
Failed Attempts
Using Workbooks.Open
This is equivalent to opening the csv
with double-click.
' https://stackoverflow.com/a/25215194/4352306
Private Function copyCSVviaOpenWorkbook(strFile As String, wsDest As Excel.Worksheet) As Boolean
Dim wbCSV As Excel.Workbook, rnDest As Excel.Range
Dim aData As Variant, lRows As Long, lCols As Long
copyCSVviaOpenWorkbook = False
On Error GoTo copyCSVviaOpenWorkbookErr
Set wbCSV = Workbooks.Open(FileName:=strFile, Local:=True)
aData = wbCSV.Worksheets(1).UsedRange.Value
lRows = UBound(aData, 1)
lCols = UBound(aData, 2)
Set rnDest = wsDest.Range(wsDest.Cells(1, 1), wsDest.Cells(lRows, lCols))
rnDest.NumberFormat = "@"
rnDest.Value = aData
rnDest.EntireColumn.AutoFit
wbCSV.Close
copyCSVviaOpenWorkbook = True
copyCSVviaOpenWorkbookErr:
End Function
It does keeps correctly double-quoted content in the same row (reference), but applies numeric and date conversions with unintended consequences. Although Text format (@
) was applied to cells, the values were already altered by the Open
method:
- Content that can be read as a number lost the leading zeroes
- See column
code
: originally"00444"
becomes444
- See column
- Content that reads as a date have been altered
1960/12/14
becomes12/14/1960
(date format change)12/04/2002 5:00:00 PM
becomes4/12/2002 5:00:00 PM
(a different date, where day and month have been exchanged)
From this point on trying to find a way to prevent the auto-conversion leads to some dead ends (this old post in stackoverflow reflects that this has been a well shared concern among Excel users).
Using Worksheet.QueryTables.Add
It does offers lots of parameters to adjust things like the column types, delimiters, etc.
Private Function copyCSVviaQueryTable(strFile As String, wsDest As Excel.Worksheet, _
Optional Delimiter As String = ",", Optional numColumns As Integer = 10) As Boolean
Dim strQueryTable As String, columnDataTypes(), I As Integer
On Error GoTo copyCSVviaQueryTableErr
copyCSVviaQueryTable = False
numColumns = IIf(numColumns <= 0, 10, numColumns)
For I = 0 To (numColumns - 1)
ReDim Preserve columnDataTypes(I)
columnDataTypes(I) = xlTextFormat
Next
strQueryTable = "ImportedCSV"
With wsDest.QueryTables.Add(Connection:="TEXT;" & strFile, Destination:=wsDest.Range("$A$1"))
'.QueryType = xlTextImport
.name = strQueryTable
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
'.PreserveFormatting = False
.PreserveFormatting = True
'.TextFilePlatform = xlMSDOS
'.TextFilePlatform = 437
.TextFilePlatform = 1252
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileConsecutiveDelimiter = False
'.TextFileTextQualifier = xlTextQualifierNone
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileCommaDelimiter = (Trim(Delimiter) = vbNullString) Or (InStr(1, Delimiter, ",", vbTextCompare) > 0)
.TextFileTabDelimiter = (InStr(1, Delimiter, "|", vbTextCompare) > 0)
.TextFileSemicolonDelimiter = (InStr(1, Delimiter, ";", vbTextCompare) > 0)
.TextFileSpaceDelimiter = (InStr(1, Delimiter, " ", vbTextCompare) > 0)
.TextFileColumnDataTypes = columnDataTypes
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
.Delete
End With
copyCSVviaQueryTable = True
copyCSVviaQueryTableErr:
End Function
It somehow supports the delimiter, the qualifier and respects the data type (no numeric or date conversions observed). However it does not seem compliant with the RC 4180, as correctly double-quoted line breaks split data of the same row into multiple rows.
Using Workbooks.OpenText
It is the programmatic way to do what the Text Import Wizard does. So, it could somehow offer a way. However, it has an issue with the FieldInfo
parameter, and it does NOT respect the pre-configured data types (reference).
When the data is opened in the new workbook, the date and numeric data conversions have already happened; for which it somehow gives the same result as Workbooks.Open
.
Range.TextToColumns
(equivalent to the Text to Columns Wizard) seems to have this very same problem with theFieldInfo
parameter as well.