0

What would be a way to open a RFC 4180 compliant csv file in MS Excel using VBA?

  • (2.6) Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
  • (2.7) If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

For example the test.csv file below:

code,description,comments,start_date,other_date
00333,"a""aa","bbb
ccc, ddd",12/04/2002 5:00:00 PM,1960/12/14
"00444","eee 
""fff""
ggg, hhh","""
",6-08-2002 14:00:00,1950-09-01T05:30:05+13:00

And this is what I want to see in the spreadsheet where it gets imported:

Desired result

Failed Attempts

Using Workbooks.Open

This is equivalent to opening the csv with double-click.

' https://stackoverflow.com/a/25215194/4352306
Private Function copyCSVviaOpenWorkbook(strFile As String, wsDest As Excel.Worksheet) As Boolean
Dim wbCSV As Excel.Workbook, rnDest As Excel.Range
Dim aData As Variant, lRows As Long, lCols As Long

 copyCSVviaOpenWorkbook = False
 On Error GoTo copyCSVviaOpenWorkbookErr

 Set wbCSV = Workbooks.Open(FileName:=strFile, Local:=True)

 aData = wbCSV.Worksheets(1).UsedRange.Value
 lRows = UBound(aData, 1)
 lCols = UBound(aData, 2)

 Set rnDest = wsDest.Range(wsDest.Cells(1, 1), wsDest.Cells(lRows, lCols))
 rnDest.NumberFormat = "@"
 rnDest.Value = aData
 rnDest.EntireColumn.AutoFit

 wbCSV.Close
 copyCSVviaOpenWorkbook = True
copyCSVviaOpenWorkbookErr:
End Function

Workbooks.Open result

It does keeps correctly double-quoted content in the same row (reference), but applies numeric and date conversions with unintended consequences. Although Text format (@) was applied to cells, the values were already altered by the Open method:

  • Content that can be read as a number lost the leading zeroes
    • See column code: originally "00444" becomes 444
  • Content that reads as a date have been altered
    • 1960/12/14 becomes 12/14/1960 (date format change)
    • 12/04/2002 5:00:00 PM becomes 4/12/2002 5:00:00 PM (a different date, where day and month have been exchanged)

From this point on trying to find a way to prevent the auto-conversion leads to some dead ends (this old post in stackoverflow reflects that this has been a well shared concern among Excel users).

Using Worksheet.QueryTables.Add

It does offers lots of parameters to adjust things like the column types, delimiters, etc.

Private Function copyCSVviaQueryTable(strFile As String, wsDest As Excel.Worksheet, _
    Optional Delimiter As String = ",", Optional numColumns As Integer = 10) As Boolean
    
Dim strQueryTable As String, columnDataTypes(), I As Integer

 On Error GoTo copyCSVviaQueryTableErr
 copyCSVviaQueryTable = False
 
 numColumns = IIf(numColumns <= 0, 10, numColumns)
 
 For I = 0 To (numColumns - 1)
    ReDim Preserve columnDataTypes(I)
    columnDataTypes(I) = xlTextFormat
 Next
 
 strQueryTable = "ImportedCSV"
 With wsDest.QueryTables.Add(Connection:="TEXT;" & strFile, Destination:=wsDest.Range("$A$1"))
    '.QueryType = xlTextImport
    .name = strQueryTable
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .RefreshOnFileOpen = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .TextFilePromptOnRefresh = False
    '.PreserveFormatting = False
    .PreserveFormatting = True
    '.TextFilePlatform = xlMSDOS
    '.TextFilePlatform = 437
    .TextFilePlatform = 1252
    .TextFileStartRow = 1
    .TextFileParseType = xlDelimited
    .TextFileConsecutiveDelimiter = False
    '.TextFileTextQualifier = xlTextQualifierNone
    .TextFileTextQualifier = xlTextQualifierDoubleQuote
    .TextFileCommaDelimiter = (Trim(Delimiter) = vbNullString) Or (InStr(1, Delimiter, ",", vbTextCompare) > 0)
    .TextFileTabDelimiter = (InStr(1, Delimiter, "|", vbTextCompare) > 0)
    .TextFileSemicolonDelimiter = (InStr(1, Delimiter, ";", vbTextCompare) > 0)
    .TextFileSpaceDelimiter = (InStr(1, Delimiter, " ", vbTextCompare) > 0)
    .TextFileColumnDataTypes = columnDataTypes
    .TextFileTrailingMinusNumbers = True
    .Refresh BackgroundQuery:=False
    .Delete
 End With
 
 copyCSVviaQueryTable = True
copyCSVviaQueryTableErr:
End Function

Result with Query Table

It somehow supports the delimiter, the qualifier and respects the data type (no numeric or date conversions observed). However it does not seem compliant with the RC 4180, as correctly double-quoted line breaks split data of the same row into multiple rows.

Using Workbooks.OpenText

It is the programmatic way to do what the Text Import Wizard does. So, it could somehow offer a way. However, it has an issue with the FieldInfo parameter, and it does NOT respect the pre-configured data types (reference).

When the data is opened in the new workbook, the date and numeric data conversions have already happened; for which it somehow gives the same result as Workbooks.Open.

rellampec
  • 698
  • 6
  • 22
  • 3
    This question belongs on [su], as it is not programming related. We do accept Excel questions that are not programming related that deal with complex formulas, but other non-programming questions are off-topic here. – Ken White Apr 04 '23 at 23:54
  • 2
    In my opinion, you should write your own CSV handler, [as I describe here](https://stackoverflow.com/a/8670362/445425) – chris neilsen Apr 05 '23 at 00:14
  • @KenWhite I understand you guys are busy reviewing questions. However, if you read a bit more, you will see I do refer to programmatic in my question, even referring to VBA API docs. If you want I can reformulate the question adding _programmatically_ in the title and in the question. In fact, I do have the code that answers the question and wanted to post the answer where it can be easier found than in the 100 split threads where I could not find the same exact question and answer (in stackoverflow space btw). – rellampec Apr 05 '23 at 00:39
  • @KenWhite Nevermind, I will answer the question below anyway. Thanks for your feedback – rellampec Apr 05 '23 at 01:04
  • Use Power Query. Available in Windows Excel 2010+ and Microsoft 365 – Ron Rosenfeld Apr 05 '23 at 03:08
  • 1
    Sorry, but your question is not about programming. Adding a *VBA* tag to a non-programming question does not magically convert it. If you have a **specific programming related question** about Excel VBA, please ask it. – Ken White Apr 05 '23 at 03:13
  • @KenWhite I will, soon enough. I will reorganize the question and add snippets to the two major challenges with the VBA approaches mentioned. Just wanted to keep the question part clean. Don't take me wrong.... I know this is not meta. But it would be awesome if we could draft a question and only publish when we got it ready. It takes time to do this stuff well done. And perhaps there wouldn't be unnecessary false alarms. – rellampec Apr 05 '23 at 06:30
  • @KenWhite question refactored. Please feel free to reopen if you think it now meets the requirements and could be of interest. I got the answer and want to share with others. Up to you! – rellampec Apr 05 '23 at 11:46

1 Answers1

0

As noted in the question, there is NO simple way to achieve that. The solution proposed here is based on one surprising answer on how to import a CSV with line breaks to Excel.

The target steps to implement using VBA are:

  1. Open the csv file in a text editor, select all content, and copy (to the clipboard)
  2. Data tab in the ribbon, and click Text to Columns (follow the wizard: Delimited, Comma only, Text to all columns, and finish)
  3. Select all cells, press del key to clear all content, and right click -> Format cells, select Text (you can use the dropdown of the ribbon for that too)
  4. Paste again the content (of the clipboad) to the first cell, click Wrap Text

Main function

Here the implementation of the steps above in VBA (comments added to code):

' Programmatic implementation of this work around: https://stackoverflow.com/a/18849855/4352306
Private Function copyCSVviaTextToColumns(strFile As String, wsDest As Excel.Worksheet, _
    Optional Delimiter As String = ",") As Boolean
    
Dim rnArea As Excel.Range, rnCsv As Excel.Range, rnToBlank As Excel.Range
Dim strContent As String

 copyCSVviaTextToColumns = False
 On Error GoTo copyCSVviaTextToColumnsErr

 ' Copy each file line into the first column of its own row
 If fileLinesToSheet(strFile, wsDest) Then
    Set rnArea = wsDest.UsedRange
    
    ' We apply the TextToColumns desired formatting to rnArea
    rnArea.TextToColumns Destination:=rnArea, _
      DataType:=xlDelimited, _
      TextQualifier:=xlTextQualifierDoubleQuote, _
      ConsecutiveDelimiter:=False, _
      Comma:=(Trim(Delimiter) = vbNullString) Or (InStr(1, Delimiter, ",", vbTextCompare) > 0), _
      Tab:=(InStr(1, Delimiter, "|", vbTextCompare) > 0), _
      Semicolon:=(InStr(1, Delimiter, ";", vbTextCompare) > 0), _
      Space:=(InStr(1, Delimiter, " ", vbTextCompare) > 0), _
      Other:=False, _
      TrailingMinusNumbers:=True
      'FieldInfo ignores types (bug: https://stackoverflow.com/q/55151207/4352306)

    Set rnArea = wsDest.UsedRange
    
    ' Capture the CSV file content in the Clipboard
    strContent = FSOReadFromTextFile(strFile)
    Clipboard strContent
    
    ' Fix the type format of the cells to Text (see FieldInfo bug further up)
    ' https://stackoverflow.com/a/48868559/4352306
    rnArea.NumberFormat = "@"
    rnArea.Value = vbNullString
    
    ' About to paste where we have TextToColumns applied
    rnArea.Activate
    rnArea.Columns(1).Rows(1).Select
    
    Application.Wait (Now + TimeValue("0:00:01")) 'https://stackoverflow.com/a/31255548/4352306
    ' We fix  the wrong line breaks !!!
    wsDest.Paste
    ' A couple of adjustments
    Set rnArea = wsDest.UsedRange
    rnArea.EntireColumn.AutoFit
    rnArea.WrapText = True
    
    copyCSVviaTextToColumns = True
 End If
 
copyCSVviaTextToColumnsErr:
 If Err.Number <> 0 Then
   MsgBox Err.Description, vbOKOnly
 End If
End Function

A test unit

Using the test.csv RFC 4180 compliant file of the question:

code,description,comments,start_date,other_date
00333,"a""aa","bbb
ccc, ddd",12/04/2002 5:00:00 PM,1960/12/14
"00444","eee 
""fff""
ggg, hhh","""
",6-08-2002 14:00:00,1950-09-01T05:30:05+13:00

The execution generates the below content on the destination worksheet:

the result

  • Cell content has been preserved as originally is in the csv file (no date or numeric conversions observed)
  • Correctly double-quoted break lines do not split a csv row into multiple excel rows (they are kept in the cell/data-point they belong to).

Dependencies

Out of completeness, added below the dependencies used in the main solution.

Clipboard

' https://stackoverflow.com/a/60896244/4352306
' MSForms.DataObject DEPRECATED (https://stackoverflow.com/a/25336423/4352306)
Private Function Clipboard$(Optional s$)
Dim v: v = s  'Cast to variant for 64-bit VBA support
 With CreateObject("htmlfile")
    With .parentWindow.clipboardData
        Select Case True
            Case Len(s): .SetData "text", v
            Case Else:   Clipboard = .GetData("text")
        End Select
    End With
 End With
End Function

fileLinesToSheet

' Copies the lines of the file to the first column (one row per each file line)
Private Function fileLinesToSheet(strFile As String, wsDest As Excel.Worksheet) As Boolean
Dim strTextLine As String, rnDest As Excel.Range
Dim iFile As Integer: iFile = FreeFile

 fileLinesToSheet = False
 On Error GoTo fileLinesToSheetErr

 Set rnDest = wsDest.Range("A1")
 
 Open strFile For Input As #iFile
 Do Until EOF(1)
    Line Input #1, strTextLine
    rnDest = strTextLine
    Set rnDest = rnDest.Offset(1)
 Loop
 Close #iFile
 fileLinesToSheet = True
fileLinesToSheetErr:
End Function

FSOReadFromTextFile

' Read text file content
Private Function FSOReadFromTextFile(strFile As String) As String
Const ForReading = 1 ', ForWriting = 2, ForAppending = 8
Dim FSO, FileToRead

 FSOReadFromTextFile = ""
 On Error GoTo FSOReadFromTextFileErr
 
 Set FSO = CreateObject("Scripting.FileSystemObject")
 Set FileToRead = FSO.OpenTextFile(strFile, ForReading)

 FSOReadFromTextFile = FileToRead.ReadAll
 FileToRead.Close
FSOReadFromTextFileErr:
End Function

There may be a simpler way to achieve exactly the same result. I just did not see this resolved anywhere else by using purely VBA in Excel. So perhaps others with more experience can improve even more this answer. Hopefully that will be the case.

You are welcome!

rellampec
  • 698
  • 6
  • 22