0

I'm building an import functionality for a program that enables a user to import Excel data, Text files (with any delimiter) and .csv files (with any delimiter).

I have set up the Opendialog and a way to determine and limit the data source, but now I want to inspect the first line (or couple of lines of text) in the .csv or .txt file to determine which character repeats regularly across the first 5 lines (most probably the delimiter) and open this file already split into columns by that delimiter.

`Dim delimiter as string

'Determine delimiter...

'For .txt files: 
Workbooks.OpenText Filename:=FileName:=FullPath, DataType:=xlDelimited, Other:=True, OtherChar:=delimiter

'For .csv files:
Workbooks.Open FileName:=FullPath, Format:=6, Delimiter:=delimiter`

I would like to determine the most probable delimiter based on the character that repeats the most across the first few lines of text in the file.

  • You cannot look into a file without opening the file. That's like you cannot look into a box without opening the box (x-ray for files does not exist). – Pᴇʜ May 17 '19 at 10:13
  • 1
    Without giving a solution, if you want to read from the file, you have to open it. I suggest building a function that looks for specific characters you are interested in (i.e. comma, tabs or whatever you deem as delimiter). See [Read/Parse text file line by line in VBA](https://stackoverflow.com/questions/11528694/read-parse-text-file-line-by-line-in-vba). – FAB May 17 '19 at 10:14
  • Okay. First prize would be without opening it, but I don't mind opening it and closing it again before import as well. – Renier Wessels May 17 '19 at 10:16
  • @RenierWessels Try what DarXyde suggested, and if you get stuck or errors come back with the code you tried. • First prize is impossible. – Pᴇʜ May 17 '19 at 10:18
  • 2
    Where you say "without opening the file" - do you actually mean "without fully loading the file into Excel"? If so, perhaps reword your question/title. – CLR May 17 '19 at 11:49

1 Answers1

1

Here is some code that will likely do what you need:

Sub determine_delim()
    Dim filetocheck As String, firstline As String
    Dim ff As Long
    filetocheck = "full path and name of file here"
    ff = FreeFile

    Open filetocheck For Input As ff
        Line Input #ff, firstline
    Close ff

    delimiter = most_popular(firstline)

End Sub

Function most_popular(str As String)

    Dim pieces As Variant
    Dim cnt As Long, ch as Long
    Dim minCount As Long
    Dim possibles As String

    possibles = "|¦,;" & Chr(9) ' Chr(9)=Tab

    For ch = 1 To Len(possibles)

        pieces = Split(str, Mid(possibles, ch, 1))
        cnt = UBound(pieces)

        If minCount < cnt Then
            minCount = cnt
            most_popular = Mid(possibles, ch, 1)
        End If

    Next ch

End Function

Rather than just find the most popular character in the first line - the code looks for the most popular character in the line that is also in possibles - which I have preloaded with the most common delimiters you find in flat files - comma, pipe, broken pipe and tab. This prevents things like underscores or spaces (commonly found in header rows) from being included.

CLR
  • 11,284
  • 1
  • 11
  • 29
  • semicolon `;` is used in CSV files generated by Excel (in an OS with Region settings that specify the list separator other than comma, it's usually a semicolon) – Aprillion May 17 '19 at 12:18
  • Semicolon added to `possibles` - as per @Aprillion suggestion. – CLR May 17 '19 at 13:56