-1

I am looking for some input and possible example for parsing a text file with the following format: (sorry not sure how to retain the formatting of the file in this text)

NAME          ID         FORMAT         SHORT NAME  
    DESCRIPTION (this field is on the second row an indented by 5 spaces)

The first row (NAME, ID, FORMAT and SHORT NAME) always consist of just one row. The DESCRIPTION text may span multiple rows. In some cases, there is only a first row of NAME, ID, etc. without a corresponding DESCRIPTION row.

Here is an example of how the data looks in the file now:

NAME          ID         FORMAT         SHORT NAME  
    DESCRIPTION
ABC          01          xx           AB
    abcdefg
    hijklm
    nopqrs
DEF          02          xx           DE
    abcedfg
    hijklmnopqrst
GHI          03          xx.x         GH
JKL          001         xx           JKL
    abcdef
    ghijk
    lmnopq
    rstu
    vwxyz

I would like to parse out the NAME, ID, FORMAT, SHORT NAME and DESCRIPTION into 5 separate columns in a csv or excel file for additional analysis. I don't care if the DESCRIPTION field is broken across multiple lines but it can also be concatenated into a single longer string.

Hope this all makes sense. Thanks in advance!

  • What have you tried so far and where is the problem, exactly? – cybernetic.nomad Mar 02 '20 at 17:47
  • I have tried importing with excel, but the description lines are on multiple lines after the first row and don't import as part of the first row. I was unable to show the formatting in my post, but effectively a record consists of the first row (NAME, ID, FORMAT, SHORT NAME and DESCRIPTION) but the indented DESCRIPTION may span multiple rows. Then the next record begins again with NAME, ID, FORMAT, SHORT NAME followed by DESCRIPTION on the next line or lines. – user2399430 Mar 02 '20 at 17:59
  • Please [edit your question](https://stackoverflow.com/posts/60494055/edit) to provide us with what you have so far and the expected outcome. I believe the edit I did formatted the text close to what it is like, but it has been removed, making the question less clear than it was before. I think using a good text editor to add the required number of tabs at the start of the `Description` lines may be the way to go here. – cybernetic.nomad Mar 02 '20 at 18:06
  • Does this answer your question? [Importing CSV with line breaks in Excel 2007](https://stackoverflow.com/questions/2668678/importing-csv-with-line-breaks-in-excel-2007) – HackSlash Mar 02 '20 at 18:59
  • Thanks I have the correct formatting of the file now. This should help with the description of the problem. – user2399430 Mar 02 '20 at 19:15
  • Do the NAME, ID, FORMAT, SHORT NAME values always left aligned with their header words ? They are 1 or 2 characters out on your example. – CDP1802 Mar 02 '20 at 20:42

1 Answers1

0

Providing the data for NAME,ID,FORMAT and SHORT NAME is aligned beneath their header word then use those words on the first line to calculate the start position and length of each field, then split the lines into fields using Mid(). Join the description lines and write out to the previous record before a new record is started. For example

Option Explicit
Sub ParseTextFile()

    Const INFILE = "c:\temp\testfile.txt"
    Const OUTFILE = "c:\temp\testfile.xlsx"

    Dim wbOut As Workbook, ws As Worksheet, iRow As Long
    Dim txt As String, ff As Integer, i As Integer, desc As String
    Dim start(4) As Integer, length(4) As Integer
    Dim count As Integer, msg As String

    Set wbOut = Workbooks.Add
    Set ws = wbOut.Sheets("Sheet1")
    ws.Range("A1:E1") = Array("NAME", "ID", "FORMAT", "SHORT NAME", "DESCRIPTION")
    ws.Columns("A:E").NumberFormat = "@"
    iRow = 1

    ff = FreeFile()
    Open INFILE For Input As #ff

    While Not EOF(ff)
        count = count + 1
        Line Input #ff, txt
        If count = 1 Then
            start(1) = InStr(1, txt, "NAME", vbTextCompare)
            start(2) = InStr(1, txt, "ID", vbTextCompare)
            start(3) = InStr(1, txt, "FORMAT", vbTextCompare)
            start(4) = InStr(1, txt, "SHORT NAME", vbTextCompare)
            For i = 1 To 3
                length(i) = start(i + 1) - start(i)
            Next
        Else
            If Left(txt, 1) = " " Then
                desc = desc & Trim(txt) & " "
            Else
                ' save the description from last record
                ws.Cells(iRow, 5) = Trim(desc)
                desc = ""

                ' new row
                iRow = iRow + 1
                length(4) = Len(txt) - start(4) + 1
                For i = 1 To 4
                    ws.Cells(iRow, i) = Mid(txt, start(i), length(i))
                Next
            End If
        End If
    Wend
    Close #ff

    ' final description
    ws.Cells(iRow, 5) = Trim(desc)

    ' save result
    ws.Columns("A:E").AutoFit
    wbOut.Close True, OUTFILE

    msg = count & " lines read from " & INFILE & vbCr & _
          iRow - 1 & " rows written to " & OUTFILE

    MsgBox msg, vbInformation
End Sub
ComputerVersteher
  • 2,638
  • 1
  • 10
  • 20
CDP1802
  • 13,871
  • 2
  • 7
  • 17
  • Thank you so much for this example. It is generating an error inside the loop at the ws.Cells(iRow, i) = Mid(txt, start(i), length(i)) call. The start(i) and length(i) are = 0 at this point. I am using mac excel so not sure if this is the problem. It seem like the start and length variable are getting reset? Thanks again! – user2399430 Mar 03 '20 at 12:07
  • @ Does it work for the example you posted. What is the error message ? The header must be line 1, that is the only place the start variable is set. It might be the CR/LF line endings on the text file. Is the text file from windows ?. Is it ASCII or some Unicode UTF-8 ? – CDP1802 Mar 03 '20 at 12:20
  • The file I had didn't have the headers. Mostly there now, the name is splitting across the ID and FORMAT fields so I just need to adjust the field lengths. Thanks again this is great!!! – user2399430 Mar 03 '20 at 12:34