
I would like to know a way (ideally the best way) to edit a CSV file, preferably without having to read the original and write the results to a new one.

Example 1: I want to split the contents of a column into two columns, e.g. 123abc into 123, abc.

Example 2: I want to remove letters from the beginning of a column, e.g. AB123 into 123.

Clarification: I do not need help solving my examples; I just need to be pointed in the right direction for reading and editing the same file.

Here is a code example using a temporary file that seems too slow to me:

Dim currentLine As String()
tempFile.AutoFlush = True

Do Until origFile.EndOfData
    currentLine = origFile.ReadFields()
    currentLine(1) = currentLine(1).Replace("ABC", "") 'would convert ABC123 to 123
    For index As Integer = 0 To currentLine.Length - 2
        tempFile.Write(currentLine(index) & ",")
    Next
    tempFile.Write(currentLine(currentLine.Length - 1))
    tempFile.WriteLine()
Loop

tempFile.Flush()
tempFile.Dispose()
origFile.Dispose()
IO.File.Copy(tempFilePath, origFilePath, True) 'File.Copy takes file paths, not the writer/reader objects
IO.File.Delete(tempFilePath)
donbyte

2 Answers


You really will be best off writing a new file. You can write to a temporary file, delete the old file and rename the new file as the old file.

Something like:

Imports System.IO

Module Module1

    Sub Main()
        Dim tempfile = Path.GetTempFileName()
        Dim inputFile = "C:\Temp\input.txt"

        Using sw = New StreamWriter(tempfile)
            Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(inputFile)
                MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
                MyReader.Delimiters = New String() {","}
                Dim currentRow As String()
                While Not MyReader.EndOfData
                    Try
                        currentRow = MyReader.ReadFields()
                        If currentRow.Count >= 2 Then
                            currentRow(1) = currentRow(1).Replace("ABC", "")
                        End If
                        sw.WriteLine(String.Join(",", currentRow))
                    Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                        MsgBox("Line " & ex.Message & " is invalid.  Skipping")
                    End Try
                End While
            End Using
        End Using

        File.Delete(inputFile)
        File.Move(tempfile, inputFile)

    End Sub

End Module

(Basically copying the TextFieldParser example from http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.readfields.aspx )
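If you want the swap at the end to be safer, `File.Replace` can put the temporary file in place and keep a backup of the original in a single call, instead of the separate `Delete`/`Move`. A minimal sketch, assuming the same `tempfile` and `inputFile` variables as above (note that `File.Replace` requires both files to be on the same volume, which `Path.GetTempFileName` does not guarantee):

```vb.net
' Replace the original with the processed copy, keeping a backup of the original.
File.Replace(tempfile, inputFile, inputFile & ".bak")
```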

Andrew Morton
  • Alternately he could read the old file into memory, modify it, then write back out to the same file – briddums Jun 15 '12 at 18:51
  • Using an intermediary temporary file does have the advantage that if something goes wrong, you don't lose the data. Also, if the file is too large to fit in memory, it can be read line-by-line for processing. – Andrew Morton Jun 15 '12 at 19:56
  • Thank you. Using a temporary file seems too slow; I have added my code example to the question above. – donbyte Jun 15 '12 at 20:16
  • Very new to this site . . . How do I respond properly and especially with snippets of code? – donbyte Jun 15 '12 at 20:18
  • @donbyte If you add the code as an edit to your original post, we will be able to see it better. – Andrew Morton Jun 15 '12 at 20:19
  • Using a temporary file seems too slow? What type of processor are you using, and how large is the .csv file? – briddums Jun 15 '12 at 20:20
  • @AndrewMorton Thank you. I am new to the site and did not know how best to add a bit of code. – donbyte Jun 15 '12 at 20:24
  • @briddums I could get any size csv with I guess an upper range of a couple million. The current file is about 250,000. It took maybe 2 minutes but I felt that was pretty slow. – donbyte Jun 15 '12 at 20:25
  • @AndrewMorton Thank you very much. This is the first time I have used a forum for programming help and my first experience has been great. The key for speeding up my code was to add the string.join instead of my for loop! It was at least 4X faster! Thank you. – donbyte Jun 15 '12 at 20:43

You can use ADO to connect to the CSV file. See here for more info:

How to read a CSV file into a .NET Datatable
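A rough sketch of the ADO route (assuming the Jet text driver is available; the folder goes in the connection string and the file name acts as the table name):

```vb.net
Imports System.Data.OleDb

Dim connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Temp\;" &
              "Extended Properties=""text;HDR=Yes;FMT=Delimited"""
Using conn As New OleDbConnection(connStr)
    conn.Open()
    Using cmd As New OleDbCommand("SELECT * FROM [test.csv]", conn)
        Dim table As New DataTable()
        table.Load(cmd.ExecuteReader())
    End Using
End Using
```

Be aware that the Jet text driver can SELECT and INSERT but not UPDATE rows in a text file, so you would still have to write the modified data back out yourself.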

However, I would recommend simply loading it into an array and then writing it back out with simple file IO methods. It's just a lot less code and overhead:

Dim rows As New List(Of String)

For Each row As String In File.ReadAllLines("C:\test.csv")
    Dim columns As String() = row.Split(New String() {","}, StringSplitOptions.None)
    ' Do work on the columns here
    rows.Add(String.Join(",", columns))
Next

And then something like this to save the file:

File.WriteAllLines("C:\test.csv", rows)
Steven Doggart