
I wrote a simple program that reads a text file line by line. If the current line contains alphabetic characters (a-z, A-Z), it writes that line to another text file.

If the current line contains no alphabetic characters, it is not written to the new text file.

I built this because members register on my website and some of them use only numbers as their username. I want to filter those out and keep only the alphabetic names. (Please focus on this project; I know I could just do this in PHP.)

It already works, but reading line by line and writing to the other text file takes a while (write speed is about 150 KB per minute, and the drive is not the problem: I have a fast SSD).

So I wonder if there is a faster way. I could call `ReadAllLines` first, but on large files that just freezes my program, so I don't know whether it would work either (I want to focus on large files of 1 GB and more).

This is my code so far:

    If System.IO.File.Exists(FILE_NAME) = True Then

        Dim objReader As New System.IO.StreamReader(FILE_NAME)

        Do While objReader.Peek() <> -1

            Dim myFile As New FileInfo(output)
            Dim sizeInBytes As Long = myFile.Length

            If sizeInBytes > splitvalue Then
                outcount += 1
                output = outputold + outcount.ToString + ".txt"
                File.Create(output).Dispose()
            End If

            count += 1
            TextLine = objReader.ReadLine() & vbNewLine
            Console.WriteLine(TextLine)

            If CheckForAlphaCharacters(TextLine) Then
                File.AppendAllText(output, TextLine)
            Else
                found += 1
                Label2.Text = "Removed: " + found.ToString
                TextBox1.Text = TextLine
            End If

            Label1.Text = "Checked: " + count.ToString

        Loop

        MessageBox.Show("Finish!")

    End If

  • Instead of asking about file I/O, you'll want to investigate threading models, to do all the file I/O on a background thread. (Perhaps [async/await](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/) is simple enough for this.) Also, your program 'freezes' because you are mixing your business logic with UI manipulation (e.g. `Label2.Text = `). By running this long operation on the UI thread, you make the UI unresponsive. There are all sorts of models for how to achieve this separation ([MVVM](https://en.wikipedia.org/wiki/Model–view–viewmodel) is one). – Sean Skelly Oct 26 '20 at 18:31
  • Alright, just as a test I removed everything, including the UI part, and left only the text-writing part. The speed went up from 150 KB per minute to 200 KB per minute, which is still really slow. Reading line by line and appending each line to another text file seems to be the issue. Do you know if there is a faster method? – Baseult Private Oct 26 '20 at 18:38
  • Well, file I/O takes time. Here's [another question](https://stackoverflow.com/q/4273699/3791245) on the same topic, and the answers there suggest you're not going to get much better performance than `StreamReader.ReadLine()`. The answers I suggest looking at closely talk about using separate threads for reading and writing. (Think producer/consumer model.) That way your speed limiter (likely `ReadLine()`) isn't blocked by anything. – Sean Skelly Oct 26 '20 at 19:04
  • Another consideration: how much of a delay is `CheckForAlphaCharacters(TextLine)` introducing? Is there something in the way that method is implemented that can be optimized? – Hursey Oct 26 '20 at 21:14
  • What sort of size is `splitvalue`? It is possible that you could accumulate the data into a StringBuilder and then write that in one go to one of the output files. – Andrew Morton Oct 27 '20 at 09:42

2 Answers


First of all, as @Sean Skelly hinted, repeatedly updating UI controls is an expensive operation. But your bigger problem is `File.AppendAllText`:

            If CheckForAlphaCharacters(TextLine) Then
                File.AppendAllText(output, TextLine)
            Else
                found += 1
                Label2.Text = "Removed: " + found.ToString
                TextBox1.Text = TextLine
            End If

The documentation for `AppendAllText(String, String)` says:

"Opens a file, appends the specified string to the file, and then closes the file. If the file does not exist, this method creates a file, writes the specified string to the file, then closes the file."

You are opening and closing the file once for every line you write, and that overhead adds up. `AppendAllText` is a convenience method because it performs several operations in a single call, but you can now see why it performs poorly inside a big loop.

The fix is easy. Open the file once before you start your loop and close it at the end. Make sure that you always close the file properly, even when an exception occurs. For that, you can either call `Close` in a `Finally` block or keep your file write operations within a `Using` block, which disposes of the writer for you.
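A minimal sketch of that fix, under some assumptions: `FilterFile`, `inputPath`, and `outputPath` are hypothetical names, and `CheckForAlphaCharacters` here is only a stand-in, since the asker's implementation is not shown. The point is that the `StreamWriter` is created once, outside the loop.

```vbnet
Imports System
Imports System.IO

Module FilterLines
    ' Hypothetical stand-in for the asker's CheckForAlphaCharacters (body not shown in the question).
    ' Returns True if the text contains at least one character in a-z or A-Z.
    Function CheckForAlphaCharacters(text As String) As Boolean
        For Each c As Char In text
            If (c >= "a"c AndAlso c <= "z"c) OrElse (c >= "A"c AndAlso c <= "Z"c) Then Return True
        Next
        Return False
    End Function

    ' Copies every line that contains a letter from inputPath to outputPath.
    Sub FilterFile(inputPath As String, outputPath As String)
        ' The writer is opened once; Using disposes (and so closes) it even if an exception is thrown.
        Using writer As New StreamWriter(outputPath, append:=True)
            ' File.ReadLines enumerates the file lazily, one line at a time.
            For Each line As String In File.ReadLines(inputPath)
                If CheckForAlphaCharacters(line) Then
                    writer.WriteLine(line)
                End If
            Next
        End Using
    End Sub

    Sub Main(args As String())
        If args.Length < 2 Then
            Console.WriteLine("Usage: FilterLines <input file> <output file>")
            Return
        End If
        FilterFile(args(0), args(1))
    End Sub
End Module
```

`StreamWriter` buffers its output internally, so individual `WriteLine` calls do not each hit the disk the way one `AppendAllText` call per line does.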

You could remove the print to the console as well, since display updates have a cost too. Alternatively, print a status update only every 10,000 lines or so.
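One way to throttle the status updates could look like the sketch below. The names `CountLines`, `ReportEvery`, and the `report` callback are assumptions made for illustration; in the asker's form the callback would set `Label1.Text` instead of writing to the console.

```vbnet
Imports System
Imports System.IO

Module ThrottledProgress
    ' Assumed reporting interval; tune it to taste.
    Const ReportEvery As Integer = 10000

    ' Walks the file line by line and calls report only every ReportEvery lines,
    ' plus once at the end with the final count.
    Function CountLines(inputPath As String, report As Action(Of Long)) As Long
        Dim count As Long = 0
        For Each line As String In File.ReadLines(inputPath)
            count += 1
            ' The per-line work (checking and writing) would go here.
            If count Mod ReportEvery = 0 Then report(count)
        Next
        report(count)
        Return count
    End Function

    Sub Main(args As String())
        If args.Length < 1 Then
            Console.WriteLine("Usage: ThrottledProgress <input file>")
            Return
        End If
        CountLines(args(0), Sub(n) Console.WriteLine("Checked: " & n.ToString()))
    End Sub
End Module
```

This only reduces how often the display is touched. If the loop still runs on the UI thread, the window will stay unresponsive until it finishes, which is the separate issue raised in the comments.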

When you've done all that, you should notice improved performance.

Kate

My final code. It works a lot faster now (500 MB in 1 minute):

    Using sw As StreamWriter = File.CreateText(output)
        ' The writer is opened once for the whole loop; File.ReadLines enumerates the input lazily.
        For Each oneLine As String In File.ReadLines(FILE_NAME)
            Try
                If changeme = True Then
                    changeme = False
                    GoTo Again2 ' The Again2 label is not shown in this excerpt.
                End If

                If oneLine.Contains(":") Then
                    ' Split at the first ":" (the second part keeps the leading ":").
                    Dim TestString = oneLine.Substring(0, oneLine.IndexOf(":")).Trim()
                    Dim TestString2 = oneLine.Substring(oneLine.IndexOf(":")).Trim()
                    If CheckForAlphaCharacters(TestString) = False And CheckForAlphaCharacters(TestString2) = False Then
                        sw.WriteLine(oneLine)
                    Else
                        found += 1
                    End If

                ElseIf oneLine.Contains(";") Or oneLine.Contains("|") Or oneLine.Contains(" ") Then
                    ' Normalize the alternative separators to ":" before splitting.
                    Dim oneLineReplaced As String = oneLine.Replace(" ", ":").Replace("|", ":").Replace(";", ":")
                    If oneLineReplaced.Contains(":") Then
                        Dim TestString3 = oneLineReplaced.Substring(0, oneLineReplaced.IndexOf(":")).Trim()
                        Dim TestString4 = oneLineReplaced.Substring(oneLineReplaced.IndexOf(":")).Trim()
                        If CheckForAlphaCharacters(TestString3) = False And CheckForAlphaCharacters(TestString4) = False Then
                            sw.WriteLine(oneLineReplaced)
                        Else
                            found += 1
                        End If
                    Else
                        errors += 1
                        textstring = oneLine
                    End If
                Else
                    ' No known separator: count the line as an error and remember it.
                    errors += 1
                    textstring = oneLine
                End If
                count += 1
            Catch
                errors += 1
                textstring = oneLine
            End Try
        Next
    End Using