4

I would like to ask help on my code. I am a newbie and wanted to implement safe multi threading in writing to a text file.

    StreamWriter sw = new StreamWriter(@"C:\DailyLog.txt");
    private void Update(){
        var collection = Database.GetCollection<Entity>("products");
        StreamReader sr = new StreamReader(@"C:\LUSTK.txt");
            string[] line = sr.ReadLine().Split(new char[] { ';' });
            while (!sr.EndOfStream)
            {
                    line = sr.ReadLine().Split(new char[] { ';' });
                    t = delegate { 
                           UpdateEach(Convert.ToInt32(line[5])); 
                        }; 
                     new Thread(t).Start();
            }
            sr.Close();

    }
    private void UpdateEach(int stock)
    {
            sw.WriteLine(ean);

    }

I got no error on my code but it seems not all written to my text file. I did not make sw.Close() because i know some threads were not finish yet. In addition, how can i implement sw.Close knowing that no thread left unfinish. I have 5 milion records in my LUSTK.text that is to be read by StreamReader and each created a thread and each thread access same text file.

Wylan Osorio
  • 1,136
  • 5
  • 19
  • 46

2 Answers2

8

You aren't going to be able to concurrently write to the same writer from different threads. The object wasn't designed to support concurrent access.

Beyond that, the general idea of writing to the same file from multiple threads is flawed. You still only have one physical disk, and it can only spin so fast. Telling it to do things faster won't make it spin any faster.

Beyond that, you're not closing the writer, as you said, and as a result, the buffer isn't being flushed.

You also have a bug in that your anonymous method is closing over line, and all of the methods are closing over the same variable, which is changing. It's important that they each close over their own identifier that won't change. (This can be accomplished simply by declaring line inside of the while loop.) But since you shouldn't be using multiple threads to begin with, there's no real need to focus on this.

You can also use File.ReadLines and File.WriteAllLines to do your file IO; it results in much cleaner code:

var values = File.ReadLines(inputFile)
    .Select(line => line.Split(';')[5]);
File.WriteAllLines(outputFile, values);

If you were to want to parallelize this process it would be because you're doing some CPU bound work on each item after you read the line and before you write the line. Parallelizing the actual file IO, as said before, is likely to be harmful, not helpful. In this case the CPU bound work is just splitting the line and grabbing one value, and that's likely to be amazingly fast compared to the file IO. If you needed to, for example, hit the database or do some expensive processing on each line, then you would consider parallelizing just that part of the work, while synchronizing the file IO through a single thread.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • Thanks, and if so instead using the StreamWriter, can I use an array list to store those `ean`? And after all threads finish, i will just loop through the array list and do the StreamWriter. – Wylan Osorio Oct 10 '13 at 19:22
  • @Wao Perhaps, if you have enough memory to hold onto that much data. You might not though. The code that I showed is still streaming the data and processing one line at a time; it's not pulling all of the lines into memory. With a file that large it's likely an important thing to do. – Servy Oct 10 '13 at 19:23
  • @Wao As I said, there's really nothing to gain here from using multiple threads. Just don't. – Servy Oct 10 '13 at 19:26
6

A StreamWriter is simply not thread-safe; you would need to synchronize access to this via lock or similar. However, I would advise rethinking your strategy generally:

  • starting lots of threads is a really bad idea - threads are actually pretty expensive, and should not be used for small items of work (a Task or the ThreadPool might be fine, though) - a low number of threads perhaps separately dequeuing from a thread-safe queue would be preferable
  • you will have no guarantee of order in terms of the output
  • frankly, I would expect IO to be your biggest performance issue here, and that isn't impacted by the number of threads (or worse: can be adversely impacted)
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Thanks for the info. But if i use queue it will take me a time to finish the 5 million records right? – Wylan Osorio Oct 10 '13 at 19:20
  • @Wao Yes, processing a 5 million line file by both reading in that much data and then writing it out again is going to take some time. That is inevitable, parallelizing it won't help. About the only thing that would is buying a disk drive with faster read/write speeds, i.e. a solid state drive. That, or getting less data. – Servy Oct 10 '13 at 19:22
  • @Marc Gravell, can you look on that question please? http://stackoverflow.com/questions/19270449/can-i-use-interlocked-increment-inside-lock-statement – hackp0int Oct 10 '13 at 20:59