
Context:

I am implementing a logging mechanism for a Web API project that writes serialized objects to a file from multiple methods; the file is in turn read by an external process (nxLog, to be more accurate). The application is hosted on IIS and uses 18 worker processes. The app pool is recycled once a day. The expected load on the services that will incorporate the logging methods is 10,000 req/s. In short, this is a classic producer/consumer problem with multiple producers (the methods that produce logs) and one consumer (the external process that reads the log files). Update: each process uses multiple threads as well.

I used a BlockingCollection to store the data (and solve the race condition) and a long-running task that writes the data from the collection to the disk.

To write to the disk I am using a StreamWriter and a FileStream.
Because the write frequency is almost constant (as I said, 10,000 writes per second), I decided to keep the streams open for the entire lifetime of the application pool and write logs to the disk periodically. I rely on the app pool recycle and my DI framework to dispose of my logger daily. Also note that this class will be a singleton, because I didn't want more than one thread from my thread pool dedicated to writing.

Apparently the FileStream object will not write to the disk until it is disposed, and I don't want the FileStream to wait an entire day until it writes to the disk. The memory required to hold all those serialized objects would be tremendous, not to mention that any crash of the application or the server would cause data loss or a corrupted file.

Now my question:

How can I have the underlying streams (FileStream and StreamWriter) write to the disk periodically without disposing them? My initial assumption was that the FileStream would write to the disk once it exceeds its buffer size, which is 4K by default.

UPDATE: The inconsistencies mentioned in the answer have been fixed.

Code:

public class EventLogger: IDisposable, ILogger
{
    private readonly BlockingCollection<List<string>> _queue;
    private readonly Task _consumerTask;
    private FileStream _fs;
    private StreamWriter _sw;
    private string _logFilePath;
    public EventLogger()
    {            
        OpenFile();
        _queue = new BlockingCollection<List<string>>(50);
        _consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
    private void OpenFile()
    {
        _fs?.Dispose();
        _sw?.Dispose();            
        _logFilePath = $"D:\Log\log{DateTime.Now.ToString(yyyyMMdd)}{System.Diagnostic.Process.GetCurrentProcess().Id}.txt";
        _fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _sw = new StreamWriter(_fs);
    }
    public void Dispose()
    {            
         _queue?.CompleteAdding();
         _consumerTask?.Wait();            
         _sw?.Dispose();
         _fs?.Dispose();
         _queue?.Dispose();            

    }
    public void Log(List<string> list)
    {
        try
        {               
            _queue.TryAdd(list, 100);               

        }
        catch (Exception e)
        {
            LogError(LogLevel.Error, e);
        }
    }
    private void Write()
    {
        foreach (List<string> items in _queue.GetConsumingEnumerable())
        {               
            items.ForEach(item =>
            {                    
                _sw?.WriteLine(item);                    
            });
        }

    }
}
Ben
  • http://stackoverflow.com/questions/1059142/do-i-need-to-do-streamwriter-flush, https://msdn.microsoft.com/en-us/library/system.io.streamwriter.autoflush(v=vs.110).aspx – CodeCaster Dec 01 '16 at 19:20
  • Am I missing something here? Doesn't FileStream implement a flush method that does exactly this? Or are you saying this particular steam throws when you try and flush? – Charleh Dec 01 '16 at 19:22
  • @Charleh You are right. It does the job. It clears the buffer and writes to the disk. The thing is, I don't want to clear the buffer and write to the disk with every write. I still want to buffer data up to 4K and then write to the disk "periodically". – Ben Dec 01 '16 at 19:37
  • @CodeCaster With the AutoFlush set to true, the StreamWriter flushes the buffer to its underlying stream which is a FileStream in my case. But the FileStream is still holding on to the data without writing it to the disk! – Ben Dec 01 '16 at 19:39
  • Why not just have another thread that checks the length of the buffer periodically and flushes when it is full enough? You've done all the difficult threading stuff; now you just need a lock to prevent writes during the flush. I'm not the best person to ask when it comes to multithreading, but I assume you can read the buffer length without the lock; I imagine FileStream itself is not thread-safe, though. – Charleh Dec 01 '16 at 20:11
  • That's in fact not a bad idea. This is what I ended up doing as it was suggested by others. As for thread safety, you are right, I need a lock for the stream writer. – Ben Dec 02 '16 at 20:02

3 Answers


There are a few "inconsistencies" in your question:

The application is hosted on IIS and uses 18 worker processes

_logFilePath = $"D:\Log\log{DateTime.Now.ToString(yyyyMMdd)}{System.Diagnostic.Process.GetCurrentProcess().Id}.txt";

writes serialized objects to a file from multiple methods

Putting all of this together, you seem to have a single-threaded situation as opposed to a multi-threaded one. And since there is a separate log per process, there is no contention problem or need for synchronization. What I mean to say is, I don't see why the BlockingCollection is needed at all. It's possible that you forgot to mention that there are multiple threads within your web process; I will make that assumption here.

Another problem is that your code does not compile:

  1. The class name is Logger, but the EventLogger function looks like a constructor.
  2. There is some more incorrect syntax with the string interpolation, etc.

Putting all that aside, if you really have a contention situation and want to write to the same log via multiple threads or processes, your class seems to have most of what you need. I have modified your class to do a few more things. Chief among the changes are the items below:

  1. Fixed all the syntax errors, making assumptions where necessary
  2. Added a timer which calls Flush periodically. This needs a lock object so as not to interrupt an in-progress write operation
  3. Used an explicit buffer size in the StreamWriter constructor. You should heuristically determine the size that works best for you. Also, you should disable AutoFlush on the StreamWriter so your writes hit the buffer instead of the file, providing better performance.

Below is the code with the changes

public class EventLogger : IDisposable, ILogger {
    private readonly BlockingCollection<List<string>> _queue;
    private readonly Task _consumerTask;
    private FileStream _fs;
    private StreamWriter _sw;
    private System.Timers.Timer _timer;
    private readonly object streamLock = new object();

    private const int MAX_BUFFER = 16 * 1024;      // 16K
    private const int FLUSH_INTERVAL = 10 * 1000;  // 10 seconds

    public EventLogger() {
        OpenFile();
        _queue = new BlockingCollection<List<string>>(50);
        _consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
        SetupFlushTimer();
    }

    void SetupFlushTimer() {
        _timer = new System.Timers.Timer(FLUSH_INTERVAL);
        _timer.AutoReset = true;
        _timer.Elapsed += TimedFlush;
        _timer.Start(); // the timer must be started, or Elapsed never fires
    }

    void TimedFlush(Object source, System.Timers.ElapsedEventArgs e) {
        lock (streamLock) { // don't flush in the middle of a write
            _sw?.Flush();
        }
    }

    private void OpenFile() {
        _fs?.Dispose();
        _sw?.Dispose();
        var _logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
        _fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _sw = new StreamWriter(_fs, Encoding.Default, MAX_BUFFER); // TODO: use the correct encoding here
        _sw.AutoFlush = false;
    }

    public void Dispose() {
        _timer.Stop();
        _timer.Elapsed -= TimedFlush;
        _timer.Dispose();

        _queue?.CompleteAdding();
        _consumerTask?.Wait();
        _sw?.Dispose();
        _fs?.Dispose();
        _queue?.Dispose();

    }
    public void Log(List<string> list) {
        try {
            _queue.TryAdd(list, 100);

        } catch (Exception e) {
            LogError(LogLevel.Error, e);
        }
    }

    private void Write() {
        foreach (List<string> items in _queue.GetConsumingEnumerable()) {
            lock (streamLock) {
                items.ForEach(item => {
                    _sw?.WriteLine(item);
                });
            }
        }

    }
}

EDIT:
There are 4 factors controlling the performance of this mechanism, and it is important to understand their relationship. The example below will hopefully make it clear.

Let's say

  • average size of List<string> is 50 Bytes
  • Calls/sec is 10,000
  • MAX_BUFFER is 1024 * 1024 Bytes (1 Meg)

You are producing 500,000 bytes of data per second, so a 1 MB buffer can hold only about 2 seconds' worth of data. In other words, even if FLUSH_INTERVAL is set to 10 seconds, the buffer will flush on its own every 2 seconds (on average) when it runs out of buffer space.

Also remember that blindly increasing MAX_BUFFER will not help, since the actual flush operation will take longer due to the bigger buffer size.

The main thing to understand is that when there is a difference between the incoming data rate (to your EventLogger class) and the outgoing data rate (to the disk), you will either need an infinitely sized buffer (assuming a continuously running process) or you will have to slow your average incoming rate to match the average outgoing rate.
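To make that concrete, here is the same arithmetic as a small sanity-check snippet (the constants are the assumed example numbers above, not measured values):

// Back-of-the-envelope sizing with the assumed numbers from the example.
const int avgEntryBytes = 50;       // average serialized entry size
const int callsPerSecond = 10000;   // incoming log rate
const int maxBuffer = 1024 * 1024;  // 1 MB StreamWriter buffer

double bytesPerSecond = (double)avgEntryBytes * callsPerSecond; // 500,000 B/s
double secondsToFill = maxBuffer / bytesPerSecond;              // ~2.1 s

// If secondsToFill is smaller than FLUSH_INTERVAL, the StreamWriter flushes
// on its own (buffer full) long before the timer fires.
Console.WriteLine($"Buffer fills roughly every {secondsToFill:F1} seconds");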

Vikhram
  • Thanks Vikhram for the suggestion! Your assumptions were right: I have multiple threads per process. I'm going to update my question. As for the syntax errors, they were a failed attempt to simplify my source code. You were right! – Ben Dec 02 '16 at 19:55
  • As for the solution of using a timer, I think that's a good idea. I am going to give it a try. – Ben Dec 02 '16 at 19:57

Maybe my answer won't address your concrete concern, but I believe that your scenario could be a good use case for memory-mapped files.

Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.

This could be very interesting because you'll be able to do logging from different processes (i.e., the IIS worker processes) without locking issues. See the MemoryMappedFile.OpenExisting method.

Also, you can log to a non-persisted shared memory-mapped file and, using a task scheduler or a Windows service, move pending logs to their final destination through a persisted memory-mapped file.

I see a lot of potential in this approach given your multi/inter-process scenario.
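
To make the idea concrete, here is a minimal sketch of appending to a shared, non-persisted memory-mapped file. The map name, mutex name, 1 MB capacity, and the 8-byte offset header are all assumptions for illustration; a real logger would keep the handles open for the process lifetime and handle the buffer-full case.

using System;
using System.IO.MemoryMappedFiles;
using System.Text;
using System.Threading;

public static class SharedMapLog
{
    // Illustrative names; any unique strings shared by all worker processes will do.
    private const string MapName = "Local\\EventLogMap";
    private const string MutexName = "Local\\EventLogMutex";
    private const long Capacity = 1024 * 1024; // 1 MB shared buffer

    public static void Append(string line)
    {
        // CreateOrOpen returns the same named map for every process that calls it.
        using (var map = MemoryMappedFile.CreateOrOpen(MapName, Capacity))
        using (var mutex = new Mutex(false, MutexName))
        using (var view = map.CreateViewAccessor(0, Capacity))
        {
            byte[] payload = Encoding.UTF8.GetBytes(line + Environment.NewLine);
            mutex.WaitOne(); // cross-process exclusion while we bump the offset
            try
            {
                // The first 8 bytes of the map hold the current write offset.
                // No bounds check here: a real logger must handle
                // 8 + offset + payload.Length exceeding Capacity.
                long offset = view.ReadInt64(0);
                view.WriteArray(8 + offset, payload, 0, payload.Length);
                view.Write(0, offset + payload.Length);
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}

A scheduled task or Windows service could then take the same mutex, read the filled region, reset the offset, and append the bytes to the file that nxLog reads.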

Approach #2

If you don't want to reinvent the wheel, I would go for a reliable message queue like MSMQ (very basic, but still useful in your scenario) or RabbitMQ: enqueue logs in persistent queues, and let a background process consume those queues and write the logs to the file system.

This way, you can create log files once or twice a day, or whenever you want, and you're not tied to the file system when logging actions within your system.
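
As a rough sketch of the MSMQ variant (the queue path below is an assumption, and System.Messaging is the classic .NET Framework API, so this would not apply to .NET Core):

using System.Messaging; // reference System.Messaging.dll (.NET Framework)

public static class QueuedLog
{
    private const string QueuePath = @".\Private$\eventlogs"; // assumed queue name

    public static void Enqueue(string serializedEntry)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            // Recoverable = true persists the message to disk so it survives restarts.
            queue.Send(new Message(serializedEntry) { Recoverable = true });
        }
    }
}

A background consumer would then Receive from the same queue and append entries to the log file at whatever cadence suits the external reader.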

Matías Fidemraizer
  • Thanks Matias for the suggestions. I like the first solution. I was not aware of memory-mapped files; I'm gonna give it a try, although my files are not supposed to be that large. The second approach is very interesting too. We are already using a queue-like system called Hangfire. Not quite the same, but it might not be a bad idea to use it. – Ben Dec 02 '16 at 19:54

Use the FileStream.Flush() method; you might do this after each call to .Write(). It clears the buffers for the stream and causes any buffered data to be written to the file.

https://msdn.microsoft.com/en-us/library/2bw4h516(v=vs.110).aspx
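
Applied to the consumer loop from the question, a sketch might look like this (FileStream.Flush(true) is the .NET 4+ overload that also flushes the operating system's buffers):

foreach (List<string> items in _queue.GetConsumingEnumerable()) {
    items.ForEach(item => _sw?.WriteLine(item));
    _sw?.Flush();     // push the StreamWriter's characters into the FileStream
    _fs?.Flush(true); // flushToDisk: true writes through the OS cache to the device
}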

Xavier J
  • Flushing after every write may have performance issues, especially if it is writing 10,000 entries per second. This is the exact reason that the streams don't write directly to disk and instead buffer data in memory first; writing is a slow operation. A better choice might be to flush the buffer periodically with a timer, say every minute, or to flush after every x entries. – Bradley Uffner Dec 02 '16 at 14:14
  • Exactly @BradleyUffner! – Ben Dec 02 '16 at 19:46
  • @BradleyUffner That's a possibility, but then maybe not. The actual writes to the storage media are subject to the hardware and drivers installed on the system. The hardware itself may have a huge buffer. The OP's goal was to get the data out of the *application*. Once it's handed over to the OS, performance might still need to be tuned, whether flushes are put on a timer or not. – Xavier J Dec 02 '16 at 19:50