
I'm trying to write a WPF application to display (possibly) large log files (50MB-2GB) such that they are easier to read. I tried loading a 5 MB file with ~75k lines into a GridView with TextBlocks but it was really slow. I don't need any editing capabilities.

I came across GlyphRun but I couldn't figure out how to use it. I imagine I would have to fill a canvas or image with a GlyphRun for each line of my log file. Could anyone tell me how to do this? The documentation on GlyphRun is unfortunately not very helpful.

Rouen
  • Do you need them line by line? Can you load them into a single textblock/richtextblock? You will probably have to read sections of the file. Loading it all at once is not practical. – the.Doc Oct 11 '21 at 10:54
  • @the.Doc Yes, I do need them line by line. I thought about something like having a List of GlyphRuns (one for each line) and displaying the lines that fit inside the window. But right now I can't even make my App show any GlyphRun I stored in a List because I don't know how to bind to the List in XAML (if that's possible). – Rouen Oct 11 '21 at 11:43
  • It's not wise to have such big log files. They are a real pain to work with. You should use file rollover and create files of a few MB in size. Every logging framework supports file rollover (by size or by time). There are other solutions like ElasticSearch or Seq to index and search/view logs. Visual Studio Code is free and provides highlighting out of the box. If you only want to read logs, then maybe you should use an existing solution. – BionicCode Oct 12 '21 at 13:54
  • At work we use Splunk. Serilog allows analysis of log data patterns. With NLog, I would otherwise use its settings to limit the file size and keep only the last few files. If you really can't do any of that, there are free log viewers available, or just Notepad++. I wouldn't bother with GlyphRun. – Andy Oct 12 '21 at 18:15

1 Answer


I have this file reading algorithm from a proof-of-concept application (which was also a log file viewer/diff viewer). The implementation requires C# 8.0 (.NET Core 3.x or .NET 5). I removed some indexing, cancellation etc. to reduce noise and to show the core of the algorithm.
It performs quite fast and compares very well with editors like Visual Studio Code. It can't get much faster. To keep the UI responsive, I highly recommend using UI virtualization. Once UI virtualization is in place, the bottleneck is the file reading operation. You can tweak the algorithm's performance by using different partition sizes (you could even implement some smart partitioning to calculate them dynamically).
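The dynamic partitioning idea mentioned above could be sketched like this (a hypothetical helper, not part of the implementation below; the 64 KB/4 MB bounds and the four-partitions-per-processor factor are made-up example values, not tuned constants):

```csharp
using System;

// Hypothetical helper: derive a partition size from the file length
// so that each processor gets a few partitions to work on.
static long CalculatePartitionSize(long fileLengthInBytes, int processorCount)
{
  const long MinPartitionSize = 64 * 1024;       // example lower bound
  const long MaxPartitionSize = 4 * 1024 * 1024; // example upper bound

  // Aim for roughly four partitions per processor,
  // clamped to keep partitions in a sensible range.
  long targetPartitionCount = processorCount * 4L;
  long rawSize = Math.Max(1, fileLengthInBytes / targetPartitionCount);
  return Math.Clamp(rawSize, MinPartitionSize, MaxPartitionSize);
}

// A 2 GB file on an 8-core machine: 2 GB / 32 = 64 MB, clamped to 4 MB.
Console.WriteLine(CalculatePartitionSize(2L * 1024 * 1024 * 1024, 8)); // prints 4194304
```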
The key parts of the algorithm are

  • asynchronous implementation of Producer-Consumer pattern using Channel
  • partitioning of the source file into blocks of n bytes
  • parallel processing of file partitions (concurrent file reading)
  • merging the result document blocks and overlapping lines

DocumentBlock.cs
The result struct that holds the lines of a processed file partition.

public readonly struct DocumentBlock
{
  public DocumentBlock(long rank, IList<string> content, bool hasOverflow)
  {
    this.Rank = rank;
    this.Content = content;
    this.HasOverflow = hasOverflow;
  }

  public long Rank { get; }
  public IList<string> Content { get; }
  public bool HasOverflow { get; }
}

ViewModel.cs
The entry point is the public ViewModel.ReadFileAsync member.

class ViewModel : INotifyPropertyChanged
{
  public ViewModel() => this.DocumentBlocks = new ConcurrentBag<DocumentBlock>();

  // TODO::Make reentrant 
  // (for example cancel running operations and 
  // lock/synchronize the method using a SemaphoreSlim)
  public async Task ReadFileAsync(string filePath)
  {
    using var cancellationTokenSource = new CancellationTokenSource();

    this.DocumentBlocks.Clear();    
    this.EndOfFileReached = false;

    // Create the channel (Producer-Consumer implementation)
    BoundedChannelOptions channelOptions = new BoundedChannelOptions(Environment.ProcessorCount)
    {
      FullMode = BoundedChannelFullMode.Wait,
      AllowSynchronousContinuations = false,
      SingleWriter = true
    };

    var channel = Channel.CreateBounded<(long PartitionLowerBound, long PartitionUpperBound)>(channelOptions);

    // Create consumer threads
    var tasks = new List<Task>();
    for (int threadIndex = 0; threadIndex < Environment.ProcessorCount; threadIndex++)
    {
      Task task = Task.Run(async () => await ConsumeFilePartitionsAsync(channel.Reader, filePath, cancellationTokenSource));
      tasks.Add(task);
    }

    // Produce document byte blocks
    await ProduceFilePartitionsAsync(channel.Writer, cancellationTokenSource.Token);    
    await Task.WhenAll(tasks);    
    CreateFileContent();
    this.DocumentBlocks.Clear();
  }

  private void CreateFileContent()
  {
    var document = new List<string>();
    string overflowingLineContent = string.Empty;
    bool isOverflowMergePending = false;

    var orderedDocumentBlocks = this.DocumentBlocks.OrderBy(documentBlock => documentBlock.Rank);
    foreach (var documentBlock in orderedDocumentBlocks)
    {
      if (isOverflowMergePending)
      {
        // The overflow is the beginning of this block's first line, so prepend it
        documentBlock.Content[0] = overflowingLineContent + documentBlock.Content[0];
        isOverflowMergePending = false;
      }

      if (documentBlock.HasOverflow)
      {
        overflowingLineContent = documentBlock.Content.Last();
        documentBlock.Content.RemoveAt(documentBlock.Content.Count - 1);
        isOverflowMergePending = true;
      }

      document.AddRange(documentBlock.Content);
    }

    this.FileContent = new ObservableCollection<string>(document);
  }

  private async Task ProduceFilePartitionsAsync(
    ChannelWriter<(long PartitionLowerBound, long PartitionUpperBound)> channelWriter, 
    CancellationToken cancellationToken)
  {
    var iterationCount = 0;
    while (!this.EndOfFileReached)
    {
      try
      {
        var partition = (iterationCount++ * ViewModel.PartitionSizeInBytes,
          iterationCount * ViewModel.PartitionSizeInBytes);
        await channelWriter.WriteAsync(partition, cancellationToken);
      }
      catch (OperationCanceledException)
      {}
    }
    channelWriter.Complete();
  }

  private async Task ConsumeFilePartitionsAsync(
    ChannelReader<(long PartitionLowerBound, long PartitionUpperBound)> channelReader, 
    string filePath, 
    CancellationTokenSource waitingChannelWriterCancellationTokenSource)
  {
    await using var file = File.OpenRead(filePath);

    await foreach ((long PartitionLowerBound, long PartitionUpperBound) filePartitionInfo
      in channelReader.ReadAllAsync())
    {
      if (filePartitionInfo.PartitionLowerBound >= file.Length)
      {
        this.EndOfFileReached = true;
        waitingChannelWriterCancellationTokenSource.Cancel();
        return;
      }

      var documentBlockLines = new List<string>();
      file.Seek(filePartitionInfo.PartitionLowerBound, SeekOrigin.Begin);
      var filePartition = new byte[filePartitionInfo.PartitionUpperBound - filePartitionInfo.PartitionLowerBound];
      int bytesRead = await file.ReadAsync(filePartition, 0, filePartition.Length);

      // The last partition is usually shorter than requested;
      // trim the buffer so trailing zero bytes are not parsed as content
      if (bytesRead < filePartition.Length)
      {
        Array.Resize(ref filePartition, bytesRead);
      }

      // Extract lines
      bool isLastLineComplete = ExtractLinesFromFilePartition(filePartition, documentBlockLines); 

      bool documentBlockHasOverflow = !isLastLineComplete && file.Position != file.Length;
      var documentBlock = new DocumentBlock(filePartitionInfo.PartitionLowerBound, documentBlockLines, documentBlockHasOverflow);
      this.DocumentBlocks.Add(documentBlock);
    }
  }  

  private bool ExtractLinesFromFilePartition(byte[] filePartition, List<string> resultDocumentBlockLines)
  {
    bool isLineFound = false;
    for (int bufferIndex = 0; bufferIndex < filePartition.Length; bufferIndex++)
    {
      isLineFound = false;
      int lineBeginIndex = bufferIndex;
      while (bufferIndex < filePartition.Length
        && !(isLineFound = ((char)filePartition[bufferIndex]).Equals('\n')))
      {
        bufferIndex++;
      }

      int lineByteCount = bufferIndex - lineBeginIndex;
      if (lineByteCount.Equals(0))
      {
        resultDocumentBlockLines.Add(string.Empty);
      }
      else
      {
        var lineBytes = new byte[lineByteCount];
        Array.Copy(filePartition, lineBeginIndex, lineBytes, 0, lineBytes.Length);
        string lineContent = Encoding.UTF8.GetString(lineBytes).Trim('\r');
        resultDocumentBlockLines.Add(lineContent);
      }
    }      

    return isLineFound;
  }

  protected virtual void OnPropertyChanged([CallerMemberName] string propertyName = "") 
    => this.PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));

  public event PropertyChangedEventHandler PropertyChanged;
  private const long PartitionSizeInBytes = 100000;
  private bool EndOfFileReached { get; set; }
  private ConcurrentBag<DocumentBlock> DocumentBlocks { get; }

  private ObservableCollection<string> fileContent;
  public ObservableCollection<string> FileContent
  {
    get => this.fileContent;
    set
    {
      this.fileContent = value;
      OnPropertyChanged();
    }
  }
}
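To see the merging of overlapping lines in isolation, here is a self-contained sketch that mirrors the CreateFileContent step (the block ranks and line contents are made-up sample data). Note that the overflow fragment of one block is the *beginning* of a line whose continuation starts the next block, so it has to go in front of that block's first line:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample: partition 1 ends mid-line, producing an overflow fragment.
var blocks = new List<(long Rank, List<string> Content, bool HasOverflow)>
{
  (0,   new List<string> { "line 1", "line 2", "line 3 (cut " }, true),
  (100, new List<string> { "off here)", "line 4" }, false),
};

var document = new List<string>();
string overflow = string.Empty;
bool mergePending = false;

foreach (var block in blocks.OrderBy(b => b.Rank))
{
  var content = block.Content;
  if (mergePending)
  {
    // Prepend the fragment carried over from the previous block
    content[0] = overflow + content[0];
    mergePending = false;
  }

  if (block.HasOverflow)
  {
    // The incomplete last line is held back and merged into the next block
    overflow = content.Last();
    content.RemoveAt(content.Count - 1);
    mergePending = true;
  }

  document.AddRange(content);
}

Console.WriteLine(string.Join("|", document));
// prints "line 1|line 2|line 3 (cut off here)|line 4"
```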

To implement very simple UI virtualization, this example uses a plain ListBox, where all mouse-over effects are removed from the ListBoxItem elements in order to get rid of the ListBox look and feel (an indeterminate progress indicator is highly recommended while loading). You can enhance the example to allow multi-line text selection (e.g., to allow copying text to the clipboard).

MainWindow.xaml

<Window>
  <Window.DataContext>
    <ViewModel />
  </Window.DataContext>

  <ListBox ScrollViewer.VerticalScrollBarVisibility="Visible" 
           ItemsSource="{Binding FileContent}" 
           Height="400" >
    <ListBox.ItemContainerStyle>
      <Style TargetType="ListBoxItem">
        <Setter Property="Template">
          <Setter.Value>
            <ControlTemplate TargetType="ListBoxItem">
              <ContentPresenter />
            </ControlTemplate>
          </Setter.Value>
        </Setter>
      </Style>
    </ListBox.ItemContainerStyle>
  </ListBox>
</Window>
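The default ListBox panel is already a VirtualizingStackPanel, so virtualization is on by default; to make it explicit and enable container recycling (which helps with very long lists), you can set the attached virtualization properties, e.g. (a configuration sketch, to be merged into the ListBox above):

```xaml
<ListBox ItemsSource="{Binding FileContent}"
         ScrollViewer.VerticalScrollBarVisibility="Visible"
         VirtualizingPanel.IsVirtualizing="True"
         VirtualizingPanel.VirtualizationMode="Recycling"
         VirtualizingPanel.ScrollUnit="Item" />
```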

If you are more advanced, you can implement your own document viewer, e.g., by extending VirtualizingPanel and using low-level text rendering. This allows you to increase performance further, especially if you are interested in text search and highlighting (in this context, stay far away from RichTextBox and FlowDocument, as they are too slow).

Either way, you now have a well-performing text file reading algorithm you can use to generate the data source for your UI implementation.

If this viewer is not your main product, but a simple development tool to help you process log files, I don't recommend implementing your own log file viewer. There are plenty of free and paid applications out there.

BionicCode
  • Thank you. This is very helpful for me. I was actually able to get decent performance in my initial project once I realized that a custom style I had applied turned off the virtualization of my ListView. Making use of virtualization and filtering the log file down to essential content did the trick for me. Thank you for your time and effort in helping me out. – Rouen Oct 19 '21 at 14:12