
I am searching for a very fast way of loading text content from a 1GB text file into a WPF control (ListView for example). I want to load the content within 2 seconds.

Reading the content line by line takes too long, so I think reading it as bytes will be faster. So far I have:

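```csharp
using System.IO;
using System.Text;

byte[] buffer = new byte[4096];
int bytesRead = 0;
using (FileStream fs = new FileStream("myfile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Decode only the bytes actually read in this pass (the buffer may be only partly filled).
        string chunk = Encoding.Unicode.GetString(buffer, 0, bytesRead);
    }
}
```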

Is there any way of transforming the bytes into string lines and adding those to a ListView/ListBox?

Is this the fastest way of loading file content into a WPF GUI control? There are various applications that can load file content from a 1GB file within 1 second.

EDIT: would it help to use multiple threads to read the file? For example:

```csharp
var t1 = Task.Factory.StartNew(() =>
{
    //read content/load into GUI...
});
```

EDIT 2: I am planning to use pagination/paging as suggested below, but when I scroll up or down, the file content has to be read again to get to the part that is being displayed, so I would like to use:

```csharp
fs.Seek(bytePosition, SeekOrigin.Begin);
```

but would that be faster than reading line by line, in multiple threads? Example:

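```csharp
// Assumes fs is positioned at the start of the file.
long fileLength = fs.Length;
// Split on an even byte offset so the midpoint never falls inside a UTF-16
// code unit (it can still fall in the middle of a line).
long halfFile = (fileLength / 2) & ~1L;
// Open a second, independent stream: `FileStream fs2 = fs;` only copies the
// reference, so both tasks would race on one shared file position.
var fs2 = new FileStream("myfile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
var t1 = Task.Factory.StartNew(() =>
{
    byte[] buffer1 = new byte[4096];
    long bytesRead1 = 0;
    int n;
    // First half: never read past halfFile.
    while (bytesRead1 < halfFile &&
           (n = fs.Read(buffer1, 0, (int)Math.Min(buffer1.Length, halfFile - bytesRead1))) > 0)
    {
        bytesRead1 += n;
        Encoding.Unicode.GetString(buffer1, 0, n);
        //convert bytes into string lines...
    }
});

var t2 = Task.Factory.StartNew(() =>
{
    byte[] buffer2 = new byte[4096];
    int n;
    fs2.Seek(halfFile, SeekOrigin.Begin);
    // Second half: Read returns 0 at end of file, which ends the loop.
    while ((n = fs2.Read(buffer2, 0, buffer2.Length)) > 0)
    {
        Encoding.Unicode.GetString(buffer2, 0, n);
        //convert bytes into string lines...
    }
});
Task.WaitAll(t1, t2);
fs2.Dispose();
```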
Mcd
  • if you have such a big file, consider creating an index file that records the file offset and length of each line; then you can load the text you need on demand within milliseconds – NoName Feb 15 '16 at 02:12
  • @Sakura What do you mean? That I should read each line (approx. 2 million) and get the length of each line? – Mcd Feb 15 '16 at 07:44
  • I mean if your file does not change frequently, you can create a `one time` index file for it, then use this index file whenever you need it. A `12 megabytes` file can store the offset and length of your 2 million lines. – NoName Feb 15 '16 at 11:00
  • Hmm, the file changes often because various processes are writing data to it. – Mcd Feb 15 '16 at 11:48
  • Although your file is updated often, if you only append text to the end of the file you can still use this approach. If it changes at random positions, consider @Tyress's answer; it is really fast when reading line by line – NoName Feb 15 '16 at 14:03
  • I don't think it will be able to load the content within 1 second. – Mcd Feb 15 '16 at 14:19
  • Yes, you're right, and we usually (always) avoid loading such big data into memory. And trust me, you will be very lucky if you can load 1 million items into a ListView, unless you use some `virtual` loading. Give it a try. – NoName Feb 15 '16 at 14:27
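
The index idea from the comments could look roughly like the sketch below. It is not the commenter's code: it assumes a UTF-8 (or ASCII) file with `\n` line endings and keeps the offsets in memory instead of writing a separate index file, and `LineIndex`, `Build` and `ReadLines` are hypothetical names. For a UTF-16 file like the one in the question (`Encoding.Unicode`), the scan would have to look for the two-byte newline sequence instead. Since the file is only appended to, the scan could in principle resume from the previous end of file; the sketch does not do that.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineIndex
{
    // Scans the file once and returns the byte offset at which each line starts.
    // Assumes UTF-8/ASCII, where '\n' is always the single byte 0x0A.
    public static List<long> Build(string path)
    {
        var offsets = new List<long> { 0 };
        var buffer = new byte[1 << 16];
        long position = 0;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            int n;
            while ((n = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < n; i++)
                    if (buffer[i] == (byte)'\n')
                        offsets.Add(position + i + 1); // next line starts after the '\n'
                position += n;
            }
        }
        // If the file ends with '\n', the last offset points at end of file; drop it.
        if (offsets.Count > 1 && offsets[offsets.Count - 1] == position)
            offsets.RemoveAt(offsets.Count - 1);
        return offsets;
    }

    // Reads up to `count` lines starting at line `first` by seeking straight to it.
    public static List<string> ReadLines(string path, List<long> offsets, int first, int count)
    {
        var result = new List<string>(count);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(offsets[first], SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, Encoding.UTF8))
            {
                string line;
                while (result.Count < count && (line = reader.ReadLine()) != null)
                    result.Add(line);
            }
        }
        return result;
    }
}
```

Scrolling to a page then costs one `Seek` plus reading that page, no matter how far into the file it is.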

3 Answers


Using a thread won't make it any faster (technically there is a slight cost to using threads, so loading may take slightly longer), though it may make your app more responsive. I don't know whether File.ReadAllText() is any faster.

Where you will have a problem, though, is data binding. Say that after loading your 1GB file on a worker thread (regardless of technique), you now have 1GB worth of lines to data-bind to your ListView/ListBox. I recommend that you don't loop, adding the lines to your control one at a time via, say, an ObservableCollection.

Instead, consider having the worker thread pass batches of items to the UI thread, which then appends each item in the batch to the ListView/ListBox.

This cuts down on the overhead of Invoke, which would otherwise flood the UI message pump with one call per line.
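A minimal sketch of the batching part, assuming lines are read lazily with `File.ReadLines`; `LineBatcher`, `ReadInBatches` and the batch size are hypothetical choices, not anything from the answer. The callback is where the hand-off to the UI thread would happen.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineBatcher
{
    // Reads the file lazily and hands lines to onBatch in groups of batchSize,
    // so the consumer is called once per batch instead of once per line.
    public static void ReadInBatches(string path, int batchSize, Action<List<string>> onBatch)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        foreach (string line in File.ReadLines(path))
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                onBatch(batch);
                batch = new List<string>(batchSize); // fresh list; the consumer may keep the old one
            }
        }
        if (batch.Count > 0)
            onBatch(batch); // final, partially filled batch
    }
}
```

In WPF, this would run on a worker thread and the callback would typically wrap its work in `Application.Current.Dispatcher.Invoke(...)`, so the UI thread is interrupted once per batch (say, every few thousand lines) rather than once per line.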


Since you want to read this fast, I suggest using the System.IO.File class in your WPF desktop application.

```csharp
using System.IO;
using System.Text;
MyText = File.ReadAllText("myFile.txt", Encoding.Unicode); // If you want to read the whole file as is
string[] lines = File.ReadAllLines("myFile.txt", Encoding.Unicode); // If you want to place each line of text into an array
```

Together with data binding, your WPF application should be able to read the text file and display it on the UI quickly.

About performance, you can refer to this answer.

> So use File.ReadAllText() instead of ReadToEnd() as it makes your code shorter and more readable. It also takes care of properly disposing resources, which you might forget to do with a StreamReader (as you did in your snippet). - Darin Dimitrov

Also, you must consider the specs of the machine that will run your application.

  • `ReadAllLines` loads the entire content into an array. That is not efficient because it takes too long, and loading everything into memory is not recommended; I have tried that. And I have a `using() {..}` statement that automatically disposes of the FileStream object. – Mcd Feb 15 '16 at 02:04
  • You can take a look at this [article](http://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files) for test-based results on reading text files in C#. Whether to use streams or the File class should depend on your needs and the data you are trying to read. – John Ephraim Tugado Feb 15 '16 at 02:10

When you say "Reading the content line by line takes too long", what do you mean? How are you actually reading the content?

However, more than anything else, let's take a step back and look at the idea of loading 1 GB of data into a ListView.

Personally, I would use an IEnumerable to read the file, for example:

```csharp
foreach (string line in File.ReadLines(path))
{
    // File.ReadLines is lazy: each line is read only when the loop asks for it.
}
```

But more importantly, you should implement pagination in your UI and cut down what's visible and what's loaded at any one time. This will cut down your resource use massively and make sure you have a usable UI. You can use IEnumerable methods such as Skip() and Take(), which use your resources efficiently (they don't keep unused lines in memory, although Skip() still has to read past every line before the requested page).
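A minimal sketch of Skip()/Take() paging over `File.ReadLines`; `Pager.GetPage` and its `pageIndex`/`pageSize` parameters are hypothetical. Each call re-reads the file from the start up to the requested page, so pages deep into a 1GB file get slower, which is where the offset index from the comments on the question helps.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class Pager
{
    // Returns one page of lines. File.ReadLines is lazy: Take() stops reading once the
    // page is full, but Skip() still reads (and discards) every line before the page.
    public static List<string> GetPage(string path, int pageIndex, int pageSize)
    {
        return File.ReadLines(path)
                   .Skip(pageIndex * pageSize)
                   .Take(pageSize)
                   .ToList();
    }
}
```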

You wouldn't need any extra threads either (aside from the background thread and the UI thread), but I would suggest using MVVM and INotifyPropertyChanged so you don't have to worry about threading at all.
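One way the MVVM suggestion might look, as a sketch only: `FilePageViewModel`, `CurrentPage` and `LoadPageAsync` are hypothetical names, and a ListView's `ItemsSource` is assumed to be bound to `CurrentPage`. The page is read on a thread-pool thread via `Task.Run`, and the whole list is replaced in a single assignment, so the view gets one PropertyChanged notification per page rather than one change per line.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical view model; a ListView's ItemsSource would be bound to CurrentPage.
public class FilePageViewModel : INotifyPropertyChanged
{
    private readonly string _path;
    private readonly int _pageSize;
    private IReadOnlyList<string> _currentPage = new List<string>();

    public FilePageViewModel(string path, int pageSize)
    {
        _path = path;
        _pageSize = pageSize;
    }

    public event PropertyChangedEventHandler PropertyChanged;

    public IReadOnlyList<string> CurrentPage
    {
        get => _currentPage;
        private set
        {
            _currentPage = value;
            OnPropertyChanged(); // one notification per page, not per line
        }
    }

    // Reads one page on a thread-pool thread, then swaps the whole list in at once.
    public async Task LoadPageAsync(int pageIndex)
    {
        CurrentPage = await Task.Run(() => File.ReadLines(_path)
                                               .Skip(pageIndex * _pageSize)
                                               .Take(_pageSize)
                                               .ToList());
    }

    private void OnPropertyChanged([CallerMemberName] string propertyName = "")
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}
```

When `LoadPageAsync` is awaited from the UI thread, `await` resumes on that thread, so the `CurrentPage` setter and its notification run there as well.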

Tyress