
I'm having an interesting problem with reading a large file (~400 MB) that's on a network drive. Originally, I fed the full network address into a FileInfo and used the CopyTo function to transfer it to a local temp drive and then read it. This seems to work okay; it's not slow, but it's not fast either - just meh. The CopyTo function would get the network utilization of the computer running the program consistently up above 50%, which is pretty good.

In order to speed up the process, I tried to read the network file directly into a MemoryStream to cut out the middleman, so to speak. When I tried this (using the asynchronous copy pattern described here), it is hilariously slow. My network utilization never even tops 2% - it's almost like something is throttling me. FYI, I watched my network utilization when directly copying the same file via Windows Explorer and it hit like 80-90%... not sure what's happening here. Below is the asynchronous copy code I used:

string line;
List<string> results = new List<string>();

Parser parser = new Parser(QuerySettings.SelectedFilters, QuerySettings.SearchTerms,
QuerySettings.ExcludedTerms, QuerySettings.HighlightedTerms);

byte[] ActiveBuffer = new byte[60 * 1024];
byte[] BackBuffer = new byte[60 * 1024];
byte[] WriteBuffer = new byte[60 * 1024];

MemoryStream memStream = new MemoryStream();
FileStream fileStream = new FileStream(fullPath, FileMode.Open, FileSystemRights.Read, FileShare.None, 60 * 1024, FileOptions.SequentialScan);

int Readed = 0;
IAsyncResult ReadResult;
IAsyncResult WriteResult;

ReadResult = fileStream.BeginRead(ActiveBuffer, 0, ActiveBuffer.Length, null, null);
do
{
    Readed = fileStream.EndRead(ReadResult);

    WriteResult = memStream.BeginWrite(ActiveBuffer, 0, Readed, null, null);
    WriteBuffer = ActiveBuffer;

    if (Readed > 0)
    {
        ReadResult = fileStream.BeginRead(BackBuffer, 0, BackBuffer.Length, null, null);
        BackBuffer = Interlocked.Exchange(ref ActiveBuffer, BackBuffer);
    }

    memStream.EndWrite(WriteResult);
}
while (Readed > 0);

memStream.Seek(0, SeekOrigin.Begin); // rewind before reading back
StreamReader streamReader = new StreamReader(memStream);
while ((line = streamReader.ReadLine()) != null)
{
    if (parser.ParseResults(line))
        results.Add(line);
}

fileStream.Flush();
fileStream.Close();

memStream.Flush();
memStream.Close();

return results;

UPDATE: As per the comments, I just tried the following. It only got my network utilization to about 10-15%... why so low?

MemoryStream memStream = new MemoryStream();
FileStream fileStream = File.OpenRead(fullPath);

fileStream.CopyTo(memStream);

memStream.Seek(0, SeekOrigin.Begin);
StreamReader streamReader = new StreamReader(memStream);

Parser parser = new Parser(QuerySettings.SelectedFilters, QuerySettings.SearchTerms,
QuerySettings.ExcludedTerms, QuerySettings.HighlightedTerms);

while ((line = streamReader.ReadLine()) != null)
{
    if (parser.ParseResults(line))
        results.Add(line);
}
Hershizer33
  • Given that writing to a `MemoryStream` will be very fast indeed, what happens if you just try reading *synchronously*? – Jon Skeet May 31 '12 at 15:58
  • Not seeing the point of the async stuff, considering *you're waiting in the same thread for it to finish* when you call `EndRead` or `EndWrite` outside of the callback. That's kinda the very definition of synchronous. – cHao May 31 '12 at 16:03
  • @JonSkeet Okay I tried something different as you suggested (i think). When you get a chance please read the update to the original post – Hershizer33 May 31 '12 at 18:20
  • Not seeing the point of reading it all into memory and *then* parsing it. This just wastes time and space. Feed the parser directly from the network. – user207421 Jun 01 '12 at 05:56

3 Answers


I'm late to the party, but having had the same problem of low network utilization recently, after trying a lot of different implementations I finally found that a StreamReader with a large buffer (1 MB in my case) increased the network utilization to 99%. None of the other options made a significant difference.
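A minimal sketch of that setup. The 1 MB figure mirrors what worked for me but is something to tune, and `ReadLines` is just an illustrative wrapper, not code from the question:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

class LargeBufferRead
{
    public static List<string> ReadLines(string fullPath)
    {
        const int bufferSize = 1024 * 1024; // 1 MB; FileStream's default is only 4 KB

        var results = new List<string>();

        // Give both the FileStream and the StreamReader a large buffer;
        // the StreamReader buffer was the change that moved the needle for me.
        using (var fileStream = new FileStream(fullPath, FileMode.Open, FileAccess.Read,
                                               FileShare.Read, bufferSize, FileOptions.SequentialScan))
        using (var reader = new StreamReader(fileStream, Encoding.UTF8, true, bufferSize))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                results.Add(line);
        }
        return results;
    }
}
```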

ths
  • It was a long time ago but I think I did 3 things to address this and one of them was increasing the buffer size. Thanks! – Hershizer33 Aug 20 '14 at 21:07

There is no point copying the whole file over and then parsing it. Simply open the file from the network drive and let the .NET Framework do its best to deliver the data for you. You may be able to out-do the Microsoft developers and write a faster copy method than theirs, but it's a real challenge.
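In the question's terms, that means pointing the StreamReader straight at the network path and feeding the parser as you read, with no MemoryStream or temp copy in between. A sketch, with a `Func<string, bool>` standing in for the question's `parser.ParseResults` call:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class DirectRead
{
    // filter stands in for the question's parser.ParseResults
    public static List<string> Query(string fullPath, Func<string, bool> filter)
    {
        var results = new List<string>();

        // Read straight from the (network) path; no staging copy needed.
        using (var reader = new StreamReader(fullPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                if (filter(line))
                    results.Add(line);
        }
        return results;
    }
}
```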

Dercsár
  • The problem is when I open it on the network drive and read it directly, I only get about 40% network utilization. Copy to local drive gets about 95% network utilization and thus is much faster... I can't for the life of me figure out why direct read only gets 30-40%, if it got 90ish% like copy did I'd be set – Hershizer33 Jun 01 '12 at 13:52
  • This seems to be the best solution, though I wish I knew why there was such a huge difference in network utilization. – Hershizer33 Jun 05 '12 at 15:57

Using Reflector, I see that your call to:

FileStream fileStream = File.OpenRead(fullPath);

ends up using a buffer of size 4096 bytes (0x1000):

public FileStream(string path, FileMode mode, FileAccess access, FileShare share) : this(path, mode, access, share, 0x1000, FileOptions.None, Path.GetFileName(path), false)
{
}

You could try calling one of the FileStream constructors explicitly, and specify a much larger buffer size and FileOptions.SequentialScan.

Not sure this will help, but it is easy to try.
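A sketch of what that would look like; the 1 MB buffer size is a starting point to experiment with, not a magic number:

```csharp
using System.IO;

class ExplicitBuffer
{
    public static Stream OpenSequential(string fullPath)
    {
        // Same effect as File.OpenRead, but with an explicit 1 MB buffer
        // instead of the default 4096 bytes, plus a SequentialScan hint
        // so the OS read-ahead can kick in.
        return new FileStream(fullPath, FileMode.Open, FileAccess.Read,
                              FileShare.Read, 1024 * 1024, FileOptions.SequentialScan);
    }
}
```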

ahazzah