
I have an app that reads data from a text file using:

    CRD.reader = new StreamReader(fn, Encoding.UTF8, true, 1024);
    CRD.reader.ReadLine();

BUT I run 16 instances of this app in parallel on my 24-core machine. When I do this, the total time taken is much greater than the time a single instance takes running on its own (even though they are running in parallel). I assume this is because of contention for the disk?

I saw a suggestion to use a BufferedStream, but I don't understand how that differs from the code above. Surely, by specifying the buffer size as I have, I am already using a "buffered" stream?

For my code, I have tried various buffer sizes, but it does not appear to make much difference.

EDIT 1

If anyone could explain how a BufferedStream differs from what I am doing, that would be very helpful.

EDIT 2

If I set a large buffer with

    CRD.reader = new StreamReader(fn, Encoding.UTF8, true, 65536);
    CRD.reader.ReadLine();

Can I force the whole buffer to be filled on the first ReadLine? i.e. if my buffer is larger than the file size, the whole file could/should be read into memory. It seems to me that the operating system allocates that much buffer, but does not necessarily use it.
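(For reference, a minimal sketch of what pre-loading the whole file would look like, using `File.ReadAllLines` instead of a buffered `ReadLine` loop; the file path and sample data are placeholders, not the real app's:)

```csharp
using System;
using System.IO;

class Preload
{
    static void Main()
    {
        string fn = "data.txt"; // placeholder path standing in for the real file
        File.WriteAllLines(fn, new[] { "row 1", "row 2" }); // sample data for the sketch

        // One call reads the entire file into memory; iteration then
        // happens over an in-memory array, not against the disk.
        string[] lines = File.ReadAllLines(fn);

        foreach (string line in lines)
        {
            // ... process line here ...
            Console.WriteLine(line);
        }
    }
}
```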

ManInMoon
  • Why do you need all processes to read from the file concurrently? – Tigran May 14 '13 at 08:53
  • First be sure that it really *is* due to disk contention. Do a test with dummy data loaded into memory instead and see if it still gets a lot slower. Because I don't think it's going to be disk contention, especially if all the processes do is read from the file and not write to it. – Matthew Watson May 14 '13 at 08:54
  • @Matthew I can see that the disk in resource mon is working flat out - but that the CPU in perf mon is a fraction of what is available – ManInMoon May 14 '13 at 08:57
  • @Tigran The whole point for me is to do it in parallel to speed up the processing – ManInMoon May 14 '13 at 08:57
  • @ManInMoon: How big are your text files? – Tigran May 14 '13 at 09:01
  • "I run 16 instances of this app in parallel" wouldn't it be easier to read it once and then have 16 threads working with the read string(s)? 2MB (I think that's what you meant) is not that much. – Corak May 14 '13 at 09:10
  • @Corak Yes definitely - I have a version that does it that way too. But for this exercise I need separate instances. – ManInMoon May 14 '13 at 09:25
  • Let each application read the whole file once (with File.ReadAllLines or File.ReadAllText as @Tigran mentioned) and then work with it. Even 16 * 2MB is not that much. And reading a file like that should be a matter of (very few) milliseconds. – Corak May 14 '13 at 09:38
  • @ManInMoon There's [an answer, complete with link to Microsoft blog](http://stackoverflow.com/a/2069317/351301), with an argument against using BufferedStream. Pretty much the reasoning you've used. – anton.burger May 14 '13 at 09:44
  • @shambulator Yes I spotted that one too. Good article - thanks for pointing out – ManInMoon May 14 '13 at 10:45
  • @Corak Do you think there is any advantage to changing to ReadAllLines? Or just make the buffer big enough to read the entire file in one go - but still use ReadLine? I would have thought the result would be similar - what do you think? I can't test it easily as I would need to rewrite quite a bit of code for that. – ManInMoon May 14 '13 at 10:50
  • `ReadAllLines` if you want to have each line of the file neatly seperated in one big array. `ReadAllText` to get everything together in one big string. Both should work roughly the same. – Corak May 14 '13 at 11:04

2 Answers


If the files are, according to the comments, about 2 MB each, the fastest solution would be to:

  • first read the file completely into memory, in one shot, using for example the File.ReadAllText method;

  • then process the content already present in memory, which will be much faster than reading line by line from the disk.
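A minimal sketch of this approach (the file path and sample data are placeholders):

```csharp
using System;
using System.IO;

class FastRead
{
    static void Main()
    {
        string fn = "data.txt"; // placeholder path
        File.WriteAllText(fn, "line 1\nline 2\nline 3\n"); // sample data for the sketch

        // A single disk read for the whole file, instead of many ReadLine calls.
        string content = File.ReadAllText(fn);

        // All further processing happens in memory.
        foreach (string line in content.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // ... work on line ...
            Console.WriteLine(line);
        }
    }
}
```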

Tigran
  1. Try opening the file in read-only mode.
  2. Try using a memory-mapped file; it can provide the best performance for concurrent file access.
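A minimal sketch of the memory-mapped approach (the file path and sample data are placeholders). `MemoryMappedFile.CreateFromFile` maps the file read-only, so concurrent processes share the OS page cache, and a `StreamReader` over the view stream keeps the familiar line-by-line interface:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MappedRead
{
    static void Main()
    {
        string fn = "data.txt"; // placeholder path
        File.WriteAllText(fn, "line 1\nline 2\n"); // sample data for the sketch

        // Map the whole file (capacity 0) for read-only access; multiple
        // processes mapping the same file share pages rather than each
        // issuing its own reads.
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   fn, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var stream = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // ... process line ...
                Console.WriteLine(line);
            }
        }
    }
}
```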
Viacheslav Smityukh