
I searched for examples on Google but couldn't really find what I need.

I have this while loop:

// using System.IO; — wrap the reader in a using block so it is disposed:
using (var importFile = new StreamReader(@"c:\users\matthew\desktop\test.txt"))
{
    string line;
    while ((line = importFile.ReadLine()) != null)
    {
        doneTotal++;

        string[] info = line.Split('-');
        string username = info.Length >= 1 ? info[0] : null;
        string file = info.Length >= 2 ? info[1] : null;

        myfunc(username, file);
    }
}

So, `myfunc` is fast, but not fast enough for me. How can I parallelize or multithread this while loop?

Thanks.

wtm
    I'm pretty sure what's slow about your process is I/O. You can't multithread I/O for better performance. Just read the whole file in as one string and process each line separately. – Gus Jul 01 '16 at 16:52
  • couldnt you read in all lines then iterate over them in a parallel foreach? – Steven Wood Jul 01 '16 at 16:52
  • @Gus & Steven The files I use are sometimes too big to open, so I think I have to use this, unless there is a other way to open a big file? – wtm Jul 01 '16 at 16:56
  • 2
    You could use [TPL dataflow](https://msdn.microsoft.com/fr-fr/library/hh228603(v=vs.110).aspx) to build asynchronous pipeline. Or use `tasks.Add(Task.Factory.StartNew(() => myFunc(username, file)); /*...*/ Task.WaitAll(tasks)` – Kalten Jul 01 '16 at 16:58
  • @Matthew on second read I notice that you're reading one file to get the filename of another file (presumably one per line?) Which of those files are you expecting to be large? In other words, the performance problem may actually be hidden in `myfunc`. Check out this answer for some good ideas: http://stackoverflow.com/a/4274051/535515 – Gus Jul 01 '16 at 17:01
  • 1
    It will not help you if you implement this with multithreading, because the bottleneck is not at computing from CPU. The slow part in this code is reading from the hard disk, so a paralel implementation won't help you. – meJustAndrew Jul 01 '16 at 17:03
  • @Gus The myfunc is creating some posts to my server with information including the username and file string, thats all. So yes, the problem is inside the function but there is not really a way to make it faster, so I thought I would make it faster by just multi/parallel it so I do multiple at the same time – wtm Jul 01 '16 at 17:10
  • 1
    OK. Yep, @Kalten's got the right idea then. You should update the question with what you're doing in `myfunc` – Gus Jul 01 '16 at 17:19

2 Answers


Just on a hunch, try this. First, add a new class to represent a set of parameters for `myfunc` (it could even be a `Tuple<string, string>`).

public class MyFuncParameters
{
    public string UserName { get; set; }
    public string File { get; set; }
}

Then modify your original method like this:

var filesToProcess = new List<MyFuncParameters>();

// using System.IO; — dispose the reader when finished reading:
using (var importFile = new StreamReader(@"c:\users\matthew\desktop\test.txt"))
{
    string line;
    while ((line = importFile.ReadLine()) != null)
    {
        doneTotal++;

        string[] info = line.Split('-');
        string username = info.Length >= 1 ? info[0] : null;
        string file = info.Length >= 2 ? info[1] : null;

        filesToProcess.Add(new MyFuncParameters { File = file, UserName = username });
    }
}

foreach (var fileToProcess in filesToProcess)
{
    myfunc(fileToProcess.UserName, fileToProcess.File);
}

In other words, first read everything you need from the one file, and only then iterate over the other list of files (built from the original file). You may see some improved performance by not interleaving reads of the original file with whatever `myfunc` does to another file.

That's a guess. Whether it helps very likely depends on what exactly `myfunc` does, since you indicated that's the part that's slow.

As stated in the comments, you can launch as many parallel threads to read files as you want, but only one of them can actually read from the disk at a time, so it doesn't really help and could even make things slower.
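That said, since you mentioned `myfunc` is posting to a server, the bottleneck is network I/O rather than disk I/O, and issuing several requests concurrently can help. Here's a rough sketch along the lines of Kalten's comment, using `Parallel.ForEach` with a cap on concurrency. Everything here except `myfunc` and the line format is an assumption: the cap of 8 is an arbitrary starting point to tune, and `myfunc` must be safe to call from multiple threads at once.

```csharp
// using System.IO; using System.Threading; using System.Threading.Tasks;

// File.ReadLines streams the file lazily, so even a very large file
// is never loaded into memory all at once.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 }; // tune this

Parallel.ForEach(File.ReadLines(@"c:\users\matthew\desktop\test.txt"), options, line =>
{
    Interlocked.Increment(ref doneTotal); // thread-safe counter update

    string[] info = line.Split('-');
    string username = info.Length >= 1 ? info[0] : null;
    string file = info.Length >= 2 ? info[1] : null;

    myfunc(username, file); // must be thread-safe to call concurrently
});
```

Note that `doneTotal++` is replaced with `Interlocked.Increment`, because the plain increment is not atomic when several threads run the body at the same time.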

Scott Hannen

I thought I posted this in my other answer but I don't see it.

Can you modify what you're posting to the server so that instead of posting lots of requests each with one user name and one file name, you post a collection instead? One (or several) big requests will be a lot faster than lots of little ones. Each item takes the same time to process, but you remove the overhead of making all the individual calls.
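As a rough sketch of that idea, reusing the `filesToProcess` list and `MyFuncParameters` class from my other answer: group the items into batches and post each batch as a single request. The batch size of 100 and the `PostBatch` method are placeholders for illustration; the real batch size and the shape of the server call depend on your API.

```csharp
const int BatchSize = 100; // arbitrary; tune for your server

var batch = new List<MyFuncParameters>(BatchSize);
foreach (var item in filesToProcess)
{
    batch.Add(item);
    if (batch.Count == BatchSize)
    {
        PostBatch(batch);   // hypothetical: one request carrying 100 items
        batch.Clear();
    }
}
if (batch.Count > 0)
{
    PostBatch(batch);       // send the final partial batch
}
```

Each item still takes the same time to process on the server, but you pay the per-request overhead once per batch instead of once per item.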

Scott Hannen