0

I'm converting my old - legacy - web application that process - parse - uploaded files. It uses pure javascript + MVC.

/* javascript */
function Parse(files) {
    var idFiles=document.getElementById('inputF').value;
    CallServer('ProcessFile/Parse', idFiles, function (m) 
    {
      if (m != null) 
      {
       var om = JSON.parse(m);
       MsgLog('Parse finished ' + om.msg);
      }
    });
}

var CallServer = function (url, content, callback) {
  var rqs = new XMLHttpRequest();
  rqs.open('POST', url, true);
  rqs.onreadystatechange = function () {
    if (rqs.readyState == 4 && rqs.status == 200) {
      callback(rqs.responseText);
      return true;
    } else { return false; }
  };
  rqs.send(content);
}

/* View */
...
<a href="#" id="parseFiles" onclick="Parse()" title="Parse selected documents">Parse</a>
...
/* Controller */
public JsonResult Parse(string idFiles){
 // parses input string and returns an array of int
 int[] idf=getIds(idFiles); 
 string wholeFile=string.Empty;
 string parsedData=string.Empty;
 string outputFileName=string.Empty;
 List<string> doneFiles=new List<string>();
 foreach(int f in idf)
 {
    wholeFile=ReadFile(f);
    parsedData=ParseFile(wholeFile);
    outputFileName=SaveParsed(parsedData);
    doneFiles.Add(outputFileName);
 }
 return  Json(new { doneFiles });
}

The controller returns data to view showing the list of parsed files. Since parsing takes long to run, I'm trying to convert this code into a async/await Task + Parallel (?) code. I rewrote the Parse method but I'm a bit confused on async/await Task + Parallel stuffs. Moreover I don't know if I have to change javascript. Now the controller looks like:

public JsonResult Parse(string idFiles){
 int[] idf=getIds(idFiles); 
 string wholeFile=string.Empty;
 string parsedData=string.Empty;
 string outputFileName=string.Empty;
 List<string> doneFiles=new List<string>();
  await Task.Run(() =>Parallel.ForEach(idf, async currFile =>
  {
         wholeFile=ReadFile(currFile);
         Task<string> parseData=ParseFile(wholeFile);
         await Task.WhenAll(parseData);
         Task<string> write = await parseData.ContinueWith(async (aa) => await SaveParsed(parsedData));
         Task.WhenAll(write);
 }
 return  Json(new { doneFiles });
}

My desiderata would be a true async Parse method that when finished update the list of parsed files in the view... file1 - parsed file2 - parsed file3 -

1 Answers1

0

Since parsing takes long to run, I'm trying to convert this code into a async/await Task + Parallel (?) code.

Neither of those will magically improve the performance of your code. You need to know what your code does and why is it slow and then use the right tool for the job.

For example, if the slow part is communicating with a server, making that communication concurrent using async-await might help. If the slow part is CPU-bound code, Parallel.ForEach() might be the right solution.


That being said, your code contains a lot of mistakes:

  • await Task.Run(…): this just switches your code to execute on a different thread, it does not improve performance in any way. The only thing it does is to add some overhead.

  • Parallel.ForEach(idf, async currFile => …: Parallel.ForEach() does not play nice with async-await. Use one or the other, as appropriate, but not both.

  • wholeFile=ReadFile(currFile);: You're defining your variables outside the loop, which means a single variable is shared by all the iterations of the loop. That's fine for simple foreach (though I would consider it a bad style even then), but it won't work correctly for parallel code, because the threads will keep rewriting each others variables, causing confusing bugs.

  • wholeFile=ReadFile(currFile); Task<string> parseData=ParseFile(wholeFile);: ReadFile sounds like an IO-bound operation, which means it would make sense to make it async. ParseFile sounds like a CPU-bound operation, so it probably doesn't make sense to make it async.

  • await Task.WhenAll(parseData);: Use Task.WhenAll() when you want to await multiple Tasks at the same time. You have only one Task, so you can use just await parseData.

  • await parseData.ContinueWith(async (aa) => await SaveParsed(parsedData)): Generally speaking, async-await makes ContinueWith() obsolete. If you want to get the result of a Task, using await is much simpler than ContinueWith().

  • Task.WhenAll(write);: This doesn't do anything, Task.WhenAll() returns a Task that combines several other Tasks, it doesn't do any waiting on its own.

  • You're never adding to doneFiles, so the result is always going to be empty. But you can't just call Add() from inside the parallel loop, because it's not thread-safe.

  • To use await, you need to mark your method as async and change the return type to a Task.


How could you make this all work? Like I said, it depends a lot on what the performance characteristics of your code are. So, I'm going to assume that reading and writing files concurrently makes sense, or at least it does not do harm (it can, especially with normal spinning hard drives). And I'm also going to assume that parsing in parallel makes sense (it might not, for example, if your server is always processing many requests).

For that, a simple version of your code might look like this:

private async Task<string> ParseOne(int fileId)
{
    string wholeFile = await ReadFile(fileId);
    string parsedData = ParseFile(wholeFile);
    string outputFileName = await SaveParsed(parsedData);
    return outputFileName;
}

public async Task<JsonResult> Parse(string idFiles)
{
    int[] idf = getIds(idFiles); 

    var doneFiles = await Task.WhenAll(idf.Select(id => ParseOne(id)));

    return  Json(new { doneFiles });
}

A more complicated version of this code might also be more appropriate. For example, using SemaphoreSlim to limit concurrency. Or even TPL Dataflow for better control of each part of the parsing process.

Community
  • 1
  • 1
svick
  • 236,525
  • 50
  • 385
  • 514