
We are using the .NET FtpWebRequest class to transfer files in our application, and we appear to be running into an indefinite wait. We suspect it occurs somewhere inside the .NET library code.

We are using the async versions of the methods and our (simplified) code looks as follows:

async Task DoTransfer(int id, CancellationToken cTkn)
{
    try
    {
        FtpWebRequest request = (FtpWebRequest)WebRequest.Create(targetAddress);
        request.UsePassive = true;
        request.KeepAlive = true;
        request.Timeout = 300000; // 5 minutes
        request.Method = WebRequestMethods.Ftp.UploadFile;
        using (var stream = await request.GetRequestStreamAsync())
        {
            // ...create a byte buffer of the file here
            cTkn.ThrowIfCancellationRequested();
            await stream.WriteAsync(buffer, 0, buffer.Length, cTkn);
        }
        using (var response = (FtpWebResponse)await request.GetResponseAsync())
        {
            cTkn.ThrowIfCancellationRequested();
            // ...do something with status code
        }
    }
    catch (OperationCanceledException)
    {
        // ...logging
    }
    catch (Exception ex)
    {
        // ...logging
    }
    finally
    {
        // ...code to remove this task from a concurrent dictionary using the 'id' param
    }
}

When we first create the task, we add it to a concurrent dictionary for monitoring (keyed by a randomly generated ID). The task then removes itself from this dictionary once it is complete (in the finally block). The problem is that the task occasionally never removes itself, which indicates that the finally block is never reached.

The cancellation token we use is a linked token from a "master" cancellation token and a new timeout token created just before the task is launched (set to 5 minutes).
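A minimal sketch of such a token, for illustration only. `TransferTokens` and `CreateTransferTokenSource` are hypothetical names, and this variant applies the timeout with `CancelAfter` on the linked source instead of creating a separate timeout source; the resulting token is cancelled either when the master token is cancelled or when the timeout elapses.

```csharp
using System;
using System.Threading;

static class TransferTokens
{
    // Hypothetical helper (not from the original code): builds a source whose token
    // is cancelled when masterToken is cancelled OR when the timeout elapses.
    // The caller should dispose the returned source once the transfer has finished.
    public static CancellationTokenSource CreateTransferTokenSource(
        CancellationToken masterToken, TimeSpan timeout)
    {
        var linked = CancellationTokenSource.CreateLinkedTokenSource(masterToken);
        linked.CancelAfter(timeout);
        return linked;
    }
}

// Usage (assumed call site):
//   using (var cts = TransferTokens.CreateTransferTokenSource(masterToken, TimeSpan.FromMinutes(5)))
//   {
//       await DoTransfer(id, cts.Token);
//   }
```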

We can't isolate which method is hanging: the app processes around 100 files per minute and the issue occurs only very rarely, so the log files are too large to read manually.

The app may start up to 24 of these DoTransfer tasks at any one time (often connecting to the same FTP server).

Does anyone know of any issues with either the GetRequestStreamAsync() or GetResponseAsync() methods that may cause them to never return when run in parallel like this?

Or does anyone have suggestions on how to terminate long-running tasks, given that we can't pass the cancellation token to either of those two FtpWebRequest methods?
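One possible approach, sketched under assumptions rather than verified against this hang: register a callback on the token that calls `request.Abort()`, which should make a pending `GetRequestStreamAsync()` or `GetResponseAsync()` call fail instead of waiting. `FtpCancellation` and `WithAbortOnCancel` are hypothetical names, and translating the resulting `WebException` into an `OperationCanceledException` is a design choice made here so that the existing catch block still applies.

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class FtpCancellation
{
    // Hypothetical helper: runs an operation on the request and aborts the request
    // if the token is cancelled while the operation is still pending.
    public static async Task<T> WithAbortOnCancel<T>(
        WebRequest request, Func<Task<T>> operation, CancellationToken token)
    {
        // The registration is disposed when the operation ends, so a later
        // cancellation does not abort a request that has already completed.
        using (token.Register(() => request.Abort()))
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (WebException ex) when (token.IsCancellationRequested)
            {
                // Assumption: a WebException seen after cancellation was caused by Abort().
                throw new OperationCanceledException("Request aborted.", ex, token);
            }
        }
    }
}

// Usage inside DoTransfer (assumed):
//   using (var stream = await FtpCancellation.WithAbortOnCancel(
//              request, () => request.GetRequestStreamAsync(), cTkn))
//   { ... }
```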

AlexB
  • Storing tasks like that, modifying global state from *inside* the task is a problem in itself, especially when that state involves the task itself. Use an `ActionBlock` with a DOP set to the number of concurrent downloads you want and post as many addresses to it as needed. The ActionBlock itself takes care of tasks, requests, completion – Panagiotis Kanavos Aug 01 '18 at 09:52
  • In essence, you are trying to build an asynchronous download worker. Instead of rolling your own it's easier to use the built-in classes for this. – Panagiotis Kanavos Aug 01 '18 at 09:54
  • @PanagiotisKanavos thanks for the suggestion, I'll have a look at refactoring that bit – AlexB Aug 01 '18 at 09:54
  • I'm using Dataflow blocks to download hundreds of files with a limited number of connections. An ActionBlock provides the worker. Timeouts and cancellation are a *different* issue. First, you don't have to *throw* if a CancellationToken is raised. Check `IsCancellationRequested` instead and return. Second, you already set a request timeout so you don't need a CT for it. Finally, you may not need the master CT - you can call `Cancel()` on the block and have it drop all queued URLs while allowing it to finish the requests in progress. – Panagiotis Kanavos Aug 01 '18 at 10:00
  • @PanagiotisKanavos I've had a look at the Dataflow docs and it does seem more suited for what I'm trying to do here, thanks for the tips on tokens/timeouts as well. – AlexB Aug 01 '18 at 10:04
  • There's a famous phrase, "select isn't broken". The chance that the problem exists in library code that has been used successfully by millions of other users, rather than in your own code that you haven't correctly isolated yet, is minuscule. – Damien_The_Unbeliever Aug 01 '18 at 10:11
