I have a Windows service that polls a remote FTP server every three seconds. It checks a directory for files, downloads any files present, and deletes those files once downloaded. The average file size is 10 KB; rarely, a file reaches the 100 KB range.

Occasionally (I have noticed no pattern), the WebClient will throw the following:

System.Net.WebException: The operation has timed out.
at System.Net.WebClient.OpenRead(Uri address)

It will do this for one or more files, usually whatever files are in the remote directory at that time. It will continue to do so indefinitely, churning on the "stuck" files at each polling interval. The bizarre part is that when I stop/start the Windows service, the "stuck" files download perfectly and the polling/downloading works again for long stretches of time. This is bizarre because I download like this:

private object _pollingLock = new object();

public void PollingTimerElapsed(object sender, ElapsedEventArgs e)
{
    if(Monitor.TryEnter(_pollingLock))
    {
        //FtpHelper lists content of files in directory
        ...

        foreach(var file in files)
        {
            using(var client = new WebClient())
            {
                client.Proxy = null;
                using(var data = client.OpenRead(file.Uri))
                {
                    //Use data stream to write file locally
                    ...
                }
            }
            //FtpHelper deletes the file
            ...
        } 
    }
    //Release the _pollingLock inside a finally
}

I would assume that a new connection is opened and closed for each file (unless .NET is doing something behind the scenes). If a file download had an issue, it would get a fresh retry on the next polling interval (in 3 sec). Why would a service restart make things work?

I've begun to suspect that the issue has something to do with caching (file or connection). Recently I tried going into Internet Explorer and clearing the cache. Approximately 30 sec or so later, all the files downloaded with no service restart. But, the next batch of files to arrive all got hung up again. I might try adding a line like this:

client.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);

or try disabling KeepAlives, but I want to get some opinions before I start trying random stuff.
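
A minimal sketch of the keep-alive change, assuming the downloads go through a `WebClient` subclass. The class name `FtpDownloadClient`, the `Configure` helper, and the 15-second timeout are hypothetical and chosen for illustration; `WebClient.GetWebRequest`, `FtpWebRequest.KeepAlive`, and `WebRequest.Timeout` are the real framework members. Whether this removes the timeouts is not established here; it only makes each download close its control connection and states the timeout explicitly.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that configures every request it creates.
public class FtpDownloadClient : WebClient
{
    // Applies the settings under discussion to a request.
    // Public and static so it can be exercised without any network I/O.
    public static void Configure(WebRequest request)
    {
        // Assumed value: fail after 15 seconds instead of the default timeout.
        request.Timeout = 15000;

        if (request is FtpWebRequest ftp)
        {
            // Close the FTP control connection once the request completes,
            // so the next download does not reuse a possibly stale connection.
            ftp.KeepAlive = false;
        }
    }

    // WebClient calls this for each operation, including OpenRead.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        Configure(request);
        return request;
    }
}
```

In the polling loop, `new FtpDownloadClient()` would take the place of `new WebClient()`; the `client.Proxy = null` line and the `OpenRead` call stay as they are.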

So: What is causing the occasional timeouts? Why does restarting the service work? Why did clearing the cache work?

Update

I made the cache policy and keep-alive changes mentioned above about two weeks ago, and I just got my first timeout since then. The timeouts appear to be less frequent, but alas, they are still happening.

Update

As requested, this is how I am kicking off the Timer:

_pollingTimer.AutoReset = true;
_pollingTimer.Elapsed += PollingTimerElapsed;
_pollingTimer.Interval = 10000;
_pollingTimer.Enabled = true;
koopaking3
  • You poll the server _every three seconds_? Wow. Have you considered mounting that resource instead? – arkascha Apr 18 '14 at 19:09
  • Sadly, I don't have that option as I do not control the server, the data source, or the choice of download mechanism. Luckily, I am the only one polling this server, though. I admit 3 seconds is extreme, but I need a low latency on this data. – koopaking3 Apr 18 '14 at 19:13
  • Maybe you exhausted the number of ports available: http://stackoverflow.com/a/1087525/578411 – rene Apr 19 '14 at 11:25
  • Have you enabled network traces? http://msdn.microsoft.com/en-us/library/ty48b824(v=vs.110).aspx also have you checked the DefaultConnectionLimit: http://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.defaultconnectionlimit.aspx – Simon Mourier May 11 '14 at 06:56
  • @Simon Mourier, I've already been tweaking `ServicePoint` values the last few days and am waiting to see how it plays out. Thanks. – koopaking3 May 12 '14 at 14:11
  • @rene I don't see an excessive amount of connections on the IP when I do a `netstat -a`. Could it still be ports? – koopaking3 May 12 '14 at 14:13
  • @koopaking3 it can be on both ends...can you try http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx to see how many ports you have in close wait – rene May 12 '14 at 14:37
  • A few pieces of information are missing here, all related to synchronization. 1 - Do you write the file locally synchronously or asynchronously? 2 - What is your policy on concurrent FTP access (when the download and write take longer than 3 s)? You MAY enqueue the job and keep track of the files already being downloaded by file name. To reduce the occurrence of the "bug", you can first delete the file "on the fly", immediately after copying it to disk; this will not fix the problem but will reduce possible collisions. –  May 14 '14 at 15:33
  • How are you kicking off the timer? can you post the code that creates & configures the timer? – steve cook May 15 '14 at 02:36
  • @steve The `Timer` is wrapped inside a helper class, so I won't post that whole file, but I will post a snippet above. – koopaking3 May 15 '14 at 21:21
  • @GuillaumePelletier 1) I write the file locally synchronously. 2) The polling timer locks on a sync object so that the time does not elapse again and hit the same FTP folder. No concurrent calls. 3) Are you saying to delete the file as soon as it is locally on disk? If so, then I am currently doing that. Otherwise, I'm not sure what you mean. – koopaking3 May 15 '14 at 21:29
  • Do you have access to the FTP server? What FTP software is it running? Also how do you handle the timeout exception when it is thrown? – steve cook May 16 '14 at 03:11
  • I meant the same thing as Steve's answer, which expresses the idea far better than I did. At this point, without more code, I have no further comments, except: why use WebRequest instead of FtpWebRequest? –  May 16 '14 at 15:25
  • @steve I have some minimal, non-admin access to the FTP server. It is running the FTP server through built-in IIS. When the exception is thrown, I catch it, log it, and exit `PollingTimerElapsed`, allowing the `Timer` to poll again on the next interval. – koopaking3 May 17 '14 at 21:15
  • koopaking3, did you find a solution to your problem? I'm facing the very same situation. I'm accessing the FTP every minute, and every week I have to restart the service once or twice. – Marcelo Dias Sep 22 '14 at 14:13
  • What I did with the cache policy above seemed to have worked for the most part, but not long after I asked this question, the server (thankfully) switched to SFTP and I am now using Renci.SshNet, so I'm not positive. – koopaking3 Oct 24 '14 at 20:36

1 Answer

Looks like you are kicking off your processing using the System.Timers.Timer.Elapsed event.

One gotcha that I found is that if your Elapsed event takes longer to execute than the timer interval, your event can be called again from another thread before it has finished executing.

This is specifically mentioned in the docs:

If the SynchronizingObject property is null, the Elapsed event is raised on a ThreadPool thread. If the processing of the Elapsed event lasts longer than Interval, the event might be raised again on another ThreadPool thread. In this situation, the event handler should be reentrant.

Assuming you are indeed using a vanilla timer with AutoReset=true (it's on by default), the first thing to do would be to address this potential issue. You can use a SynchronizingObject, or alternatively you can do something like this:

//setup code
Timer myTimer = new Timer(30000);
myTimer.AutoReset = false;
....

//Elapsed handler
public void PollingTimerElapsed(object sender, ElapsedEventArgs e)
{
    //do what you currently do
    ...

    //when finished, kick off the timer again
    myTimer.Start();
}

Either way, the main thing is to ensure that your code doesn't accidentally get called simultaneously by multiple threads - if that happens there's a good chance that occasionally you'll have one thread trying to download something from the site while another thread is simultaneously deleting the file.

The things that you mentioned (it only happens occasionally, the file sizes are normally small, it's fixed by a restart, etc.) would point me in the direction of this being the issue.
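
The AutoReset = false pattern above can be sketched a little more completely. One detail the short version leaves open is what happens when a polling pass throws (for example, a `WebException` on timeout): if the restart call is skipped, polling stops silently. The sketch below re-arms the timer in a `finally` block. The class `OneShotPoller` and its members are hypothetical names for illustration, not part of the original code; only `System.Timers.Timer` and its `AutoReset`, `Elapsed`, `Enabled`, and `Start` members are real framework API.

```csharp
using System;

// Hypothetical wrapper around a one-shot System.Timers.Timer.
public sealed class OneShotPoller : IDisposable
{
    private readonly System.Timers.Timer _timer;
    private readonly Action _work;

    public OneShotPoller(double intervalMs, Action work)
    {
        _work = work;
        // AutoReset = false: Elapsed fires once per Start(), so two passes
        // started by this timer cannot overlap.
        _timer = new System.Timers.Timer(intervalMs) { AutoReset = false };
        _timer.Elapsed += (sender, e) => RunOnce();
    }

    // True while the timer is armed and waiting for the next pass.
    public bool IsWaiting => _timer.Enabled;

    // The exception from the most recent pass, or null if it succeeded.
    public Exception LastError { get; private set; }

    public void Start() => _timer.Start();

    // Runs one polling pass, then re-arms the timer whether or not it failed.
    public void RunOnce()
    {
        try
        {
            LastError = null;
            _work();
        }
        catch (Exception ex)
        {
            LastError = ex; // a real service would log this
        }
        finally
        {
            // The next pass begins a full interval after this one ends.
            _timer.Start();
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

With this shape, the interval measures the gap between the end of one pass and the start of the next, rather than the gap between starts, so a slow download delays the next poll instead of racing it.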

steve cook
  • I am already using `Monitor.TryEnter(object)` at the beginning of the method. I will update my snippet to reflect that. It was a good idea though. – koopaking3 May 15 '14 at 21:13
  • I think you'll need to post more code. From what you've written, the general approach should work. Most likely there are more subtle issues with the file handling, or the FTP service itself. – steve cook May 16 '14 at 03:10