I have thousands of files on an FTP server. My task is to download each file, unzip it, and then process it. For downloading I am using the Tamir library, and for unzipping I am using Ionic.Zip.

When I used threads, downloading from the FTP server stopped. I don't know the reason; maybe the FTP server does not allow downloads from multiple threads. I then used threads only for unzipping and processing. This also failed, with an error like:

`The process cannot access the file 'file ' because it is being used by another process`.

So now I am doing everything sequentially. A prototype of the code is shown below:

    static void Main(string[] args)
    {
        string FTPpah = "d://Testpath";
        DonloadUnzipProcessFile(FTPpah);
    }

    private static void DonloadUnzipProcessFile(string FTPpah)
    {
        string Localpath = @"e://testpath";
        // Download using the Tamir library
        DownloadFile(FTPpah, Localpath);
        // Unzip using the Ionic.Zip library
        UnzipFile(Localpath);
        // Process with plain C# code
        ProcessFile(Localpath);
    }

Is there any way I can improve this task by using threads or processes?

EDIT

Can downloading from the FTP server not be done with threads? If so, I am thinking of unzipping and processing with tasks. I would create 10 tasks (TPL), each taking 10 files at a time to unzip, and then ten tasks would process them. Are such scenarios possible?

James Z
peter
  • Search for TPL Dataflow. – Sir Rufo Aug 19 '17 at 06:41
  • The Tamir/SharpSSH library is an SFTP library, not an FTP library. That's something completely different. It's also a dead project that has not been maintained for years. Do not use it! – Martin Prikryl Aug 19 '17 at 07:01
  • Async/await is the way to go. Such IO operations should not be done on threads; that reduces system scalability. – Mrinal Kamboj Aug 19 '17 at 08:09
  • @Mrinal Kamboj So you mean downloading from the FTP server should not be done with threads. If so, how do I do the unzipping and processing with threads? I am editing my question. – peter Aug 19 '17 at 08:22
  • Folks, I have edited my question. – peter Aug 19 '17 at 08:26
  • @MrinalKamboj The OP is talking about a few threads, so I do not think your concern is relevant here. – Martin Prikryl Aug 19 '17 at 09:01
  • @peter Threads are good for in-memory operations, not for IO such as file reads. It's not that they won't work, but they will be idle most of the time. That's why async/await, which doesn't use any thread for the operation, is preferable. The moment you have the file data in memory, threads or the Task abstraction are a good fit. – Mrinal Kamboj Aug 19 '17 at 09:03
  • @MartinPrikryl The OP is talking about downloading 1000s of files; that is a huge number to even think of doing on threads. In any case, threads have no role to play in an IO operation. – Mrinal Kamboj Aug 19 '17 at 14:51
  • @Mrinal But no one said that you have to use that many threads. That's not even possible. Most FTP servers would not allow you that many connections. The Tasks in your answer are threads too. – Martin Prikryl Aug 19 '17 at 15:13
  • @MartinPrikryl It is not about the number of threads. A design that needs IO should not use threads, since the operation is asynchronous. Whether few or many threads are used, they would sit idle waiting for the call to return. Whether the FTP server supports that many calls is a different aspect: it depends on the FTP server/application design and hardware, where it can buffer the calls in a queue and process them. There should not be any throttling on the client side through a fixed number of threads, as that would hurt application scalability and throughput. – Mrinal Kamboj Aug 20 '17 at 00:03
  • @MartinPrikryl Also, a Task in `Async-Await` is not a thread, [check here](https://blog.stephencleary.com/2013/11/there-is-no-thread.html). It internally uses an `IO completion port` to dispatch the call while the calling thread context is freed; this is not the same as using a Task in the TPL. After the call returns, a thread pool thread is assigned for the continuation. – Mrinal Kamboj Aug 20 '17 at 00:06
  • Any update on your question? Can any of the answers be accepted? – Soleil May 09 '18 at 14:18

2 Answers

The code below is an asynchronous version that downloads the files in the background. Because the downloads are awaited instead of each one occupying a dedicated thread, no thread sits blocked while waiting on the network, so the same approach can be applied to thousands of files with high throughput.

async Task Main()
{
    // List of FTP Path and Local file Path for processing
    var ftpFilesForProcessing = new Dictionary<string, string>
    {
      {"Ftp_Path_1","Local_Path_1"},
      {"Ftp_Path_2","Local_Path_2"},
      {"Ftp_Path_3","Local_Path_3"},
    };

    // FTP Files with Task for Async processing
    var ftpFilesTaskForProcessing = new Dictionary<string, Task<string>> ();

    // Add a relevant Task to each file processing
    foreach (var file in ftpFilesForProcessing)
        ftpFilesTaskForProcessing[file.Key] = FtpRead(file.Key,file.Value);

    // All the FTP downloads are awaited here asynchronously;
    // execution then proceeds with the remaining logic
    await Task.WhenAll(ftpFilesTaskForProcessing.Values);

    // Unzip All files Asynchronously

    // Process Data using Task Parallel Library     
}

// Download one FTP file to a local file
// Requires: using System.IO; using System.Net; using System.Threading.Tasks;
public async Task<string> FtpRead(string ftpPath, string localPath)
{
    // Create the FTP request object
    FtpWebRequest ftpRequest = (FtpWebRequest)WebRequest.Create(ftpPath);

    // Set the FTP request properties
    ftpRequest.KeepAlive = false;
    ftpRequest.UseBinary = true;
    ftpRequest.Method = WebRequestMethods.Ftp.DownloadFile;

    // Replace the placeholders with the real credentials
    ftpRequest.Credentials = new NetworkCredential("<Username>", "<Password>");

    // Dispose the response and both streams even if the copy throws
    using (var ftpWebResponse = (FtpWebResponse)await ftpRequest.GetResponseAsync())
    using (Stream ftpResponseStream = ftpWebResponse.GetResponseStream())
    using (FileStream localFile = File.Create(localPath))
    {
        // Copy raw bytes: zip archives are binary, so reading them through a
        // StreamReader/StreamWriter pair would corrupt them by decoding as text
        await ftpResponseStream.CopyToAsync(localFile);
    }

    return localPath;
}
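
The two placeholder comments at the end of `Main` (unzip, then process) could be filled in along the following lines. This is only a sketch under stated assumptions: it swaps in the built-in `System.IO.Compression.ZipFile` instead of Ionic.Zip so that it has no external dependency, and the `UnzipStage` class, the `UnzipAll` method, and its parameters are hypothetical names that do not appear in the question. Each archive is extracted into its own folder, which is one way to avoid two workers touching the same file at once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class UnzipStage
{
    // Extracts every archive into its own folder under outputRoot,
    // running at most maxParallel extractions at the same time.
    // Returns the extraction folders (order is not guaranteed).
    public static List<string> UnzipAll(
        IEnumerable<string> zipPaths, string outputRoot, int maxParallel)
    {
        var extracted = new List<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(zipPaths, options, zipPath =>
        {
            // One folder per archive, named after the archive file,
            // so two workers never write to the same path.
            string target = Path.Combine(
                outputRoot, Path.GetFileNameWithoutExtension(zipPath));

            // Throws if a file to be extracted already exists in target.
            ZipFile.ExtractToDirectory(zipPath, target);

            // List<T> is not thread-safe, so guard the shared list.
            lock (extracted)
            {
                extracted.Add(target);
            }
        });

        return extracted;
    }
}
```

Calling `UnzipStage.UnzipAll(localPaths, outputRoot, 10)` after the `Task.WhenAll` line would unzip up to ten archives at a time; the processing step could then iterate over the returned folders in the same way. On .NET Framework this assumes a reference to the `System.IO.Compression.FileSystem` assembly.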
Mrinal Kamboj

First, tasks are NOT necessarily threads. (What is the difference between task and thread?)

Second, I wouldn't recommend using raw threads. Use Tasks or `Parallel.ForEach` instead, since they come with their own optimizations, unless you have something very specific to achieve through threads.

For your scenario, I would do this: create a class `ProcessFile` that downloads, unzips, and processes one file, then raises an event when it finishes. Keep a list of n (say 10) `ProcessFile` instances. The class that manages them reacts to each completion event by starting a new instance, so that n files are always being processed.
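
The "keep n files active" idea can also be sketched without events, by letting a `SemaphoreSlim` act as the gate. This is a swapped-in technique, not the event-based design described above, and the `Throttled` class, the `RunAsync` method, and the `handleOne` delegate are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class Throttled
{
    // Runs handleOne for every item, with at most maxInFlight
    // invocations running at any moment. As soon as one finishes,
    // the next waiting item is allowed to start.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, int maxInFlight, Func<T, Task> handleOne)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = items.Select(async item =>
            {
                // Wait (without blocking a thread) for a free slot.
                await gate.WaitAsync();
                try
                {
                    await handleOne(item);
                }
                finally
                {
                    // Free the slot even if handleOne throws.
                    gate.Release();
                }
            }).ToList();

            // Completes when every item is done; rethrows a failure if any.
            await Task.WhenAll(tasks);
        }
    }
}
```

With this in place, `await Throttled.RunAsync(ftpPaths, 10, path => HandleFileAsync(path))` would keep ten files in flight, where `HandleFileAsync` stands for an assumed method that downloads, unzips, and processes a single file. Capping the count on the client side also respects the connection limit that many FTP servers enforce.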

Good luck

Soleil