
I am writing a simple console application that loads files from a database into a HashSet. These files are then processed in a Parallel.ForEach loop. The console application launches a new Process object for each file it needs to process, so it opens new console windows with the parser application running. I am doing it this way because of logging issues: if I run the parsing from within the application, logs from different threads write into each other.

The issue is that when I close the application, the Parallel.ForEach loop still tries to process more files before exiting. I want all tasks in the code to stop immediately when I kill the application. Here are code excerpts:

My cancel handling is borrowed from: Capture console exit C#

Essentially, the program performs some cleanup duties when it receives a cancel command, such as CTRL+C or closing the window with the X button.

The code I am trying to cancel is here:

class Program
{
   
    private static bool _isFileLoadingDone;
    static ConcurrentDictionary<int, Tuple<Tdx2KlarfParserProcInfo, string>> _currentProcessesConcurrentDict = new ConcurrentDictionary<int, Tuple<Tdx2KlarfParserProcInfo, string>>();

    static void Main(string[] args)
    {
        try
        {
            if (args.Length == 0)
            {
                // Some boilerplate to react to close window event, CTRL-C, kill, etc
                LaunchFolderMode();
            }
        }
        catch
        {
            // Error handling omitted for brevity.
            throw;
        }
    }

   
}

Which calls:

private static void LaunchFolderMode()
{
    //Some function launched from Task
    ParseFilesUntilEmpty();
}

And this calls:

private static void ParseFilesUntilEmpty()
{
    while (!_isFileLoadingDone)
    {
        ParseFiles();
    }
    
    ParseFiles();

}

Which calls:

private static void ParseFiles()
{
    filesToProcess = new HashSet<string> { "file1", "file2", "file3", "file4" }; // I actually get the files from a DB; this is just an example
    //_fileStack = new ConcurrentStack<string>(filesToProcess);
    int parallelCount = 2;
    Parallel.ForEach(filesToProcess, new ParallelOptions { MaxDegreeOfParallelism = parallelCount },
        tdxFile =>{
            ConfigureAndStartProcess(tdxFile);
        });
    
}

Which finally calls:

public static void ConfigureAndStartProcess(object fileName)
{
    string fileFullPath = fileName.ToString();
    Process proc = new Process();
    string fileFullPathArg1 = fileFullPath;
    string appName = @".\TDXXMLParser.exe";
    if (fileFullPathArg1.Contains(".gz"))
    {
        StartExe(appName, proc, fileFullPathArg1);  //I set up the arguments and launch the exes. And add the processes to _currentProcessesConcurrentDict
        proc.WaitForExit();
        _currentProcessesConcurrentDict.TryRemove(proc.Id, out Tuple<Tdx2KlarfParserProcInfo, string> procFileTypePair);
        proc.Dispose();
    }

}

The concurrent dictionary to monitor processes uses the following class in the tuple:

public class Tdx2KlarfParserProcInfo
{
    public int ProcId { get; set; }
    public List<long> MemoryAtIntervalList { get; set; } = new List<long>();
}

To keep these code excerpts short, I omitted the 'StartExe()' function. All it does is set up the arguments and start the Process object.

Why does Parallel.ForEach insist on running even after I close the program? Is there a better parallel processing method that will let me immediately kill whatever files I am currently processing without trying to start a new process, as Parallel.ForEach does?

I have tried stopping it with the ParallelLoopState.Stop method, but it still tries to process more files before finally exiting.

edo101
  • Since the `LaunchFolderMode` just calls the `ParseFilesUntilEmpty`, you could remove one of them from the question, to keep the code minimal. – Theodor Zoulias Oct 24 '22 at 23:29
  • I removed the existSystem. I would take out LaunchFolderMode, but it calls one other function under it. However, I wanted to leave the call chain so you can see exactly how many function calls it takes to get to the Parallel.ForEach loop – edo101 Oct 25 '22 at 00:07
  • Instead of blocking threads in `WaitForExit`, use https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.waitforexitasync?view=net-7.0 (or write it yourself https://stackoverflow.com/questions/470256/process-waitforexit-asynchronously) – Jeremy Lakeman Oct 25 '22 at 01:01
  • @JeremyLakeman what is the issue with WaitForExit()? Is it the culprit for my Parallel.ForEach trying to continue running even after I have sent the cancel command? Also, were you the one that downvoted my question? If so, what did I do wrong and how can I improve it? – edo101 Oct 25 '22 at 15:32
  • @TheodorZoulias any new thoughts given Jeremy's feedback and the edits I made to the question? Thanks in advance! – edo101 Oct 25 '22 at 15:40
  • @edo101 `Parallel.ForEach` is meant for data parallelism, not starting processes. You could use a simple loop to start those processes and they'd still run independently of your application. There's no parallel state or need for ConcurrentDictionary because there's no parallelism in your application to begin with. – Panagiotis Kanavos Oct 27 '22 at 07:44
  • @edo101 Just start all processes in a loop and store the `Process` objects in an array. You could even use `LINQ` to do this, eg `var processes=files.Where(f=>Path.GetExtension(f)==".gz").Select(f=>Process.Start(execPath,f)).ToList();` – Panagiotis Kanavos Oct 27 '22 at 07:47
  • @edo101 it looks like your real question is how to cancel the child processes when the parent process ends. This has been answered multiple times and has nothing to do with threads or Parallel.ForEach. It's a matter of how the OS (Windows or Linux) treats and terminates child processes. – Panagiotis Kanavos Oct 27 '22 at 07:52
  • @edo101 normal termination doesn't terminate child processes. You'll have to do this in your code. Another process (or Task Manager) will have to request to kill the entire process tree to terminate a parent and its child processes, eg with [Process.Kill(true)](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.kill?view=net-7.0#system-diagnostics-process-kill(system-boolean)). A dirty way to kill everything immediately would be `Process.GetCurrentProcess().Kill(true);` – Panagiotis Kanavos Oct 27 '22 at 08:02

1 Answer


Unless I'm mistaken, your code seems to do no work on its own; it just launches executables and waits for them to end. And yet you're starving your thread pool with code that's just sitting there waiting for the external processes to end. Now, again if I understand correctly, this part works. It's very wasteful and utterly non-scalable, but it works.

The only thing you seem to be missing is closing the processes early when your own process ends. This is rather trivial: CancellationToken. You simply create a CancellationTokenSource in your main function and pass its token (ct below) down to every worker, and when your program is meant to end you cancel it. That only leaves you to respond to it, and that's as easy as replacing your proc.WaitForExit(); with something like

// this is how we coded in .NET 1.0, released in Feb. 2002.
while(!proc.HasExited && !ct.IsCancellationRequested)
    Thread.Sleep(1000);
if(ct.IsCancellationRequested)
    proc.Kill();
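
To also stop the loop from picking up new files, the same token can be handed to Parallel.ForEach through ParallelOptions.CancellationToken. Below is a minimal sketch of that combination, not the asker's actual code: the class CancellableLauncher, the helper WaitOrKill, and the exePath parameter are hypothetical names introduced for illustration, and the real StartExe() argument setup is not reproduced.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class; names are illustrative, not from the question.
static class CancellableLauncher
{
    // Blocks until the child exits, checking the token every 500 ms.
    // If cancellation is requested first, the child is killed.
    public static void WaitOrKill(Process proc, CancellationToken ct)
    {
        while (!proc.WaitForExit(500)) // returns true once the process has exited
        {
            if (ct.IsCancellationRequested)
            {
                try { proc.Kill(); }
                catch (InvalidOperationException) { /* exited in the meantime */ }
                proc.WaitForExit();
                return;
            }
        }
    }

    // Launches one child per file, at most parallelCount at a time.
    // Returns how many children were actually started.
    public static int ParseFiles(IEnumerable<string> files, string exePath,
                                 int parallelCount, CancellationToken ct)
    {
        int started = 0;
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = parallelCount,
            CancellationToken = ct // stops the loop from scheduling new iterations
        };
        try
        {
            Parallel.ForEach(files, options, file =>
            {
                // Assumption: the file path is the only argument and needs no quoting.
                using var proc = Process.Start(exePath, file);
                Interlocked.Increment(ref started);
                WaitOrKill(proc, ct);
            });
        }
        catch (OperationCanceledException)
        {
            // Thrown by Parallel.ForEach once the token has been cancelled.
        }
        return started;
    }
}
```

The CancellationTokenSource would live in Main, and cts.Cancel() would be called from whatever exit handler is already in place (for example a Console.CancelKeyPress handler for CTRL+C). Iterations that are already running only notice the token because WaitOrKill polls it; the loop option by itself does not touch the child processes.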

Now, if you also want to fix your first problem, start writing async code. Process.WaitForExitAsync(CancellationToken), available in .NET 5 and later, returns a task that you can await with a cancellation token, so the waiting is done for you (cancelling only abandons the wait, so you still call proc.Kill() afterwards). Stop using Parallel.ForEach, this isn't the 90s, you have Task.WhenAll to do the collection. And at the end of all this, you'll see that your code will boil down to perhaps 10 good lines of code, instead of the mess you made for yourself.
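
A rough sketch of what that async version could look like, assuming .NET 5 or later. The class AsyncLauncher, the methods RunOneAsync and RunAllAsync, and the use of a SemaphoreSlim to keep the question's limit of two concurrent parsers are illustrative choices, not something taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class; names are illustrative, not from the question.
static class AsyncLauncher
{
    // Starts one child process and awaits its exit without blocking a thread.
    public static async Task RunOneAsync(string exePath, string file,
                                         SemaphoreSlim gate, CancellationToken ct)
    {
        await gate.WaitAsync(ct); // throws if cancelled before a slot is free
        try
        {
            // Assumption: the file path is the only argument and needs no quoting.
            using var proc = Process.Start(exePath, file);
            try
            {
                await proc.WaitForExitAsync(ct);
            }
            catch (OperationCanceledException)
            {
                // Cancelling the wait does not stop the child, so kill it explicitly.
                if (!proc.HasExited) proc.Kill(entireProcessTree: true);
                throw;
            }
        }
        finally
        {
            gate.Release();
        }
    }

    // Runs all files, at most maxConcurrent children at a time.
    public static async Task RunAllAsync(IEnumerable<string> files, string exePath,
                                         int maxConcurrent, CancellationToken ct)
    {
        using var gate = new SemaphoreSlim(maxConcurrent);
        var tasks = files.Select(f => RunOneAsync(exePath, f, gate, ct)).ToList();
        await Task.WhenAll(tasks); // completes only after every task has finished
    }
}
```

Once the token is cancelled, files still waiting for a slot never start, and the children that are running get killed; awaiting RunAllAsync then throws an OperationCanceledException that the caller can catch before doing its cleanup.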

Blindy
  • Also, `proc.Dispose();` -- again, improve your code. Use `using`. – Blindy Oct 25 '22 at 00:47
  • I'm confused by your answer. Again, my issue is that when I close the main console app, my Parallel.ForEach still tries to launch new processes and keep processing. I have tried using the Parallel.ForEach loop state's Stop, but it still tries to process. I need a way to stop everything in my code from running anymore. – edo101 Oct 25 '22 at 15:38
  • What is confusing you about my answer? `CancellationToken` is exactly designed to stop parallel operations when something happens, like you closing your application. Which is exactly what I mentioned above. – Blindy Oct 25 '22 at 15:49
  • @Blindy my confusion is that you said I need to create a CancellationTokenSource in my main function and then set it when my program ends. Is this going to solve my main issue, which is that the Parallel.ForEach still tries to process additional files when I want to exit the code...? Or is it the second part of your answer, where you told me to write async code, which I am not familiar with at all? – Datboydozy Oct 26 '22 at 20:18
  • `Parallel.ForEach` also takes a [cancellation token](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.paralleloptions.cancellationtoken?view=net-6.0#system-threading-tasks-paralleloptions-cancellationtoken), so yes. Do you see my confusion? All of this is trivially available on the main documentation site, so I'm unsure of what the problem is implementing this when I gave you all the keywords involved. – Blindy Oct 26 '22 at 22:19
  • @edo101 you don't need `Parallel.ForEach` to launch multiple child processes. In fact, it doesn't affect those processes at all. Cancelling `Parallel.ForEach` won't affect the child processes. It has nothing at all to do with them – Panagiotis Kanavos Oct 27 '22 at 07:51