1

I am kicking off a few tasks to match regular expressions within a long string.

My list of tasks looks like this:

var splittingTasks = new List<Task>();
foreach (var keyword in _general.Keywords)
{
    splittingTasks.Add(Task.Run(() => SplitMatches(keyword)));
}
await Task.WhenAll(splittingTasks);

and the SplitMatches method looks like this:

ConcurrentBag<string> _objectifiableMatches = new();

//...

public void SplitMatches(string keyword)
{
    string patternObject = $@"\/begin {keyword}[\s\S]+?\/end {keyword}";
    Regex regexObject = new(patternObject);
    MatchCollection Matches = regexObject.Matches(_content);
    Parallel.ForEach(Matches, m =>
    {
        var replacedQuotes = m.Value.Replace($"\"", "'");
        _objectifiableMatches.Add(replacedQuotes);
    });
}

Debugging the await Task.WhenAll(splittingTasks); results to a NullReferenceException for keyword: 'keyword' threw an exception of type 'System.NullReferenceException'. Nevertheless the result is as expected.

I read this post where it says that the lambda expression within a for loop can cause problems since the delegate is not evaluated immediately. But even after copying the variable inside the foreach loop I am still getting the same error.

foreach (var kw in _general.Keywords)
{
    string keyword = kw;
    splittingTasks.Add(Task.Run(() => SplitMatches(keyword)));
}

Do you have any advice?

EDIT:

My keywords list _general.Keywords is imported from an appsettings.json file via the options pattern. However, trying the following approach produces the same debugging error:

List<string> keywords = new()
{
    "keyword1",
    "keyword2",
    "keyword3",
    "keyword4",
    "keyword5"
};

foreach (var kw in keywords)
{
    string keyword = kw;
    splittingTasks.Add(Task.Run(() => SplitMatches(keyword)));
}

EDIT2: Correct me if I am wrong but I think the error comes from debugging only. Since the Tasks exist outside of the loop but keyword does not. So the debugger tries to access the last value of keyword which is not declared in the scope. Declaring the variable keyword outside of the foreach produces no errors, BUT in turn the result is just plain wrong. Can I just ignore this error?

EDIT3:

foreach (var task in splittingTasks)
{
    Console.WriteLine(task.IsCompletedSuccessfully.ToString());
}

Returns true for all tasks. And there are not more tasks than expected! (For mentioned list in the first EDIT it would be 5 tasks)

EDIT4: I uploaded a short video displaying the problem of Visual Studio
Seems to me like VS is evaluating the variable keyword even before it is declared in the for loop. I cannot understand why and I couldn't find a way to even throw this error.

EDIT5: you can find a minimal reproducible example here
Copy paste in visual studio and try to debug the method GetKeywordMatchesAsync()

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
MichaelS
  • 171
  • 11
  • Are you sure `_general` contains only no-nulls? – Charlieface Oct 10 '21 at 05:20
  • Yes. My keywords list comes from an appsettings.json file. But I just tried it with a hardcoded list. Please see my edit. – MichaelS Oct 10 '21 at 05:25
  • What is `_objectifiableMatches` ? – Dai Oct 10 '21 at 05:47
  • `splittingTasks.Add(Task.Run(() => SplitMatches(keyword)));` <-- Why are you doing this? (Protip: you almost *never* need to use `Task.Run`). – Dai Oct 10 '21 at 05:48
  • 1
    `_objectifiableMatches` is a `ConcurrentBag _objectifiableMatches = new();`. And regarding kicking off tasks like this: I am pretty new to async/await programming in C# and that´s the way Tim Corey did it in his youtube videos :-) if you point me to a "better" way it is greatly appreciated – MichaelS Oct 10 '21 at 05:50
  • What’s the value of _content? – Vivek Nuna Oct 10 '21 at 05:53
  • `_content = File.ReadAllText(contentPath, Encoding.GetEncoding(1252));` and `private readonly string _content;` – MichaelS Oct 10 '21 at 05:55
  • @Dai I just tried multiple ways of kicking of tasks described here: https://dotnetcodr.com/2014/01/01/5-ways-to-start-a-task-in-net-c/ ... they all produce the same error. i really think that the error is somehow produced by the debugger trying to evaluate a variable which doesnt exist anymore in the debugging scope (please see my 2nd edit) – MichaelS Oct 10 '21 at 06:02
  • It would be easier for us to understand what you're experiencing if you would create a screen-recording (use Windows' built-in game DVR (WinKey + G) in Visual Studio). – Dai Oct 10 '21 at 06:31
  • 1
    @Dai I uploaded a video here: https://drive.google.com/file/d/1nuISdp8HNN3ZQctmUaZBtTEqHcqYxs8-/view?usp=sharing please tell me if it works on your end. The funny thing is that the variable keyword is even evaluated BEFORE entering the for loop and I cannot grasp why this is.. – MichaelS Oct 10 '21 at 07:07
  • I'm having a hard time reproducing this. Please provide one [mre] that people can copy-paste instead of code spread out over multiple edits and comments. – CodeCaster Oct 10 '21 at 08:28
  • @CodeCaster i can reproduce the problem with https://dotnetfiddle.net/zJMvkz on my end in Visual Studio EDIT: this compiles without issues. it just debugs incorrectly in VS. and i cannot understand why EDIT2: to reproduce copy paste this in a new 5.0 .Net Core project in visual studio and try to debug `GetKeywordMatchesAsync()` – MichaelS Oct 10 '21 at 08:52
  • 1
    You are worrying about things you shouldn't worry about :). The lambda expressions you are using are compiled with some helper classes under the hood. Your variable keyword is a member of those classes. The debugger hides this complexity and is running into a NullReferenceException before and after those helper classes are created/disposed. A single variable couldn't ever throw a NullReferenceException, regardles of the value it has. Open your assembly with ILSpy, switch back some language versions and have a look at your method to get an idea how your code gets compiled. – Steeeve Oct 10 '21 at 09:10
  • thank you @Steeeve. this does make a lot of sense. I will look into the problem a bit more when I have time. For now it´s perfectly fine for me to just know I can accept this error/warning. Just out of curiosity: Do you happen to know how if there is a way to code this task spawning logic without a lambda expression? – MichaelS Oct 10 '21 at 09:16
  • @MichaelS You could use local functions, but it probably wouldn't change much. Your foreach loop is just fine as is. One single remark I can't check ATM: I'm not sure if you need `string keyword = kw` in this case, because string is a value type. It it was a reference type, you would need it. – Steeeve Oct 10 '21 at 09:23
  • As a side note, by launching multiple `Task.Run`, with each one of them performing a `Parallel.ForEach` inside, you are overparallelizing your workload. One `Parallel.ForEach` is enough to [saturate your `ThreadPool`](https://stackoverflow.com/questions/66261010/multiple-parallel-foreach-loops-in-net/66263583#66263583) (if you omit the `MaxDegreeOfParallelism` configuration), and you are creating a bunch of them. In other words your setup is not optimal, but I can't suggest improvements by just guessing what you are trying to do. – Theodor Zoulias Oct 10 '21 at 10:11
  • Using a `ConcurrentBag` is also questionable. This is a highly specialized class, intended for mixed producer-consumer scenarios, and your scenario is not one of them. A `ConcurrentQueue` is preferable, because it preserves the order of the inserted items. Probably it would be even better if you eliminated the need for such a container altogether, by switching to PLINQ instead of the `Parallel.ForEach`. – Theodor Zoulias Oct 10 '21 at 10:20
  • 1
    @TheodorZoulias Thank you very much for your suggestions. The overparallisiation is something I never thought about until now. Your link helped me a lot and due to the fact I don't specify any degree of parallelization I removed any additional Parallel Statements. However, I do not see how I can use PLINQ in combination with Tasks to eliminate the use of concurrent collection, since each task is writing the same variable. Also could you ellaborate why concurrent collections are "bad"? – MichaelS Oct 10 '21 at 18:52
  • Michael you can think of PLINQ as a version of `Parallel.ForEach` that also produces results. For example: `var results = someEnumerable.AsParallel().AsOrdered().Select(x => Transform(x)).ToArray();`. Whenever I have the need for doing parallel work that produces results, PLINQ is the tool that I consider first. Whether it is suitable for your case, I don't know really. The concurrent collections aren't bad. It's the `ConcurrentBag` that is misused frequently, and I blame Microsoft's examples for this. They use it in all the wrong places. I've seen it used correctly only once in the docs. – Theodor Zoulias Oct 10 '21 at 19:06

1 Answers1

0

After I sight the video I saw that you placed the breakpoints out of scope of the variable (1).

Your code is absolutely correct though I wonder why you're parallelizing the code twice. In my eyes you don't have to do that, but that's a different topic.

When you place your breakpoint after the initialization of the keyword variable, you'll get a value (2) as seen here: enter image description here

StefanM
  • 49
  • 1
  • 5