2

I have a text file which I read to a string content. To identify the textbody which I want to process further I am getting the indexes of keywords in the string and then set the "starting" index to the smallest index found.

I tried this with Parallel.ForEach ...

ConcurrentBag<int> indexes = new();
int index;

switch (Case)
{
    case 1:
        Parallel.ForEach(KeywordTypes.GetImplementedNamedObjects(), inos =>
        {
            index = content.IndexOf($"/begin {inos}");
            index = index == -1 ? content.Length : index;
            indexes.Add(index);
        });
        index = indexes.Min();
        return index;

... and with foreach:

foreach (string inos in KeywordTypes.GetImplementedNamedObjects())
{
    index = content.IndexOf($"/begin {inos}");
    index = index == -1 ? content.Length : index;
    indexes.Add(index);
}

index = indexes.Min();
return index;

where foreach produces the expected result but Parallel.ForEach does not.

Why is my code not thread-safe?

MichaelS
  • 171
  • 11
  • @FranzGleichmann Marc already found the problem. I tried locking Indexes although it was a thread-safe variable but totally forgot about int index itself, which I am declaring out of the Parallel.Foreach which leads to unsafe accessing of the variable. I did this because I am using it in multiple switch statements and that´s why it is failing. – MichaelS Sep 13 '21 at 06:39
  • 1
    As a side note, in case it was important to get the `indexes` in the order of insertion, you could switch from the `ConcurrentBag` to a `ConcurrentQueue`. But in general when you want to do parallel work and you want the results back, the `PLINQ` library (the `AsParallel` LINQ operator) is more suitable than the `Parallel` class. You can see an example [here](https://stackoverflow.com/questions/62656615/how-to-enforce-a-sequence-of-ordered-execution-in-parallel-for/62662369#62662369). – Theodor Zoulias Sep 13 '21 at 08:15

1 Answers1

7

There is only one index variable here, because it is "captured". This means that multiple threads can be squabbling over it, instead of having their own version each.

Consider:

  • thread A computes index = content.IndexOf($"/begin {inos}");
  • thread B computes index = content.IndexOf($"/begin {inos}"); - oops, thread A's version just got overwritten
  • thread A computes index = index == -1 ? content.Length : index; using the index that B just updated
  • etc

the point is: a value got lost due to thread contention.

Simply move the declaration of index to fix this:

Parallel.ForEach(KeywordTypes.GetImplementedNamedObjects(), inos =>
{
    var index = content.IndexOf($"/begin {inos}");
    ...

Fundamentally, the scope of a variable is defined by where it is declared. If a variable is declared outside of a local method / lambda, the compiler respects what you've asked, and that variable is shared between all uses of that local method / lambda; if it is declared inside the local method / lambda, then the lifetime is local to that invocation, and no state is shared between the callers.

If you want to be absolutely sure you're not accidentally leaking state, the static modifier on a lambda achieves this, although it also prevents indexes being accessed, so ... probably not what you need here.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Man I feel really stupid right now. Thank you very much! Originally I coded it correctly but starting using `index` in multiple switch cases and moved the declaration outside of the Parallel.Foreach scope. But thanks for explaining it in depth again! – MichaelS Sep 13 '21 at 06:37
  • 5
    @MichaelS threading, parallelism, and state contention is just about the hardest thing in programming; there's no "stupid" here - just new things to learn – Marc Gravell Sep 13 '21 at 06:41