C# foreach loop comically slower than for loop on a RaspberryPi

Question

I was testing a .NET application on a RaspberryPi and whereas each iteration of that program took 500 milliseconds on a Windows laptop, the same took 5 seconds on a RaspberryPi. After some debugging, I found that majority of that time was being spent on a foreach loop concatenating strings.

Edit 1: To clarify, that 500 ms and 5 s time I mentioned was the time of the entire loop. I placed a timer before the loop, and stopped the timer after the loop had finished. And, the number of iterations are the same in both, 1000.

Edit 2: To time the loop, I used the answer mentioned here.

private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    foreach (list_of_bytes register in registers)
    {
        ret += Convert.ToString(register.RegisterValue) + ",";
    }
    return ret;
}

Out of the blue I replaced the foreach with a for loop, and suddenly it starts taking almost the same time as it did on that laptop. 500 to 600 milliseconds.

private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    for (UInt16 i = 0; i < 1000; i++)
    {
        ret += Convert.ToString(registers[i].RegisterValue) + ",";
    }
    return ret;
}

Should I always use for loops instead of foreach? Or was this just a scenario in which a for loop is way faster than a foreach loop?

`spent in a foreach loop concatenating strings.` that's your problem, not `for` vs `foreach`. Strings are immutable. Modifying or concatenating strings creates a new string. Your loops created 2000 temporary objects that need to be garbage collected. That process is *expensive*. Use a StringBuilder instead, preferably with a `capacity` roughly equal to the size of the expected string — Panagiotis Kanavos, Nov 26 '21 at 10:16
As for why there's such a difference, are you sure there is one? What did you actually measure? Are you sure the GC didn't run during the test? To get meaningful numbers use [BenchmarkDotNet](https://benchmarkdotnet.org/articles/guides/getting-started.html) to run each code enough times that you get stable results *and* account for GC and allocations — Panagiotis Kanavos, Nov 26 '21 at 10:17
Another obvious difference between your two methods is that the second aborts after 1000 items, no matter how many are in the list, and blows up if there are less than 1000. The first always processes the entire list, so depending on how many items are in the list, they may be doing vastly different amounts of work. — Damien_The_Unbeliever, Nov 26 '21 at 10:21
@PanagiotisKanavos I did not mean *IN* the for loop, I should edit that. I basically started a timer before the loops, and stopped it after the loops had finished. And each loop has a 1000 iterations. — Usman Mehmood, Nov 26 '21 at 10:22
Then the test is wrong, and fully vulnerable to GC delays. You aren't measuring what you think you are. Use BenchmarkDotNet to get meaningful results — Panagiotis Kanavos, Nov 26 '21 at 10:22
At the same time, add another test for `StringBuilder`. I suspect you'll be surprised. 500ms for only 1000 items is *excruciatingly, unbelievably slow*!!!!! An RPi has a 1+GHz core, how can it take so much time to format 1000 items? That's so little data it should fit even into the RPi's CPU cache! Never mind the Windows machine. — Panagiotis Kanavos, Nov 26 '21 at 10:26
You are right. So my problem is not the for and foreach loops, but something else entirely? — Usman Mehmood, Nov 26 '21 at 10:29
Bear in mind that we have no access to `list_of_bytes` and so *no means of reproducing your issue*, if it exists, locally. If, as seems to be suspected, it's a flaw with the benchmarking, *we* cannot fix it without being able to run the code. — Damien_The_Unbeliever, Nov 26 '21 at 10:37
@UsmanMehmood try again using BenchmarkDotNet. This is the defacto standard for benchmarking, used by the .NET Team itself and people expect to see the BDN results to any benchmarking question *unless the problem is obvious* — Panagiotis Kanavos, Nov 26 '21 at 10:44
@UsmanMehmood I added the BDN results from my machine. For/ForEach take 10 times longer and use 100 times more RAM than StringBuilder — Panagiotis Kanavos, Nov 26 '21 at 11:05
Why the OP isn't listening to @Damien_The_Unbeliever is unbelievable. The two loops have completely different behaviors. One stops at 1k items; the other goes through the entire collection, regardless of size. And regardless of additive strings vs a string builder, the comparisons are simply invalid as written. Even the suggestion of using BenchmarkDotNet is premature until the test cases are evaluating the same number of items. ‍♂️ — Metro Smurf, Nov 26 '21 at 13:41

Panagiotis Kanavos · Accepted Answer · 2021-11-26T11:22:55.813

The actual problem is concatenating strings not a difference between for vs foreach. The reported timings are excruciatingly slow even on a Raspberry Pi. 1000 items is so little data it can fit in either machine's CPU cache. An RPi has a 1+ GHZ CPU which means each concatenation takes at leas 1000 cycles.

The problem is the concatenation. Strings are immutable. Modifying or concatenating strings creates a new string. Your loops created 2000 temporary objects that need to be garbage collected. That process is expensive. Use a StringBuilder instead, preferably with a capacity roughly equal to the size of the expected string.

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new StringBuilder(registers.Count * 3);
        foreach (list_of_bytes register in registers)
        {
            sb.AppendFormat("{0}",register.RegisterValue);
        }
        return sb.ToString();
    }

Simply measuring a single execution, or even averaging 10 executions, won't produce valid numbers. It's quite possible the GC run to collect those 2000 objects during one of the tests. It's also quite possible that one of the tests was delayed by JIT compilation or any other number of reasons. A test should run long enough to produce stable numbers.

The defacto standard for .NET benchmarking is BenchmarkDotNet. That library will run each benchmark long enough to eliminate startup and cooldown effect and account for memory allocations and GC collections. You'll see not only how much each test takes but how much RAM is used and how many GCs are caused

To actually measure your code try using this benchmark using BenchmarkDotNet :

[MemoryDiagnoser]
[MarkdownExporterAttribute.StackOverflow]
public class ConcatTest
{

    private readonly List<list_of_bytes> registers;


    public ConcatTest()
    {
        registers = Enumerable.Range(0,1000).Select(i=>new list_of_bytes(i)).ToList();
    }

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new StringBuilder(registers.Count*3);
        foreach (var register in registers)
        {
            sb.AppendFormat("{0}",register.RegisterValue);
        }
        return sb.ToString();
    }

    [Benchmark]
    public string ForEach()
    {
        string ret = string.Empty;
        foreach (list_of_bytes register in registers)
        {
            ret += Convert.ToString(register.RegisterValue) + ",";
        }
        return ret;
    }

    [Benchmark]
    public string For()
    {
        string ret = string.Empty;
        for (UInt16 i = 0; i < registers.Count; i++)
        {
            ret += Convert.ToString(registers[i].RegisterValue) + ",";
        }
        return ret;
    }

}

The tests are run by calling BenchmarkRunner.Run<ConcatTest>()

using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<ConcatTest>();
        Console.WriteLine(summary);
    }
}

Results

Running this on a Macbook produced the following results. Note that BenchmarkDotNet produced results ready to use in StackOverflow, and the runtime information is included in the results :

BenchmarkDotNet=v0.13.1, OS=macOS Big Sur 11.5.2 (20G95) [Darwin 20.6.0]
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT


        Method |      Mean |    Error |   StdDev |    Gen 0 |   Gen 1 | Allocated |
-------------- |----------:|---------:|---------:|---------:|--------:|----------:|
 StringBuilder |  34.56 μs | 0.682 μs | 0.729 μs |   7.5684 |  0.3052 |     35 KB |
       ForEach | 278.36 μs | 5.509 μs | 5.894 μs | 818.8477 | 24.4141 |  3,763 KB |
           For | 268.72 μs | 3.611 μs | 3.015 μs | 818.8477 | 24.4141 |  3,763 KB |

Both For and ForEach took almost 10 times more than StringBuilder and used 100 times as much RAM

Ran Turner · Answer 2 · 2021-11-26T12:53:48.983

2

If a string can change like in your example then using a StringBuilder is a better option and could help the issue your dealing with.

Modify any string object will result into the creation of a new string object. This makes the use of string costly. So when the user needs the repetitive operations on the string then the need of StringBuilder come into existence. It provides the optimized way to deal with the repetitive and multiple string manipulation operations. It represents a mutable string of characters. Mutable means the string which can be changed. So String objects are immutable but StringBuilder is the mutable string type. It will not create a new modified instance of the current string object but do the modifications in the existing string object.

So intead of creating many temporary objects that will need to be garbage collected and mean while are taking a lot of memory, just use StringBuilder.

More about StringBuilder - https://learn.microsoft.com/en-us/dotnet/api/system.text.stringbuilder?view=net-6.0

edited Nov 26 '21 at 12:53

answered Nov 26 '21 at 10:18

Ran Turner

14,906
5
47
53

1

That doesn't explain the difference. Both loops are equally slow. Most likely the measuring code is measuring the wrong things – Panagiotis Kanavos Nov 26 '21 at 10:21
"but do the modifications in the existing string object" - Not true. There is no existing string object. – Enigmativity Nov 26 '21 at 10:28
1

"StringBuilder is the mutable string type" - This is not true either. – Enigmativity Nov 26 '21 at 10:29
"creating many temporary objects that will need to be garbage collected" - Is that the issue or is creating the temporary objects in the first place the issue? In other words, would it take a long time still if there weren't a GC? – Enigmativity Nov 26 '21 at 10:30

C# foreach loop comically slower than for loop on a RaspberryPi

2 Answers2

Linked