
I have read that a for loop is faster than foreach and LINQ, so I created a small console application to check. It passes a list of numeric strings, some of them blank, to each method.

I used for, foreach, Parallel.ForEach, and Parallel.For. Each method iterates over the list, finds the indexes where the value is blank, and appends them to a string. I set a timer before each loop and found that foreach is much faster than any of the others. Please clarify the concept. Here is the code. I also changed the List to an array and tried again, but foreach was still faster.

static void Main(string[] args)
{
    List<string> value = new List<string>() { "1", "2", "3", "4", "5", "6",
        "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
        "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
        "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
        "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51",
        "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62",
        "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73",
        "74", "75", "76", "77", "78", "79", "80", "81", "82", "83", "84",
        "85", "86", "87", "88", "89", "90", "91", "92", " ", "", "", "",
        "", "", "", "  " };

    string ColName = "EMPNO";

    var timer = new Stopwatch();
    timer.Start();
    string a = BlankDataInColumn(value, ColName);
    timer.Stop();
    TimeSpan timeTaken = timer.Elapsed;
    string foo = "Time taken: " + timeTaken.ToString(@"m\:ss\.fff");
    Console.WriteLine(foo);

    var timer1 = new Stopwatch();
    timer1.Start();
    string b = BlankDataInColumnforeach(value, ColName);
    timer1.Stop();
    TimeSpan timeTaken1 = timer1.Elapsed;
    string foo1 = "Time taken: " + timeTaken1.ToString(@"m\:ss\.fff");
    Console.WriteLine(foo1);

    var timer12 = new Stopwatch();
    timer12.Start();
    string c = BlankDataInColumnforeachParallel(value, ColName);
    timer12.Stop();
    TimeSpan timeTaken12 = timer12.Elapsed;
    string foo12 = "Time taken: " + timeTaken12.ToString(@"m\:ss\.fff");
    Console.WriteLine(foo12);

    var timer123 = new Stopwatch();
    timer123.Start();
    string d = BlankDataInColumnforParallel(value, ColName);
    timer123.Stop();
    TimeSpan timeTaken123 = timer123.Elapsed;
    string foo123 = "Time taken: " + timeTaken123.ToString(@"m\:ss\.fff");
    Console.WriteLine(foo123);
    Console.ReadLine();
}

public static string BlankDataInColumn(List<string> Column,
    string ColumnName)
{
    bool isBlank = false;
    StringBuilder rowNumber = new StringBuilder();
    for (int i = 0; i < Column.Count(); i++)
    {
        if (Column[i].HasNothing()) { rowNumber.Append($"{i + 1},"); isBlank = true; }
    }
    string BlankDataExist = isBlank ?
        $"The {ColumnName} have Blank Values in the following row number {rowNumber}"
        : null;
    return BlankDataExist;
}

public static string BlankDataInColumnforeach(List<string> Column,
    string ColumnName)
{
    bool isBlank = false;
    StringBuilder rowNumber = new StringBuilder();
    int i = 0;
    foreach (string col in Column)
    {
        i++;
        if (col.HasNothing()) { rowNumber.Append($"{i},"); isBlank = true; }
    }
    string BlankDataExist = isBlank ?
        $"The {ColumnName} have Blank Values in the following row number {rowNumber}"
        : null;
    return BlankDataExist;
}

public static string BlankDataInColumnforeachParallel(List<string> Column,
    string ColumnName)
{
    bool isBlank = false;
    StringBuilder rowNumber = new StringBuilder();
    int i = 0;
    Parallel.ForEach(Column, col =>
    {
        i++;
        if (col.HasNothing()) { rowNumber.Append($"{i},"); isBlank = true; }
    });
    string BlankDataExist = isBlank ?
        $"The {ColumnName} have Blank Values in the following row number {rowNumber}"
        : null;
    return BlankDataExist;
}

public static string BlankDataInColumnforParallel(List<string> Column,
    string ColumnName)
{
    bool isBlank = false;
    StringBuilder rowNumber = new StringBuilder();
    Parallel.For(0, Column.Count(), i =>
    {
        if (Column[i].HasNothing()) { rowNumber.Append($"{i + 1},"); isBlank = true; }
    });
    string BlankDataExist = isBlank ?
        $"The {ColumnName} have Blank Values in the following row number {rowNumber}"
        : null;
    return BlankDataExist;
}
  • Your Parallel.ForEach looks **unsafe** because the loop body modifies shared data. `rowNumber.Append`, in particular, is something I don't think is safe to call from multiple threads. – Wyck Jan 24 '23 at 14:57
  • 1) Stopwatch is not the right tool to measure performance, 2) with so few items you will hardly see any significant difference, 3) StringBuilder is not thread-safe, so using it within Parallel.ForEach is not correct, 4) just using Parallel.ForEach will usually not make the code run faster, and might even make it slower – Klaus Gütter Jan 24 '23 at 14:57
  • Your calls to `.HasNothing()` and StringBuilder will most likely dwarf any overhead from the loop itself. The correct way to test this is with `BenchmarkDotNet`, and you'll most likely find that the loop itself is too busy to gain serious performance improvements from changing the loop mechanic. – David L Jan 24 '23 at 15:00
  • 5) If getting the data involves any I/O or database access, the differences in the loop will be totally irrelevant against I/O costs – Klaus Gütter Jan 24 '23 at 15:01
  • What are you trying to do? If you want to generate a large string, all those methods are wrong one way or another. The first generates a new temporary string for every line, even though it uses `StringBuilder`. All the others are unsafe and will append items in random order, assuming they don't cause exceptions – Panagiotis Kanavos Jan 24 '23 at 16:05
  • If I wanted to find all empty strings in a list of strings I'd store the indexes in a `List<int>` or `Queue<int>`. I'd use `String.IsNullOrEmpty` or `String.IsNullOrWhiteSpace` instead of implementing my own method. These two things will result in orders of magnitude faster execution. If there were a *lot* of strings, e.g. 100K or more, I'd use PLINQ to find the empty strings, e.g. `column.AsParallel().Select((w,i)=>new {Word=w,Index=i}).Where(p=>String.IsNullOrEmpty(p.Word)).ToList()` – Panagiotis Kanavos Jan 24 '23 at 16:13
  • 100 strings is no data at all. It doesn't need parallelism – Panagiotis Kanavos Jan 24 '23 at 16:14
  • Related questions: [Parallel.ForEach Slower than ForEach](https://stackoverflow.com/questions/6036120/parallel-foreach-slower-than-foreach), and also [Should I always use Parallel.Foreach because more threads MUST speed up everything?](https://stackoverflow.com/questions/4172705/should-i-always-use-parallel-foreach-because-more-threads-must-speed-up-everythi) These are the first two Google results for *"C# foreach faster than Parallel"*. – Theodor Zoulias Jan 24 '23 at 16:20

1 Answer


You likely have far too few items for any parallel loop to make sense at all.

Running anything in parallel has some overhead, and this overhead will likely dominate completely in this case. You want each iteration of a parallel loop to be heavy enough that the advantages outweigh this overhead.

You also need to ensure that the code is thread-safe, because optimizations are irrelevant if the code does not actually work. `StringBuilder` is not thread-safe, so appending to a single instance from several threads, as your parallel loops do, is not safe. I would recommend staying away from any kind of parallel or concurrent programming until you have spent quite a bit of time reading up on how to write thread-safe code.
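As a rough sketch of one way to make the parallel variant safe: collect the row numbers in a `ConcurrentBag<int>` rather than appending to a shared `StringBuilder`, and sort afterwards, because parallel iterations complete in no guaranteed order. The names `BlankFinder` and `FindBlankRowsParallel` are hypothetical, and `string.IsNullOrWhiteSpace` is assumed to be equivalent to the question's `HasNothing()` extension.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BlankFinder
{
    // Hypothetical helper (not from the question): returns the 1-based
    // row numbers of blank entries in ascending order.
    public static List<int> FindBlankRowsParallel(IReadOnlyList<string> column)
    {
        // ConcurrentBag<T> may be added to from several threads at once,
        // unlike StringBuilder or List<T>.
        var blankRows = new ConcurrentBag<int>();

        Parallel.For(0, column.Count, i =>
        {
            // Assumption: stands in for the question's HasNothing() extension.
            if (string.IsNullOrWhiteSpace(column[i]))
            {
                blankRows.Add(i + 1);
            }
        });

        // Iterations finish in no particular order, so sort before reporting.
        return blankRows.OrderBy(row => row).ToList();
    }
}
```

The message can then be built once, outside the loop, with `string.Join(",", BlankFinder.FindBlankRowsParallel(value))`. For around 100 items this will most likely still be slower than a plain loop; it only illustrates the thread-safety point.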

You also need to be careful when writing benchmarks. The gold standard for this is BenchmarkDotNet, which should take care of most problems. If you want to measure performance yourself, you should at the very least ensure the following:

  • Use Stopwatch and not DateTime.Now (not that you do; this is just for completeness' sake).
  • Ensure the runtime is at least in the tens of milliseconds.
  • Do a warm-up pass before measuring, to ensure all code is compiled.
  • Run the code in release, without the debugger attached.
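If you do measure by hand, the checklist above might look roughly like the following. This is a minimal sketch, not a replacement for BenchmarkDotNet: `ManualBenchmark`, `MeasureMilliseconds`, `CountBlanksFor`, `CountBlanksForeach`, and `Compare` are hypothetical names, and the loop bodies only count blanks so that the loop mechanics are not buried under string formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ManualBenchmark
{
    // Hypothetical helper: mean time per call in milliseconds.
    public static double MeasureMilliseconds(Action action, int iterations)
    {
        // Warm-up pass so JIT compilation is not part of the measurement.
        action();

        var timer = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            action();
        }
        timer.Stop();

        return timer.Elapsed.TotalMilliseconds / iterations;
    }

    public static int CountBlanksFor(List<string> column)
    {
        int blanks = 0;
        for (int i = 0; i < column.Count; i++)
        {
            if (string.IsNullOrWhiteSpace(column[i])) blanks++;
        }
        return blanks;
    }

    public static int CountBlanksForeach(List<string> column)
    {
        int blanks = 0;
        foreach (string item in column)
        {
            if (string.IsNullOrWhiteSpace(item)) blanks++;
        }
        return blanks;
    }

    // Runs both variants many times so the total runtime is well above
    // the timer resolution, then prints the mean time per call.
    public static void Compare(List<string> column, int iterations)
    {
        long sink = 0; // results are consumed so the calls cannot be discarded

        double forMs = MeasureMilliseconds(() => sink += CountBlanksFor(column), iterations);
        double foreachMs = MeasureMilliseconds(() => sink += CountBlanksForeach(column), iterations);

        Console.WriteLine($"for:     {forMs:F6} ms per call");
        Console.WriteLine($"foreach: {foreachMs:F6} ms per call");
        Console.WriteLine($"(checksum {sink})");
    }
}
```

Calling `ManualBenchmark.Compare(value, 1_000_000)` from a Release build without the debugger attached should give far more stable numbers than timing a single call on 100 items, though run-to-run variance will remain.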

As a final note, use profiling tools to find the actual bottlenecks in your application, and only optimize the parts that are actually problematic. Prefer high-level optimizations (i.e. algorithmic improvements or better data structures) over micro-optimizations. As Donald Knuth wrote, premature optimization is the root of all evil:

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgements about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I've become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.

JonasH
  • I know that the amount of data is too low for a parallel loop and that it's not thread-safe, but it was just done for testing. I got confused when the time for foreach was less than for the for loop. `.HasNothing()` is just an extension method which checks whether a string is null, empty, or white space. Thank you for the information and for the advice to use a benchmark and to focus on better algorithms. – silver spark Jan 24 '23 at 17:43
  • @silverspark I cannot see any for-loop in your example so I cannot comment on for vs foreach. But I suspect that the difference comes down to something other than the loop. Maybe cache dependencies? Maybe compile times? Maybe just variance? The first step to avoid this would be to put the method calls inside loops to repeat the calls say a million times each. – JonasH Jan 25 '23 at 07:30
  • Thank you, JonasH, for pointing out that the for loop method was missing. I have edited the question and included the for loop – silver spark Jan 30 '23 at 05:43