
Why would I be getting such poor performance from the code below?

The following command line uses 16 threads with a load of 60. On my machine it takes approximately 31 seconds to finish (with slight variations between runs):

testapp.exe 16 60

Using a load of 60, on Microsoft Windows Server 2008 R2 Enterprise SP1, running on 16 Intel Xeon E5-2670 @ 2.6 GHz CPUs I get the following performance:

1 cpu - 305 seconds

2 cpus - 155 seconds

4 cpus - 80 seconds

8 cpus - 45 seconds

10 cpus - 41 seconds

12 cpus - 37 seconds

14 cpus - 34 seconds

16 cpus - 31 seconds

18 cpus - 27 seconds

20 cpus - 24 seconds

22 cpus - 23 seconds

24 cpus - 21 seconds

26 cpus - 20 seconds

28 cpus - 19 seconds

After this it flat-lines ...

I get approximately the same performance using .Net 3.5, 4, 4.5 or 4.5.1.

I understand the drop-off in performance after 22 cpus, as I only have 16 on the box. What I don't understand is the poor performance after 8 cpus. Can anyone explain? Is this normal?

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

private static void Main(string[] args)
{
    int threadCount;
    if (args == null || args.Length < 1 || !int.TryParse(args[0], out threadCount))
        threadCount = Environment.ProcessorCount;

    int load;
    if (args == null || args.Length < 2 || !int.TryParse(args[1], out load))
        load = 1;

    Console.WriteLine("ThreadCount:{0} Load:{1}", threadCount, load);

    List<Thread> threads = new List<Thread>();

    for (int i = 0; i < threadCount; i++)
    {
        int i1 = i;
        threads.Add(new Thread(() => DoWork(i1, threadCount, load)));
    }

    Stopwatch timer = Stopwatch.StartNew();

    foreach (var thread in threads)
    {
        thread.Start();
    }

    foreach (var thread in threads)
    {
        thread.Join();
    }

    timer.Stop();

    Console.WriteLine("Time:{0} seconds", timer.ElapsedMilliseconds/1000.0);
}

static void DoWork(int seed, int threadCount, int load)
{
    double[,] mtx = new double[3,3];

    for (int i = 0; i < ((100000 * load)/threadCount); i++)
    {
        for (int j = 0; j < 100; j++)
        {
            mtx = new double[3,3];

            for (int k = 0; k < 3; k++)
            {
                for (int l = 0; l < 3; l++)
                {
                    mtx[k, l] = Math.Sin(j + (k*3) + l + seed);
                }
            }
        }
    }
}
Cronan
  • Note that if you compare like for like and look at 1, 2, 4, 8, 16 (i.e. leave out the relatively smaller 10, 12, 14 steps), there's still a relatively "big" drop from 45 -> 31. – James Thorpe Sep 11 '15 at 16:05
  • 4
    I'm not sure that you're benchmarking actual computations there. It seems like what you're really benchmarking is concurrent heap allocations. – Theodoros Chatzigiannakis Sep 11 '15 at 16:06
  • 1
    How much time is spent in GC? Are you using the client or the server GC? – CodesInChaos Sep 11 '15 at 16:07
  • I agree with @TheodorosChatzigiannakis, remove the `mtx = new double[3,3]` and see what happens. – Richard Schneider Sep 11 '15 at 16:09
  • What is it that the threads are doing? I mean what is the code that the threads are running? – displayName Sep 11 '15 at 16:09
  • 1
    @displayName It's in the code sample, scroll down. – xxbbcc Sep 11 '15 at 16:10
  • 3
    I would recommend two (alternative) changes to this experiment: (1) preallocate some `new double[,]` arrays in the starting thread, pass each one to each child thread and then reuse it instead of reallocating it in the loop or (2) `stackalloc` a `double[3 * 3]` in the loops and use that. Otherwise, you may be accidentally benchmarking the performance of the memory allocator or the garbage collector under rapid allocations, instead of your code per se. – Theodoros Chatzigiannakis Sep 11 '15 at 16:18
  • I agree, you're mostly testing garbage collection here. Since you are testing a command line app, I'm pretty sure there is just one garbage collection thread and when GC runs, all other threads are stopped. – Christoph Sep 11 '15 at 16:22
  • In addition to all the other comments, this is a release build, right? – xxbbcc Sep 11 '15 at 16:26
  • How much memory does your CPU have? – displayName Sep 11 '15 at 17:36
  • Yes, release build, the machine has 12Gb RAM – Cronan Sep 11 '15 at 18:06
  • I'll try taking the mtx out - it's there because I was doing some work earlier that took more time than the Sin – Cronan Sep 11 '15 at 18:11
  • 1) Thread count != core count. Your app is not the only one running (and jobs vary in difficulty), and OS thread management is hard. 2) ThreadPool should work better. [CLR 4.0 ThreadPool Improvements](http://blogs.msdn.com/b/ericeil/archive/2009/04/23/clr-4-0-threadpool-improvements-part-1.aspx) 3) "Pro .NET Performance: Optimize Your C# Applications" is worth reading. – MrDywar Sep 11 '15 at 18:50
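
The suggestions in the comments above (reuse the matrix, and check how much the GC is involved) could be tried with something like the following. This is only a sketch: `GcProbe`, `DoWorkNoAlloc` and `Gen0CollectionsDuring` are names made up for illustration, and the loop bounds are copied from the question's `DoWork`.

```csharp
using System;
using System.Runtime;

static class GcProbe
{
    // Same loops as DoWork in the question, but the matrix is allocated once
    // per call instead of once per inner iteration.
    public static double DoWorkNoAlloc(int seed, int threadCount, int load)
    {
        double[,] mtx = new double[3, 3];
        int outer = (100000 * load) / threadCount;

        for (int i = 0; i < outer; i++)
            for (int j = 0; j < 100; j++)
                for (int k = 0; k < 3; k++)
                    for (int l = 0; l < 3; l++)
                        mtx[k, l] = Math.Sin(j + (k * 3) + l + seed);

        // Return one element so the result is observably used.
        return mtx[2, 2];
    }

    // Number of gen-0 collections that happen while 'work' runs.
    public static int Gen0CollectionsDuring(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);
        int collections = Gen0CollectionsDuring(() => DoWorkNoAlloc(0, 16, 1));
        Console.WriteLine("Gen-0 collections: {0}", collections);
    }
}
```

Wrapping the original `DoWork` in `Gen0CollectionsDuring` as well should show how many collections the per-iteration `new double[3,3]` triggers. If `GCSettings.IsServerGC` prints `False`, server GC can be switched on with `<gcServer enabled="true"/>` under `<runtime>` in the application's config file, which may be worth comparing.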

2 Answers

1

Please refer to the Intel ARK entry for the Xeon E5-2670.

This particular processor has 8 physical cores, each of which is hyper-threaded. This is why you see performance drop off after 8 threads. Calling Environment.ProcessorCount returns 16 logical cores (2 logical cores per physical core because of hyper-threading).
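
One way to check this against the numbers in the question is to compute speedup and parallel efficiency per thread count. The sketch below uses the timings posted in the question; the `Scaling` class and its method names are made up for illustration.

```csharp
using System;

static class Scaling
{
    // Speedup relative to the single-thread run.
    public static double Speedup(double t1, double tn)
    {
        return t1 / tn;
    }

    // Parallel efficiency: speedup divided by the number of threads.
    public static double Efficiency(double t1, double tn, int threads)
    {
        return Speedup(t1, tn) / threads;
    }

    public static void Main()
    {
        // Timings in seconds, copied from the question.
        int[] threads = { 1, 2, 4, 8, 16 };
        double[] seconds = { 305, 155, 80, 45, 31 };

        Console.WriteLine("Logical processors: {0}", Environment.ProcessorCount);
        for (int i = 0; i < threads.Length; i++)
        {
            Console.WriteLine("{0,2} threads: speedup {1:F2}, efficiency {2:P0}",
                threads[i],
                Speedup(seconds[0], seconds[i]),
                Efficiency(seconds[0], seconds[i], threads[i]));
        }
    }
}
```

With those timings, efficiency stays at roughly 85% or better up to 8 threads (305 / 45 ≈ 6.8x) and falls to roughly 61% at 16 threads (305 / 31 ≈ 9.8x), which is the pattern you would expect if the second logical core on each physical core adds only a fraction of a full core's throughput.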

A similar question has been answered on SuperUser.

You can try setting the affinity of the threads to see if it makes a difference, but the scheduler usually does a good job of allocating resources.
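
A rough sketch of such an experiment follows. It sets process-wide affinity through `Process.ProcessorAffinity`, which is a simpler stand-in for pinning individual threads. `AffinitySketch` and `EveryOtherProcessorMask` are hypothetical names, and the code assumes that hyper-threaded siblings are numbered adjacently (0/1, 2/3, ...), which is not guaranteed on every machine.

```csharp
using System;
using System.Diagnostics;

static class AffinitySketch
{
    // Bit mask selecting logical processors 0, 2, 4, ... below 'logicalCount'.
    // ASSUMPTION: hyper-threaded siblings are numbered adjacently, so every
    // other logical processor sits on a distinct physical core.
    public static long EveryOtherProcessorMask(int logicalCount)
    {
        long mask = 0;
        for (int cpu = 0; cpu < logicalCount && cpu < 64; cpu += 2)
            mask |= 1L << cpu;
        return mask;
    }

    public static void Main()
    {
        long mask = EveryOtherProcessorMask(Environment.ProcessorCount);
        Console.WriteLine("Affinity mask: 0x{0:X}", mask);

        // Restricts every thread in this process to the selected processors.
        // In a 32-bit process the mask must fit in 32 bits.
        Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(mask);
    }
}
```

Running the benchmark with 8 threads under this mask and comparing against the unpinned 8-thread run would show whether the scheduler's placement matters here.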

Hope this helps.

Community
Luis Ramirez
  • The machine has 32 logical processors, not 16 – Cronan Sep 11 '15 at 18:14
  • @Cronan: Can you provide any link which has the specifications of your CPU? I am seeing another link on Amazon: http://www.amazon.com/Intel-E5-2670-2-60Ghz-8-Core-Processor/dp/B007H29FRS and it says that core count is 8 and not 16. Or use this: http://superuser.com/questions/226552/how-to-tell-how-many-cpus-cores-you-have-on-windows-7 and let me know what core count your CPU says it has. – displayName Sep 13 '15 at 02:17
1

It is not the threads themselves that cause the performance to go down, but the "creation" of each thread.

Instead of creating a brand new thread, you need to borrow an already created thread from the thread pool. Use the ThreadPool class instead of new Thread().
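
A minimal sketch of what that could look like, assuming .NET 4 or later for `CountdownEvent` (on .NET 3.5 a `ManualResetEvent` plus an `Interlocked` counter would be needed instead). `PoolSketch` and `RunOnPool` are illustrative names, not part of the question's code.

```csharp
using System;
using System.Threading;

static class PoolSketch
{
    // Runs work(i) for i in [0, count) on thread-pool threads and blocks
    // until all of them have finished.
    public static void RunOnPool(int count, Action<int> work)
    {
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int index = i; // copy so each closure sees its own value
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { work(index); }
                    finally { done.Signal(); }
                });
            }
            done.Wait();
        }
    }

    public static void Main()
    {
        int total = 0;
        RunOnPool(16, i => Interlocked.Add(ref total, i));
        Console.WriteLine("Sum of indexes: {0}", total);
    }
}
```

In the question's `Main`, the call would be along the lines of `RunOnPool(threadCount, i => DoWork(i, threadCount, load))`. Note that the pool may start fewer threads than requested at first and add more gradually, so timings for large thread counts are not directly comparable unless `ThreadPool.SetMinThreads` is raised beforehand.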

Yazan Ati
  • I'm not creating new threads in a fast loop, I create the "worker" threads I want to use ahead of time, but thank you for answering – Cronan Sep 11 '15 at 18:08