I wrote some simple code that copies an array of structs to another array in C#. Environment: .NET Core 2.0, console application, 64-bit executable, Release mode, Windows 10, Intel i7-7700K. The assembly was obtained by breaking in Visual Studio and inspecting the Disassembly window.

struct MyStruct
{
    public float F1;
    public float F2;
    public float F3;
    public float F4;
}

class Program
{
    private static MyStruct[] arr1 = new MyStruct[1024];
    private static MyStruct[] arr2 = new MyStruct[1024];

    static void Main(string[] args)
    {
        for (int i = 0; i < arr1.Length; i++)
            arr1[i] = arr2[i];
    }
}

I expected the generated assembly to load each struct from the source array into a register and then store it directly to the destination array.

In the assembly I saw the following (loop boilerplate omitted):

00007FFB33C704DC  vmovdqu     xmm0,xmmword ptr [rdx]  
00007FFB33C704E1  vmovdqu     xmmword ptr [rsp+30h],xmm0  
00007FFB33C704E8  cmp         esi,dword ptr [rax+8]  
00007FFB33C704EB  jae         00007FFB33C7051E  
00007FFB33C704ED  lea         rax,[rax+rcx+10h]  
00007FFB33C704F2  vmovdqu     xmm0,xmmword ptr [rsp+30h]  
00007FFB33C704F9  vmovdqu     xmmword ptr [rax],xmm0  

It copies every struct to the stack first, and only then from the stack to the destination array.

If I reduce the struct size from 128 bits to 64 bits, the stack copy disappears:

00007FFB33C804D8  vmovss      xmm0,dword ptr [rdx]  
00007FFB33C804DD  vmovss      xmm1,dword ptr [rdx+4]  
00007FFB33C804E3  cmp         esi,dword ptr [rax+8]  
00007FFB33C804E6  jae         00007FFB33C80518  
00007FFB33C804E8  lea         rax,[rax+rcx*8+10h]  
00007FFB33C804ED  vmovss      dword ptr [rax],xmm0  
00007FFB33C804F2  vmovss      dword ptr [rax+4],xmm1  

Why can't it copy a 128-bit struct without going through the stack?

Grigory
  • Using 2 separate scalar loads/stores for the components of a 64-bit struct is not exactly great. A single integer `mov` to/from r8 or some other scratch register would be the best non-vectorized way to copy 64-bit structs. Of course copying 128 bits at a time (2 structs, i.e. auto-vectorized) with xmm loads/stores would be even better. – Peter Cordes Apr 03 '18 at 23:00
  • See [Why is 16 byte the recommended size for struct in C#?](https://stackoverflow.com/questions/2407691/why-is-16-byte-the-recommended-size-for-struct-in-c) – stuartd Apr 03 '18 at 23:00
  • This is from Microsoft's ahead-of-time compiler? I'm not familiar with C# compilers, but yeah copying to/from a local on the stack is pretty bad; more obviously bad (and you'd think easier to optimize away) than copying the members separately. The loop structure looks weird, too. Is there a `jmp` at the bottom? [Most C compilers know to compile loops with a `jcc` at the bottom.](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-like-this). – Peter Cordes Apr 03 '18 at 23:05
  • The JIT can probably recompile on the fly when a performance-critical code path is being executed. Did you break in ahead of the first iteration, or did you warm up the JIT a bit by executing this code path a few thousand times beforehand? (Disclaimer: I don't know anything about C# and MS technologies; I'm only judging by my knowledge of the Java VM and its behaviour, and C# is basically the same thing as Java in terms of bytecode and VM tooling.) BTW, why do you even care? You already picked C#, so you obviously don't need top performance. The assembly you see is reasonable for a managed language. – Ped7g Apr 03 '18 at 23:13
  • @Ped7g .NET JIT does not do Tiered Compilation just yet. It's being worked on but right now JIT only compiles a given method once, so it doesn't matter how many times it's being called. See mattwarren.org/2017/12/15/How-does-.NET-JIT-a-method-and-Tiered-Compilation/ – MarcinJuraszek Apr 03 '18 at 23:15
  • @MarcinJuraszek thanks for fixing my wrong guess and misleading information. And thanks for the link, maybe it will help the OP. I'm not particularly interested in C#; I'm completely happy with C++ and assembly, and haunted enough by Java to never touch any more managed/VM stuff than I absolutely have to. – Ped7g Apr 03 '18 at 23:17
  • Array.Copy(arr2, arr1, arr2.Length) can't be beat. Just file a perf bug at the CoreCLR project. – Hans Passant Apr 04 '18 at 00:13
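
The bulk-copy suggestion from the last comment can be sketched as follows. This is only an illustration, not a measured fix: `CopyDemo` and `BulkCopy` are hypothetical names introduced here, and the sample value written to `src[5]` is arbitrary. Note that `Array.Copy` takes the source array first and the destination second, so for the loop in the question (`arr1[i] = arr2[i]`) the source is `arr2`.

```csharp
using System;

struct MyStruct
{
    public float F1;
    public float F2;
    public float F3;
    public float F4;
}

// Hypothetical helper class, not part of the original question.
static class CopyDemo
{
    // Copies every element of src into dst with one library call
    // instead of an element-by-element loop.
    public static void BulkCopy(MyStruct[] src, MyStruct[] dst)
    {
        // Signature: Array.Copy(sourceArray, destinationArray, length)
        Array.Copy(src, dst, src.Length);
    }

    static void Main()
    {
        var src = new MyStruct[1024];
        var dst = new MyStruct[1024];
        src[5].F3 = 1.5f; // arbitrary sample value

        BulkCopy(src, dst);

        // dst[5].F3 now holds the value that was stored in src[5].F3
        Console.WriteLine(dst[5].F3);
    }
}
```

Whether this actually outperforms the hand-written loop on a given runtime would still need to be checked with a benchmark; the sketch only shows the call shape.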

0 Answers