Problem: we are implementing a video recording system on Windows Server 2012. Despite low CPU and memory consumption, we are facing serious performance problems.
Short program description: the application (VS2005/C++) creates many network sockets, each receiving a multicast UDP video stream from the Ethernet network. For each stream the application provides a receive buffer by calling WSARecvFrom() (overlapped operation), waits in MsgWaitForMultipleObjects() for Windows' "data arrived" event, takes the data packet, and then repeats, in an endless loop. For the test, to ensure minimal CPU and memory consumption beyond the pure socket I/O work, the application does nothing else, not even any disk/file I/O. The process is configured to use all available cores on the machine (default affinity settings unchanged).
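For reference, here is a condensed sketch of one such receiver loop (the real application runs one of these per stream). The multicast group 239.1.1.1, port 5000 and the 2 KB buffer are placeholder values, and most error handling is stripped out:

    // One receiver loop: overlapped WSARecvFrom() + MsgWaitForMultipleObjects().
    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <windows.h>
    #pragma comment(lib, "ws2_32.lib")

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        SOCKET s = WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP,
                             NULL, 0, WSA_FLAG_OVERLAPPED);

        sockaddr_in local = {};
        local.sin_family = AF_INET;
        local.sin_port = htons(5000);                          // placeholder port
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(s, (sockaddr*)&local, sizeof(local));

        ip_mreq mreq = {};
        mreq.imr_multiaddr.s_addr = inet_addr("239.1.1.1");    // placeholder group
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, (char*)&mreq, sizeof(mreq));

        WSAEVENT ev = WSACreateEvent();
        char buf[2048];

        for (;;)
        {
            WSABUF wsabuf = { sizeof(buf), buf };
            WSAOVERLAPPED ov = {};
            ov.hEvent = ev;

            sockaddr_in from;
            int fromLen = sizeof(from);
            DWORD bytes = 0, flags = 0;

            // Post the overlapped receive; it may complete immediately or pend.
            int rc = WSARecvFrom(s, &wsabuf, 1, &bytes, &flags,
                                 (sockaddr*)&from, &fromLen, &ov, NULL);
            if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
                break;

            // Wait for the "data arrived" event (or a window message).
            DWORD wait = MsgWaitForMultipleObjects(1, &ev, FALSE, INFINITE,
                                                   QS_ALLINPUT);
            if (wait == WAIT_OBJECT_0)
            {
                WSAGetOverlappedResult(s, &ov, &bytes, FALSE, &flags);
                WSAResetEvent(ev);
                // 'bytes' bytes of the datagram are now in 'buf'; the test
                // application deliberately does nothing with them.
            }
            // wait == WAIT_OBJECT_0 + 1 would mean a window message arrived;
            // a real message loop would pump it here.
        }

        WSACloseEvent(ev);
        closesocket(s);
        WSACleanup();
        return 0;
    }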
Tests run: the test was run on two different machines: a) a Windows 7 machine with 4 physical cores (8 with hyper-threading), and b) a Windows Server 2012 machine with 12 physical cores (24 with hyper-threading).
Both systems show the same problem: everything works fine up to a certain number of configured sockets/network streams. Increasing the number further (which we need to do) eventually paralyses the Windows desktop (mouse pointer, repainting). At this point the total CPU load is still very low (10-15%) and there is plenty of free memory. But Task Manager shows an extremely one-sided CPU load: CPU 0 is at nearly 100% while all other CPUs are near 0%. Changing the processor affinity for the process in Task Manager doesn't help.
Question 1: it looks as if CPU 0 is doing all of the kernel's network I/O work. Is that likely?
Question 2: if yes, is there a way to control how the kernel distributes this work across the available CPUs? If so, how?
Question 3: if no, is there any other way to make Windows distribute the (kernel) network I/O work across other CPUs (e.g. by installing multiple NICs, with each NIC receiving only a subset of the network streams, and binding each NIC to a different CPU)?
Thanks in advance for any hints from anybody out there.