I'm tracking down a performance bottleneck. To narrow down the cause, I created a test using the SimpleTrianglePC sample from the Xbox ATG Samples repo.
I made only a few changes: I changed Draw to DrawInstanced to render multiple triangles, and modified the vertex shader to slightly change the position and color of each triangle based on SV_InstanceID. In total, 265 triangles are rendered, but my GPU hits 100% in the process:
Here is the output. Note that I changed the shape of the triangle to a right triangle. The top triangle is black:
The issue appears to be related to the fact that all of the triangles are on top of each other; the problem does not happen if the triangles are rendered without overlapping. I've run tests with thousands of non-overlapping triangles and, as expected, saw no issue.
To address this, I tried various CommonStates from the DirectXTK; e.g. I set the blend state to Opaque(), the depth stencil state to DepthNone(), and the rasterizer state to CullNone(). There was no difference.
In case anyone is wondering: I am using a hardware device, not a WARP device. My GPU is pretty weak (ASUS EAH 5450), but it's not the issue here; I do play games on this machine. It shouldn't be maxing out on 265 triangles.
Here is the modified vertex shader. The default window size of 1280 x 720 is assumed:
struct Vertex
{
float4 position : SV_Position;
float4 color : COLOR0;
uint instanceId : SV_InstanceID;
};
struct Interpolants
{
float4 position : SV_Position;
float4 color : COLOR0;
};
Interpolants main( Vertex In )
{
Interpolants Out;
float2 offset = float( In.instanceId ) * float2( 2.0f / 1280.0f, 2.0f / 720.f );
Out.position = In.position + float4( offset, 0, 0 );
Out.color = float4( In.color.xyz * (In.instanceId & 0x1), 1 );
return Out;
}
Here are the two changes I made to SimpleTrianglePC.cpp. First, creating the vertex buffer:
// Create vertex buffer.
float xScalar = 2.0f / 1280.0f;
float yScalar = 2.0f / 720.0f;
static const Vertex s_vertexData[3] =
{
{ { 1.0f * xScalar - 1.0f, 1.0f * yScalar - 1.0f, 0.5f, 1.0f },{ 1.0f, 0.0f, 0.0f, 1.0f } }, // Bottom-left / Red
{ { 1.0f * xScalar - 1.0f, 719.0f * yScalar - 1.0f, 0.5f, 1.0f },{ 0.0f, 1.0f, 0.0f, 1.0f } }, // Top-left / Green
{ { 719.0f * xScalar - 1.0f, 1.0f * yScalar - 1.0f, 0.5f, 1.0f },{ 0.0f, 0.0f, 1.0f, 1.0f } } // Bottom-right / Blue
};
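The scale-and-bias used in the vertex data can be checked on the CPU. This is only a minimal sketch and not part of the sample: `Float2`, `PixelToNdc`, and `PrintTriangleCorners` are hypothetical names introduced for illustration, and the 1280 x 720 size is the assumed default window size.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper type for this sketch; not part of SimpleTrianglePC.
struct Float2
{
    float x;
    float y;
};

// Maps a position given in pixels to normalized device coordinates,
// using the same scale-and-bias as the vertex data: p * (2 / size) - 1.
Float2 PixelToNdc(float px, float py, float width = 1280.0f, float height = 720.0f)
{
    assert(width > 0.0f && height > 0.0f);
    return Float2{ px * (2.0f / width) - 1.0f, py * (2.0f / height) - 1.0f };
}

// Prints the three corners of the triangle in normalized device coordinates.
void PrintTriangleCorners()
{
    const float pixels[3][2] = { { 1.0f, 1.0f }, { 1.0f, 719.0f }, { 719.0f, 1.0f } };
    for (const auto& p : pixels)
    {
        const Float2 ndc = PixelToNdc(p[0], p[1]);
        std::printf("(%6.1f, %6.1f) px -> (%+.4f, %+.4f) NDC\n", p[0], p[1], ndc.x, ndc.y);
    }
}
```

Because y points up in normalized device coordinates, the corner at (1, 1) lands near the bottom-left of the window and the corner at (1, 719) near the top-left.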
Second, the draw command:
// Draw triangle.
context->DrawInstanced(3, 265, 0, 0);
As I mentioned earlier, I created this test because my original code is slowing down when there are many overlapping primitives. Is this slowdown unavoidable? What should I be doing to avoid this? Thanks in advance.
Edit:
Here's a screenshot of the GPU usage detail:
Edit 2:
Visual Studio's GPU Usage tool appears to be incorrectly reporting the feature level supported by this graphics driver. The call to D3D11CreateDevice returns feature level 11.0. Dxdiag confirms this:
I analyzed a frame to see if I could get any additional information. It appears that DrawInstanced is taking 22 ms to complete.
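To put the 22 ms in context, here is a rough CPU-side estimate of how many pixels this one draw call covers. It is a sketch under stated assumptions, not a profiler result: `CoveredPixels`, `TotalCoveredPixels`, `ImpliedPixelsPerSecond`, and `PrintOverdrawEstimate` are hypothetical helpers introduced for illustration. The sketch counts pixel centers inside each triangle instead of applying the exact rasterization rules, and it assumes every covered pixel is shaded and written (no depth rejection). By this estimate the 265 instances cover on the order of 65 million pixels per frame, roughly 70 full-screen layers at 1280 x 720.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Approximate number of pixel centers covered by one instance of the right
// triangle, clipped to the viewport. Instance i is shifted by i pixels in both
// x and y, matching the offset computed in the vertex shader.
std::int64_t CoveredPixels(int instance, int width = 1280, int height = 720)
{
    assert(instance >= 0);
    const double origin = 1.0 + instance; // right-angle corner, same value in x and y
    const double leg = 718.0;             // both legs run from pixel 1 to pixel 719
    std::int64_t count = 0;
    for (int py = 0; py < height; ++py)
    {
        const double dy = (py + 0.5) - origin; // distance of this row above the corner
        if (dy < 0.0 || dy > leg)
            continue;
        // On this row the triangle spans x in [origin, origin + leg - dy].
        const double xMin = std::max(origin, 0.0);
        const double xMax = std::min(origin + leg - dy, static_cast<double>(width));
        const long first = static_cast<long>(std::ceil(xMin - 0.5));
        const long last = static_cast<long>(std::floor(xMax - 0.5));
        if (last >= first)
            count += last - first + 1;
    }
    return count;
}

// Sums the covered pixels over all instances of the draw call.
std::int64_t TotalCoveredPixels(int instanceCount = 265)
{
    std::int64_t total = 0;
    for (int i = 0; i < instanceCount; ++i)
        total += CoveredPixels(i);
    return total;
}

// Pixel throughput implied by a measured draw time in milliseconds.
double ImpliedPixelsPerSecond(std::int64_t pixels, double drawMilliseconds)
{
    assert(drawMilliseconds > 0.0);
    return static_cast<double>(pixels) / (drawMilliseconds / 1000.0);
}

// Prints the estimate for the 265-instance draw and the measured 22 ms.
void PrintOverdrawEstimate()
{
    const std::int64_t total = TotalCoveredPixels();
    std::printf("covered pixels per frame: %lld\n", static_cast<long long>(total));
    std::printf("full-screen layers: %.1f\n", total / (1280.0 * 720.0));
    std::printf("implied pixels per second at 22 ms: %.3g\n",
                ImpliedPixelsPerSecond(total, 22.0));
}
```

Calling `PrintOverdrawEstimate()` from `main` prints the totals. At 22 ms per draw the implied rate is around 3 billion pixels per second, which can be compared against the card's published fill rate to judge whether the draw is fill-bound.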
Edit 3:
I've been testing for the last few days, but I don't see a good explanation for this behavior. Does anyone see a problem with the example code or my changes, or should I just chalk this up to a buggy or poorly performing graphics card?