
I'm tracking down a performance bottleneck. To narrow down the cause, I created a test using the SimpleTrianglePC sample from the Xbox ATG Samples repo.

I made only a few changes: I replaced Draw with DrawInstanced to render multiple triangles, and modified the vertex shader to offset the position and change the color of each triangle based on SV_InstanceID. In total, 265 triangles are rendered, yet GPU usage hits 100% while doing so:

[Screenshot: GPU at 100%]

Here is the output. Note that I changed the shape of the triangle to a right triangle. The top triangle is black:

[Screenshot: rendered result]

The issue appears to be related to the triangles all being on top of each other; the problem does not occur when the triangles are rendered without overlapping. I've run tests with thousands of non-overlapping triangles and, as expected, there is no issue.

To address this, I tried various CommonStates from DirectXTK; for example, I set the blend state to Opaque(), the depth stencil state to DepthNone(), and the rasterizer state to CullNone(). None of these made a difference.

In case anyone is wondering: I am using a hardware device, not a WARP device. My GPU is pretty weak (ASUS EAH 5450), but it's not the issue here; I do play games on this machine, so it shouldn't be maxing out on 265 triangles.

Here is the modified vertex shader. The default window size of 1280 x 720 is assumed:

    struct Vertex
    {
        float4 position     : SV_Position;
        float4 color        : COLOR0;
        uint instanceId     : SV_InstanceID;
    };

    struct Interpolants
    {
        float4 position     : SV_Position;
        float4 color        : COLOR0;
    };

    Interpolants main( Vertex In )
    {
        Interpolants Out;
        // Shift each instance one pixel right and one pixel up (1280 x 720 window assumed).
        float2 offset = float( In.instanceId ) * float2( 2.0f / 1280.0f, 2.0f / 720.0f );
        Out.position = In.position + float4( offset, 0, 0 );
        // Odd instances keep the vertex color; even instances are drawn black.
        Out.color = float4( In.color.xyz * (In.instanceId & 0x1), 1 );
        return Out;
    }
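To make the per-instance math easier to follow, here is a small C++ sketch that mirrors on the CPU what the shader above computes for each instance. InstanceOffset, InstanceOffsetPixels, and ColorMask are hypothetical helpers, not part of the sample, and the 1280 x 720 window size is the same assumption the shader hard-codes.

```cpp
#include <cassert>
#include <cstdint>

// CPU-side mirror of the modified vertex shader's per-instance math (a sketch,
// not part of the ATG sample). Assumes the 1280 x 720 window the shader hard-codes.
struct Float2
{
    float x;
    float y;
};

// Clip-space offset added to every vertex of instance `id`. 2/1280 and 2/720 are
// the width and height of one pixel in NDC, so each instance moves one pixel
// right and one pixel up relative to the previous one.
Float2 InstanceOffset(std::uint32_t id)
{
    const float fid = static_cast<float>(id);
    return { fid * (2.0f / 1280.0f), fid * (2.0f / 720.0f) };
}

// The same offset converted back to pixels: one NDC unit spans half the viewport.
Float2 InstanceOffsetPixels(std::uint32_t id)
{
    const Float2 ndc = InstanceOffset(id);
    return { ndc.x * 1280.0f * 0.5f, ndc.y * 720.0f * 0.5f };
}

// Multiplier applied to the vertex color: 1 for odd instances (color kept),
// 0 for even instances (drawn black).
float ColorMask(std::uint32_t id)
{
    return static_cast<float>(id & 0x1u);
}
```

For example, ColorMask(0) and ColorMask(264) are both 0, so the first and last of the 265 instances are black, and InstanceOffsetPixels(264) is approximately (264, 264), meaning the last instance sits 264 pixels right of and above the first.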

Here are the two changes I made to SimpleTrianglePC.cpp. First, creating the vertex buffer:

    // Create vertex buffer.
    // Pixel coordinates in a 1280 x 720 window, converted to normalized device coordinates.
    float xScalar = 2.0f / 1280.0f;
    float yScalar = 2.0f / 720.0f;
    static const Vertex s_vertexData[3] =
    {
        { { 1.0f * xScalar - 1.0f,   1.0f * yScalar - 1.0f,  0.5f, 1.0f },{ 1.0f, 0.0f, 0.0f, 1.0f } },  // Bottom-left (right angle) / Red
        { { 1.0f * xScalar - 1.0f, 719.0f * yScalar - 1.0f,  0.5f, 1.0f },{ 0.0f, 1.0f, 0.0f, 1.0f } },  // Top-left / Green
        { { 719.0f * xScalar - 1.0f, 1.0f * yScalar - 1.0f,  0.5f, 1.0f },{ 0.0f, 0.0f, 1.0f, 1.0f } }   // Bottom-right / Blue
    };

Second, the draw command:

    // Draw triangle.
    context->DrawInstanced(3, 265, 0, 0);

As I mentioned earlier, I created this test because my original code slows down when there are many overlapping primitives. Is this slowdown unavoidable? What should I be doing to avoid it? Thanks in advance.

Edit:

Here's a screenshot of the GPU usage detail:

[Screenshot: GPU usage detail]

Edit 2:

Visual Studio's GPU Usage tool appears to be incorrectly reporting the feature level supported by this graphics driver. The call to D3D11CreateDevice returns feature level 11.0, and DxDiag confirms this:

[Screenshot: DxDiag info]

I analyzed a frame to see if I could get any additional information. It appears that DrawInstanced is taking 22 ms to complete:

[Screenshot: frame analysis]
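For a sense of scale, the amount of pixel work in this repro can be estimated from the vertex buffer and shader above. The C++ sketch below approximates each instance's coverage by its geometric area clipped to a 1280 x 720 viewport; it ignores the rasterizer's exact fill rules, and EstimateCoveredPixels and ImpliedFillRate are hypothetical helpers, not measurements or library functions.

```cpp
#include <algorithm>
#include <cassert>

// Back-of-envelope estimate of how many pixels the repro's DrawInstanced call
// covers. Assumptions: 1280 x 720 viewport, y measured upward from the bottom
// edge as in NDC, and coverage approximated by geometric area (exact
// rasterizer fill rules are ignored).
double EstimateCoveredPixels(int instanceCount)
{
    assert(instanceCount >= 0);
    const double viewportWidth = 1280.0;
    const double viewportHeight = 720.0;
    const double leg = 718.0;  // both legs run from pixel 1 to pixel 719 in the vertex buffer
    double total = 0.0;
    for (int i = 0; i < instanceCount; ++i)
    {
        // The shader shifts instance i by (i, i) pixels, so its right-angle
        // corner sits at (1 + i, 1 + i).
        const double corner = 1.0 + i;
        // This sketch only handles clipping at the top edge; the right edge is
        // never reached for the repro's 265 instances (265 + 718 < 1280).
        assert(corner + leg <= viewportWidth);
        double area = 0.5 * leg * leg;
        // Once the vertical leg pokes above the viewport, the clipped piece is a
        // right triangle whose legs both equal the overshoot.
        const double overTop = std::max(0.0, corner + leg - viewportHeight);
        area -= 0.5 * overTop * overTop;
        total += area;
    }
    return total;
}

// Average pixels per second implied by covering `pixels` in `milliseconds`.
double ImpliedFillRate(double pixels, double milliseconds)
{
    assert(milliseconds > 0.0);
    return pixels / (milliseconds / 1000.0);
}
```

Under these assumptions, EstimateCoveredPixels(265) returns 65,257,708, roughly 70 times the 921,600 pixels in a single 1280 x 720 frame, and dividing by the 22 ms reported for DrawInstanced gives about 3 billion pixels per second. That figure could be compared against the card's published fill rate; the sketch itself makes no claim about it.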

Edit 3:

I've been testing for the last few days, but I don't see a good explanation for this behavior. Does anyone see a problem with the example code or my changes, or should I just chalk this up to a buggy or poorly performing graphics card?
