
Preamble

Alright so ... I know this question may cover several topics, but I'm completely new to DirectX as well as multithreading, and the Stack Overflow and MSDN articles I have read so far have not helped me at all. Hence I'm very thankful for every comment that pushes me in the right direction.

Premise

I started a few weeks ago by writing a Direct2D renderer that takes a matrix I pass in and draws it in a single window (which works great, by the way).

I was trying to speed up my computations and got the tip to use OpenMP. When using the pragma statement, my program uses 3 threads instead of one, which is good I guess. But I do not notice any speedup. However, that's not the worst part. The drawing call takes up a lot more time than my computation of the matrix, and I have no idea how I could speed that up.

Question

Please tell me what I should look out for or how I could speed up/multithread my drawing call.

Note: I'm using STL, Windows and DirectX headers but no .NET, MFC/ATL or similar library.

Code sample

vector<dot> computeMatrix(ushort x, ushort y)
{
   vector<dot> set;
   // init set
   #pragma omp parallel for
   for (int i = 0; i < y; ++i)
   {
     for (int j = 0; j < x; ++j)
     {
        // do some computation
     }
   }
   return set;
}

dot is a D2D1 ellipse object.

void draw(vector<dot> set)
{
  pRenderTarget->BeginDraw();
  pRenderTarget->SetTransform(D2D1::Matrix3x2F::Identity());
  #pragma omp parallel for
  for(auto coord: set)
  {
    // set the pBrush
    pRenderTarget->FillEllipse(coord, pBrush);
  }
  pRenderTarget->EndDraw();
}
0x492559F64E
  • Some resources I've found and already read up on, but didn't help yet to answer my questions: [SO: Best practices for multithreading in D2D](http://stackoverflow.com/questions/6242729/what-are-the-best-practices-for-multithreading-with-direct2d-dxgi-d3d-interop), [SO: Multithreading in D2D](http://stackoverflow.com/questions/23195425/multithreading-in-direct2d), [BatchBatchBatch](http://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf) as well as the MSDN articles linked in those StackOverflow threads. – 0x492559F64E Jun 26 '14 at 12:01

1 Answer


If you are using Win8 or above, ensure that you have set the D2D1_DEVICE_CONTEXT_OPTIONS_ENABLE_MULTITHREADED_OPTIMIZATIONS option when creating the Direct2D device context.

When this flag is specified, Direct2D will distribute rendering across all of the logical cores present on the system, which can significantly decrease overall rendering time.

Currently, this flag is only for geometry, and requires HAL.

As of Windows 8.1, this flag only affects path geometry rendering. It has no impact on scenes containing only other primitive types (such as text, bitmaps, or geometry realizations).

This flag also has no impact when rendering in software (i.e. when rendering with a WARP Direct3D device). To control software multithreading, callers should use the D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS flag when creating the WARP Direct3D device.

Specifying this flag can increase peak working set during rendering and can also increase thread contention in applications that already take advantage of multithreaded processing.

You may also want to consider geometry realizations. I have found these to make a noticeable difference when there are a large number of geometry objects to render; however, you will have to profile that for your scenario. An ellipse is a simple geometry, so you may not see a noticeable gain, especially if you have to translate to the render location.

OpenMP (like any parallel or GPU compute API) can perform much worse on simple tasks: threads have to be started and synchronized, and GPU APIs additionally have to create a 'view' and copy the data to and from it. Ensure that your specific task benefits from using OpenMP, and ensure your profiling includes the above-mentioned steps.
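One way to keep that overhead away from small inputs is OpenMP's `if` clause, which only starts a parallel region when the condition holds. This is a rough sketch, not your actual computation: `sumOfSquares` and `kParallelThreshold` are made-up names, and the threshold value is an assumption that only profiling can settle.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cut-off: below this size the loop stays on the calling thread.
// The right value depends on the workload and has to come from profiling.
constexpr std::size_t kParallelThreshold = 100000;

// Sums the squares of all values, going parallel only for large inputs.
double sumOfSquares(const std::vector<double>& values) {
    // A plain int loop index keeps the loop valid for older OpenMP versions.
    assert(values.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int count = static_cast<int>(values.size());
    double sum = 0.0;

    // reduction(+ : sum) gives each thread a private partial sum and adds
    // them up at the end; the if clause skips threading for small inputs.
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+ : sum) if (values.size() >= kParallelThreshold)
    for (int i = 0; i < count; ++i) {
        sum += values[i] * values[i];
    }
    return sum;
}
```

Timing the function with the threshold set very high (always serial) and at zero (always parallel) should show quickly whether the threads pay for themselves on your data.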

Remember there is a good deal of overhead in creating and scheduling threads. Often you will see better performance by keeping your code as simple as possible and not dealing with scheduling and synchronization. Nevertheless, carefully planned threads can provide huge gains, especially when they don't manipulate the same resources (or data).
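To illustrate the "don't manipulate the same data" point, here is a minimal sketch of a parallel matrix computation in which every iteration writes only to its own slot. `Dot` is a hypothetical stand-in for your ellipse type, and the `spacing` parameter and the per-cell formula are placeholders I made up, not your computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's `dot` (a D2D1 ellipse):
// centre position plus the two radii.
struct Dot {
    float x, y, radiusX, radiusY;
};

// Fills a width * height grid in parallel. The vector is sized up front,
// so each iteration writes only to its own element and no locking is needed.
std::vector<Dot> computeMatrix(int width, int height, float spacing) {
    assert(width >= 0 && height >= 0);
    std::vector<Dot> dots(static_cast<std::size_t>(width) * height);

    // Rows are distributed across threads; row and col are private to each.
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t index = static_cast<std::size_t>(row) * width + col;
            // Placeholder computation; the real per-cell work would go here.
            dots[index] = Dot{col * spacing, row * spacing, spacing / 2, spacing / 2};
        }
    }
    return dots;
}
```

Calling `push_back` on a shared vector from inside the parallel loop would be a data race, which is why the sketch sizes the vector first and indexes into it. The draw loop can then read the finished vector on a single thread.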

Take a look at your render pass, and try to determine if any of your code is having to re-calculate anything that hasn't changed (sizes, location, etc). Perform these tasks when responding to a window size change. Pay attention to your structure alignment (memory boundaries), and locality of data (cache-misses are expensive). Hope this helps.
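As a rough sketch of that idea, the class below recomputes positions only when the window size changes and keeps them in one contiguous vector, so the render pass just reads cached data. `GridLayoutCache`, `Point`, and the evenly spaced grid layout are all assumptions for illustration; your matrix may be laid out differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain point type; a contiguous vector of these is cache friendly.
struct Point {
    float x, y;
};

// Caches everything that depends only on the window size.
class GridLayoutCache {
public:
    GridLayoutCache(int columns, int rows) : columns_(columns), rows_(rows) {
        assert(columns > 0 && rows > 0);
    }

    // Intended to be called from the resize handler (for example on WM_SIZE),
    // not from the per-frame draw call.
    void onResize(float width, float height) {
        const float cellWidth = width / columns_;
        const float cellHeight = height / rows_;
        radius_ = 0.5f * std::min(cellWidth, cellHeight);

        centers_.clear();
        centers_.reserve(static_cast<std::size_t>(columns_) * rows_);
        for (int row = 0; row < rows_; ++row) {
            for (int col = 0; col < columns_; ++col) {
                // Centre of each cell, stored row by row.
                centers_.push_back(Point{(col + 0.5f) * cellWidth, (row + 0.5f) * cellHeight});
            }
        }
    }

    // The render pass only reads these; nothing is recalculated per frame.
    const std::vector<Point>& centers() const { return centers_; }
    float radius() const { return radius_; }

private:
    int columns_;
    int rows_;
    float radius_ = 0.0f;
    std::vector<Point> centers_;
};
```

The draw loop would then iterate over `centers()` in order and build each ellipse from a cached centre and `radius()`, which keeps memory access sequential.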

Jeff