I need to calculate the normals of some triangles. I have a vector of vertices, where each vertex has x, y, z coordinates, and i1, i2, i3 are the indices in that vector of the three vertices of a triangle.

I'm using <DirectXMath.h> and wrote this which seems to work.

XMFLOAT3 normal;
///

XMVECTOR v1 = XMLoadFloat3(&XMFLOAT3(verts[i1].x, verts[i1].y, verts[i1].z));
XMVECTOR v2 = XMLoadFloat3(&XMFLOAT3(verts[i2].x, verts[i2].y, verts[i2].z));
XMVECTOR v3 = XMLoadFloat3(&XMFLOAT3(verts[i3].x, verts[i3].y, verts[i3].z));
XMVECTOR n  = XMVector3Cross(XMVectorSubtract(v2 ,v1), XMVectorSubtract(v3 ,v1));
XMStoreFloat3(&normal, n);

However, it appears to have more loads and stores than actual calculations, and I was wondering if there is a better way to do this. Or are the loads and stores "cheap" operations?

I have to run this for every triangle and it's taking a large amount of time relative to the rest of my code so speed improvements would be welcome.

Jason R
jcoder
  • With respect to your question about loads and stores being cheap, it's actually quite the opposite. If your algorithm has a high proportion of memory accesses relative to arithmetic operations, then performance will suffer. The amount of bandwidth to and from system memory is quite limited compared to the pure number of computations that a CPU can perform when operating on suitably well-structured code. – Jason R Nov 17 '12 at 01:59
  • So loading and storing all these vectors into SSE registers for one calculation is maybe not worth it? I guess I'll make a CPU version and profile it. – jcoder Nov 17 '12 at 08:48
  • You may find ``XMVectorSet`` a better option than your ``XMVECTOR v1 = XMLoadFloat3(&XMFLOAT3(verts[i1].x, verts[i1].y, verts[i1].z));`` pattern, although a better option is keeping your verts in a form that you can directly use ``XMLoadFloat3`` on them. – Chuck Walbourn Dec 10 '14 at 21:32
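
For comparison, the plain "CPU version" mentioned in the comments could look roughly like the sketch below. It computes the same unnormalized cross product as the question's code, (v2 - v1) × (v3 - v1), using scalar floats only, so there are no explicit loads into or stores out of SIMD registers. The `Vertex` struct and the `triangleNormal` name are hypothetical; the question only says that each vertex has x, y, z members. Whether this is actually faster is something only profiling can tell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout (assumption): the question only states that
// each vertex has x, y, z coordinates.
struct Vertex {
    float x, y, z;
};

// Unnormalized face normal of the triangle (i1, i2, i3):
// (v2 - v1) x (v3 - v1), the same quantity as in the question's code.
// The result is returned in a Vertex purely for convenience.
Vertex triangleNormal(const std::vector<Vertex>& verts,
                      std::size_t i1, std::size_t i2, std::size_t i3)
{
    assert(i1 < verts.size() && i2 < verts.size() && i3 < verts.size());

    const Vertex& a = verts[i1];
    const Vertex& b = verts[i2];
    const Vertex& c = verts[i3];

    // Edge vectors starting at the first vertex.
    const float e1x = b.x - a.x, e1y = b.y - a.y, e1z = b.z - a.z;
    const float e2x = c.x - a.x, e2y = c.y - a.y, e2z = c.z - a.z;

    // Cross product e1 x e2.
    return Vertex{ e1y * e2z - e1z * e2y,
                   e1z * e2x - e1x * e2z,
                   e1x * e2y - e1y * e2x };
}
```

Calling `triangleNormal(verts, i1, i2, i3)` once per triangle mirrors the loop in the question. The result is not normalized, just as in the original code.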

1 Answer

Try adding #define _XM_NO_INTRINSICS_ prior to #include <DirectXMath.h>. This will disable the use of SSE within the library, allowing the compiler more freedom to make its own optimizations.

MooseBoys