When running a performance profiler (VS2017), I find that XMVector3Dot shows up as taking some time (it's part of my code that does collision detection). I find that by replacing the usage of XMVECTOR with XMFLOAT3 and manually calculating a dot product (the same reasoning applies to other vector operations), my algorithm runs faster. I understand that XMVECTORs are of course needed when supplying the GPU with vectors etc., as this is what the GPU understands, but is it expected that, when calculating on the CPU, it's faster to manually calculate a dot product with XMFLOAT3s instead of XMVECTORs?

1 Answer
Efficient use of SIMD requires a number of techniques, primarily keeping your computation vectorized for as long as you can. If you have to convert back and forth between vectorized and scalar, the performance benefits of SIMD are lost.
A dot product takes two vectors and returns a scalar value. To make it easier to keep computations vectorized, XMVector3Dot returns the scalar value 'splatted' across the vector, that is, the same value in every component. If you are just extracting one of the components and going back to scalar computations, then your algorithm is likely not well vectorized and you would in fact be better off doing the dot product as a scalar operation.
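As a rough illustration of the scalar route, here is a minimal sketch of a hand-written dot product. `Float3` is a hypothetical stand-in with the same three-float layout as XMFLOAT3, used only so the snippet compiles without the DirectXMath headers, and `Dot3` is likewise an assumed helper name, not part of the library.

```cpp
// Hypothetical stand-in for XMFLOAT3: three packed floats.
struct Float3 {
    float x;
    float y;
    float z;
};

// Scalar dot product: three multiplies and two adds, with no
// load into a SIMD register and no extraction of a lane afterwards.
inline float Dot3(const Float3& a, const Float3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

With real XMFLOAT3 values the body would be identical, since XMFLOAT3 exposes `x`, `y` and `z` members in the same way.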
DirectXMath includes a collision header (DirectXCollision.h) with various tests that follow the SIMD best practices. For example:
    inline XMVECTOR PointOnPlaneInsideTriangle(FXMVECTOR P, FXMVECTOR V0, FXMVECTOR V1, GXMVECTOR V2)
    {
        // Compute the triangle normal.
        XMVECTOR N = XMVector3Cross( XMVectorSubtract( V2, V0 ), XMVectorSubtract( V1, V0 ) );

        // Compute the cross products of the vector from the base of each edge to
        // the point with each edge vector.
        XMVECTOR C0 = XMVector3Cross( XMVectorSubtract( P, V0 ), XMVectorSubtract( V1, V0 ) );
        XMVECTOR C1 = XMVector3Cross( XMVectorSubtract( P, V1 ), XMVectorSubtract( V2, V1 ) );
        XMVECTOR C2 = XMVector3Cross( XMVectorSubtract( P, V2 ), XMVectorSubtract( V0, V2 ) );

        // If the cross product points in the same direction as the normal, the
        // point is inside the edge (it is zero if it is on the edge).
        XMVECTOR Zero = XMVectorZero();
        XMVECTOR Inside0 = XMVectorGreaterOrEqual( XMVector3Dot( C0, N ), Zero );
        XMVECTOR Inside1 = XMVectorGreaterOrEqual( XMVector3Dot( C1, N ), Zero );
        XMVECTOR Inside2 = XMVectorGreaterOrEqual( XMVector3Dot( C2, N ), Zero );

        // If the point is inside all of the edges, it is inside the triangle.
        return XMVectorAndInt( XMVectorAndInt( Inside0, Inside1 ), Inside2 );
    }
Instead of doing a scalar conversion and then a comparison, it uses vectorized comparisons.
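To make the idea of a vectorized comparison more concrete, the following is a plain C++ model of what the mask operations do per lane. It is a sketch for illustration only, not how DirectXMath is implemented: `Lanes`, `Mask`, `greaterOrEqual`, `andMask`, `allLanesSet` and `insideAllEdges` are all hypothetical names. The assumption it relies on is that a comparison yields all bits set (0xFFFFFFFF) in a lane where the test passes and zero where it fails, so results can be combined with bitwise AND and inspected once at the end.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<float, 4>;          // models the four float lanes of a SIMD register
using Mask  = std::array<std::uint32_t, 4>;  // models a per-lane comparison result

// Per-lane a >= b: all bits set where true, zero where false.
Mask greaterOrEqual(const Lanes& a, const Lanes& b) {
    Mask m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = (a[i] >= b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Per-lane bitwise AND of two masks.
Mask andMask(const Mask& a, const Mask& b) {
    Mask m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = a[i] & b[i];
    return m;
}

// Single final check: true only if every lane has all bits set.
bool allLanesSet(const Mask& m) {
    for (std::size_t i = 0; i < 4; ++i)
        if (m[i] != 0xFFFFFFFFu) return false;
    return true;
}

// d0..d2 stand for the three splatted dot products against the normal.
bool insideAllEdges(float d0, float d1, float d2) {
    const Lanes zero{0.0f, 0.0f, 0.0f, 0.0f};
    const Mask in0 = greaterOrEqual(Lanes{d0, d0, d0, d0}, zero);
    const Mask in1 = greaterOrEqual(Lanes{d1, d1, d1, d1}, zero);
    const Mask in2 = greaterOrEqual(Lanes{d2, d2, d2, d2}, zero);
    return allLanesSet(andMask(andMask(in0, in1), in2));
}
```

The three edge tests are combined as masks and only converted to a `bool` once, which mirrors the structure of the function above: no intermediate result is pulled out into a scalar to decide what to do next.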
The DirectXMath collision code also avoids dynamic branches. Modern CPUs have a lot of computational power so doing more work without dynamic branches or accessing memory is often faster. For example, here is the sphere-triangle test:
    inline bool BoundingSphere::Intersects( FXMVECTOR V0, FXMVECTOR V1, FXMVECTOR V2 ) const
    {
        // Load the sphere.
        XMVECTOR vCenter = XMLoadFloat3( &Center );
        XMVECTOR vRadius = XMVectorReplicatePtr( &Radius );

        // Compute the plane of the triangle (has to be normalized).
        XMVECTOR N = XMVector3Normalize( XMVector3Cross( XMVectorSubtract( V1, V0 ), XMVectorSubtract( V2, V0 ) ) );

        // Assert that the triangle is not degenerate.
        assert( !XMVector3Equal( N, XMVectorZero() ) );

        // Find the nearest feature on the triangle to the sphere.
        XMVECTOR Dist = XMVector3Dot( XMVectorSubtract( vCenter, V0 ), N );

        // If the center of the sphere is farther from the plane of the triangle than
        // the radius of the sphere, then there cannot be an intersection.
        XMVECTOR NoIntersection = XMVectorLess( Dist, XMVectorNegate( vRadius ) );
        NoIntersection = XMVectorOrInt( NoIntersection, XMVectorGreater( Dist, vRadius ) );

        // Project the center of the sphere onto the plane of the triangle.
        XMVECTOR Point = XMVectorNegativeMultiplySubtract( N, Dist, vCenter );

        // Is it inside all the edges? If so we intersect because the distance
        // to the plane is less than the radius.
        XMVECTOR Intersection = DirectX::Internal::PointOnPlaneInsideTriangle( Point, V0, V1, V2 );

        // Find the nearest point on each edge.
        XMVECTOR RadiusSq = XMVectorMultiply( vRadius, vRadius );

        // Edge 0,1
        Point = DirectX::Internal::PointOnLineSegmentNearestPoint( V0, V1, vCenter );

        // If the distance to the center of the sphere to the point is less than
        // the radius of the sphere then it must intersect.
        Intersection = XMVectorOrInt( Intersection, XMVectorLessOrEqual( XMVector3LengthSq( XMVectorSubtract( vCenter, Point ) ), RadiusSq ) );

        // Edge 1,2
        Point = DirectX::Internal::PointOnLineSegmentNearestPoint( V1, V2, vCenter );

        // If the distance to the center of the sphere to the point is less than
        // the radius of the sphere then it must intersect.
        Intersection = XMVectorOrInt( Intersection, XMVectorLessOrEqual( XMVector3LengthSq( XMVectorSubtract( vCenter, Point ) ), RadiusSq ) );

        // Edge 2,0
        Point = DirectX::Internal::PointOnLineSegmentNearestPoint( V2, V0, vCenter );

        // If the distance to the center of the sphere to the point is less than
        // the radius of the sphere then it must intersect.
        Intersection = XMVectorOrInt( Intersection, XMVectorLessOrEqual( XMVector3LengthSq( XMVectorSubtract( vCenter, Point ) ), RadiusSq ) );

        return XMVector4EqualInt( XMVectorAndCInt( Intersection, NoIntersection ), XMVectorTrueInt() );
    }
For your algorithm, you should either (a) make it fully vectorized or (b) stick with a scalar dot-product.
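One possible way to approach option (a), offered as an assumption rather than something the DirectXMath collision code does, is to compute many dot products in one pass over data stored as separate x, y and z arrays. A simple loop over that layout has no per-element conversion between vector and scalar form, and optimizing compilers can often auto-vectorize it. `PointsSoA` and `DotBatch` are hypothetical names introduced for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container: one array per component.
struct PointsSoA {
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> z;
};

// Writes out[i] = dot(point i, (nx, ny, nz)) for every point.
// Each iteration is independent and reads contiguous memory, which is
// the shape of loop that compilers can typically turn into SIMD code.
void DotBatch(const PointsSoA& p, float nx, float ny, float nz,
              std::vector<float>& out) {
    const std::size_t count = p.x.size();
    out.resize(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = p.x[i] * nx + p.y[i] * ny + p.z[i] * nz;
}
```

Whether this is actually faster than per-point scalar code depends on the data layout and the compiler settings, so it is worth confirming with the same profiler run that prompted the question.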

Thanks for your response! – Picaro1981 Jun 16 '17 at 08:52