4

Is there a way to intersect a single ray with a SIMD-pack of 8 triangles such that I don't have to use store or shuffle or any such slow instructions? My main issue is the final part of the intersection where I find which of the 8 triangles in the pack is nearest to the ray; I'm storing and then getting the minimum t-value, basically a horizontal min.

Moreover, is this pattern correct? I'm using an 8 way BVH with single ray traversal as described in the "Stackless Multi-BVH Traversal for CPU, MIC and GPU Ray Tracing" paper on top of which I've added single ray to bundles of triangles intersection. Would ray-bundle intesected with single-triangle, coupled with binary BVH be more suitable?

Thank you.

  • 1
    That horizontal minimum can be done with some shuffles and vertical minimums, it's a bit annoying but not too bad. If you post the current code, I can see if I can edit it in. – harold Mar 04 '15 at 22:13
  • @harold, the thing is I'm trying to avoid shuffles and stores as I've read they're pretty slow. –  Mar 08 '15 at 14:45
  • They're slower than not doing them, but there is no instruction for horizontal minimum (except for uint16's) so there aren't that many options.. – harold Mar 08 '15 at 15:32

2 Answers2

1

The difficulty with bundles of rays is how do you get high coherency of rays.

With primary rays, its not too bad since neighboring pixels will generate highly coherent rays. With shadow rays, it'll be a little complicated but doable with point lights or relatively small area lights.

Any other secondary rays become a pain to deal with. If you randomly choose rays, then they will easily all hit different triangles meaning you end up doing the equivalent single ray intersections, but with SIMD overhead. From what I've seen using SIMD ray bundles has not yet shown good promise, even when rearranging ray bundles to try to achieve good coherency.

If you do want to try it one possible paper to look at is "Large Ray Packets for Real Time Whitted Ray Tracing". They look at ray tracing using ray packets, and have some references for other ray packet techniques.

illeyezur
  • 451
  • 4
  • 8
  • Read the Q again: it's not about a bundle of rays. It is a single ray versus 8 triangles. Not the other way around. Once the 8 intersections are done, you need to know which one is the closest. – Bram Apr 07 '17 at 03:46
0

You basically have to do this horizontal reduction, using for instance one _mm256_min_ps, one _mm_min_ps, and finally two _mm_min_ss with a few shuffles to put the results in the right lanes. This is already better than doing 8 _mm_min_ss with 8 shuffles.

The other option is to keep the register containing the 8 t parameters and do the reduction only at the end of the traversal loop. But that means loosing the ability to cull BVH nodes precisely. Therefore I would recommend the first approach.

madmann91
  • 91
  • 4
  • If you just need to find the min element of a `ps` vector, see my answer on http://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86, but replace add with min. Neither the question nor either answer has explained how the data is actually organized into vectors, so it's not clear to those of us who know SIMD in general but not this specific problem what's needed. (Ideally someone would edit the question with at least a link to explain how/where you have 8 triangles). – Peter Cordes Jun 17 '16 at 01:05
  • Yes, that's exactly what I was suggesting. Thanks for the link. Usually the triangle data is stored as in Embree, in SoA form. Since the question was explicitly about computing the horizontal min (to get the minimum time of intersection across all 8 triangles) at the end of the intersection, we assume the triangles are already loaded and intersected, so the memory layout of the _triangles_ does not matter. – madmann91 Jun 17 '16 at 09:32