I am well aware of the Stack Overflow question "What are the primitive Forth operators?", but it doesn't really address my question. I am looking not for the minimal set of primitives but for a practical one.
Recently I faced a problem that required frequently sorting quite large arrays, and performance became critical. A naive qsort benchmarked at 20. Porting a heavily (algorithmically) optimized STL version gained me a benchmark of 16. Native C++ laughed at me from a benchmark of 3. Oh well.
Finally I bit the bullet and implemented EXCH ( a1 a2 -- a1 a2 ), which exchanges the contents of two addresses, and non-destructive compares ( n1 n2 -- n1 n2 flag ) as primitives. The results were amazing: a three-fold performance gain. Still not C++, but way closer.
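To make the stack effects above concrete, here is a minimal sketch of what the two kinds of words do, written as ordinary high-level colon definitions in standard Forth. It assumes cell-sized array elements and a system that provides PICK (for example Gforth). The names `2dup<`, `2dup>` and `order2` are hypothetical illustrations, not the code I actually used and not standard words. A native primitive would presumably do the same work without the intermediate stack shuffling shown here.

```forth
\ Hypothetical high-level versions of the words described above.

\ EXCH ( a1 a2 -- a1 a2 )
\ Swap the cells stored at a1 and a2, leaving both addresses on the stack.
: exch ( a1 a2 -- a1 a2 )
  2dup @ swap @   ( a1 a2 x2 x1 )
  2 pick !        ( a1 a2 x2 )    \ store x1 at a2
  2 pick !        ( a1 a2 ) ;     \ store x2 at a1

\ Non-destructive compares ( n1 n2 -- n1 n2 flag )
\ The operands stay on the stack, so a sort loop can keep using them.
: 2dup< ( n1 n2 -- n1 n2 flag ) 2dup < ;
: 2dup> ( n1 n2 -- n1 n2 flag ) 2dup > ;

\ Hypothetical example: put the smaller of the two cells at a1.
: order2 ( a1 a2 -- a1 a2 )
  2dup @ swap @   ( a1 a2 x2 x1 )
  < if exch then ;

create a 7 ,
create b 5 ,
a b order2 2drop
a @ . b @ .        \ prints 5 7
3 4 2dup< . . .    \ prints -1 4 3
```

In a sort's inner loop every call to `exch` or `2dup<` in this form expands into several separate stack operations, which is the overhead that implementing them as primitives removes.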
Why doesn't standard Forth have them out of the box?
PS: the benchmark is (execution time in ns)/(n log n), so lower is better.