1

I am well aware of Stack Overflow question What are the primitive Forth operators?, but it doesn't really address my question. I am looking not for the minimal but rather practical set of primitives.

Recently I faced a problem which required frequently sorting quite large arrays, and the performance became critical. A naive qsort benchmarked at 20. Porting a heavily (algorithmically) optimized STL version gain me benchmark 16. Native C++ laughed at me from benchmark 3. Oh well.

Finally I bit a bullet and implemented EXCH ( a1 a2 -- a1 a2 ) and non-destructive compares ( n1 n2 -- n1 n2 flag ) as primitives. The results were amazing - three-fold performance gain. Still not C++, but way closer.

Why doesn't standard Forth have them out of the box?

PS: the benchmark is (execution time, nsec)/(n log n)

Community
  • 1
  • 1
user58697
  • 7,808
  • 1
  • 14
  • 28

2 Answers2

2

I suspect that EXCH is not a part of standard Forth simply because it is obscure enough that you are probably better off writing your own if you need it.

I would imagine that non-destructive compares would count as a violation of the general principles of Forth, specifically that words should consume their arguments. If you want to keep the arguments you have to explicitly create a copy.

I don't know enough about implementations to say what sort of performance impact it has, but for most applications

: non-destructive-> 2dup > ;

would make sense and work well enough.

I realise that this is a slightly evasive answer, but I suspect that it is that way because from what I have read the choices behind what words should constitute a standard Forth were not made to optimise execution speed.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
sheepez
  • 986
  • 1
  • 10
  • 26
2

The effect of such changes depend heavily on the quality of your Forth system. Obviously the worse the compiler is, the more effect well-thought out changes will have. On the other hand, it is more difficult to shave off 1 cycle of 4, than 10 cycles of 40. This means that at some point high-level rewrites do not pay off anymore (unless you are a compiler writer :-)

There are of course tricks with multi-threading and special CPU instructions that one might experiment with.

To see where you are, it would be helpful if you could provide actual code and timings on a real system.

brasofilo
  • 25,496
  • 15
  • 91
  • 179
Marcel Hendrix
  • 161
  • 1
  • 5