3

I had a short look at the Forth programming language for a while. Is it possible to do multithreading with synchronization primitives in Forth?

For example, is it possible to do n-by-n matrix multiplication with multiple threads in Forth? If so, what is the basic mechanism, or programming patterns?

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131

3 Answers3

3

For the stated goal the multi-threading has to be pre-emptive. Simple Forths have a PAUSE-ing task-loop that runs tasks one after the other, never overlapping. Surprisingly useful but not in this case.

Modern, professional, Forth can do multi-threading but I know of only one with special primitives to make it easier.

The example matrix multiplication given earlier is not an demonstration of multi-threading.

To my knowledge (*), only the iForth compiler has special multi-threading primitives (OCCAM based), and comes with examples that really run x-times faster on n-core processors (where x < n). For the matrix code I would use its PAR .. ENDPAR where the threads access rows and colums that stay far apart in memory, to prevent cache pollution. There is another primitive that automatically splits up DO-LOOPs for you, in the way needed for this task. An example of this syntax for 8 threads is:

0 VALUE jj 

: mmul2 ( F: -- r )
    a3 /size DFLOATS ERASE
    /rsz 0 DO  
           I TO jj
           PAR
             STARTP  /rsz 0 DO  a1 jj     /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj     /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP 
             STARTP  /rsz 0 DO  a1 jj 1+  /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 1+  /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP 
             STARTP  /rsz 0 DO  a1 jj 2+  /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 2+  /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP 
             STARTP  /rsz 0 DO  a1 jj 3 + /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 3 + /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP
             STARTP  /rsz 0 DO  a1 jj 4 + /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 4 + /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP
             STARTP  /rsz 0 DO  a1 jj 5 + /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 5 + /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP
             STARTP  /rsz 0 DO  a1 jj 6 + /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 6 + /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP
             STARTP  /rsz 0 DO  a1 jj 7 + /rsz * I + DFLOAT[] DF@   a2 I /rsz * DFLOAT[]   a3 jj 7 + /rsz * DFLOAT[]  /rsz DAXPY_sse2   LOOP ENDP
           ENDPAR
      8 +LOOP 
    0e  a3 /size 0 ?DO  DF@+ F+  LOOP DROP ;

For 1024 x 1024 matrices this (mmul2) is about twice faster than the single-thread version (mmul1).

FORTH> TESTS
DOT/AXPY using 64 bits floats.
Vector size = 1048576
mul0 (dot)         :  6.8719411200000000000e+0013 0.133 seconds elapsed.
mul1 (dot_sse2)    :  6.8719411200000000000e+0013 0.106 seconds elapsed.
mmul0 (axpy)       :  5.6294941655040000004e+0014 0.981 seconds elapsed.
mmul1 (axpy_sse2)  :  5.6294941655040000004e+0014 0.400 seconds elapsed.
mmul2 (Paxpy_sse2) :  5.6294941655040000004e+0014 0.114 seconds elapsed. ok

(*) Rumor has it that MPE and Forth Inc recently added similar functionality.

Marcel Hendrix
  • 161
  • 1
  • 5
0

Any Forth that can do multitasking can do multithreading as well. (They're the same thing within an application.) Almost all Forths can do it now.

You can do something like:

include fsl-util.f

 3 3 float matrix A{{
 A{{ 3 3 }}fread  1e 2e 3e  4e 5e 6e  7e 8e 9e
 3 3 float matrix B{{
 B{{ 3 3 }}fread  3e 3e 3e  2e 2e 2e  1e 1e 1e
 3 3 float matrix C{{    \ result

 A{{ B{{ C{{ mat*
 C{{ }}print
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Josip Ivic
  • 3,639
  • 9
  • 39
  • 57
  • i made an edit to put you an example code. Hope that helps, I can not think of any other way. – Josip Ivic Sep 23 '15 at 10:20
  • regarding `fsl-utilf.f` -- Forth Scientific Library doesn't use multi-threading for matrix multiplication, and doesn't support any parallelism at all since it is general library that is based on the standard features only. – ruvim Aug 09 '17 at 09:22
0

For now, Forth standard doesn't specify any multithreading or multitask related words. Although, many historic Forth implementations have such primitives, or allow to define them using Forth-assembler or API to underlying system.

As example, synchronization primitives and multithreading in SP-Forth/4 are mostly just generic wrappers over Windows and Linux (pthreads) APIs.

Note that a threads pool should be used to have better performance for small operations — since creating/destroying thread could be time-consuming operation.

Also it is possible that implementation of n-by-n matrix multiplication can get better gain from using SSE operations, or even GPU (see gpu.js for example).

In any way, the solution depends on particular Forth system.

Example (conceptual model)

Using matrices and thread-pool libraries, matrix multiplication could look like the following:

\ matrices vocabulary is in the context.

slot-enum{ m1 m2 m3 tp }slot-enum

: calc-item { r c -- }
  0e  m1 columns 0 do
    r i m1 item
    i c m2 item
    F* F+
  loop  r c m3 item!
;
: mult-matrix ( a b c -- ) \ c = a * b 
  m3! m2! m1!
  \ m3 dimenisions should be m1 rows x m2 columns 
  threadpool::new-group tp!
  m1 rows 0 do m2 columns 0 do
    i j 2 'calc-item tp threadpool::run
  loop
  tp threadpool::join
  tp threadpool::free
;
ruvim
  • 7,151
  • 2
  • 27
  • 36