43

I'm interested in implementing a Forth system, just so I can get some experience building a simple VM and runtime.

When starting in Forth, one typically learns about the stack and its operators (DROP, DUP, SWAP, etc.) first, so it's natural to think of these as being among the primitive operators. But they're not. Each of them can be broken down into operators that directly manipulate memory and the stack pointers. Later one learns about store (!) and fetch (@) which can be used to implement DUP, SWAP, and so forth (ha!).

So what are the primitive operators? Which ones must be implemented directly in the runtime environment from which all others can be built? I'm not interested in high-performance; I want something that I (and others) can learn from. Operator optimization can come later.

(Yes, I'm aware that I can start with a Turing machine and go from there. That's a bit extreme.)

Edit: What I'm aiming for is akin to bootstrapping an operating system or a new compiler. What do I need do implement, at minimum, so that I can construct the rest of the system out of those primitive building blocks? I won't implement this on bare hardware; as an educational exercise, I'd write my own minimal VM.

Seki
  • 11,135
  • 7
  • 46
  • 70
Barry Brown
  • 20,233
  • 15
  • 69
  • 105
  • There are lots of FORTH interpreters around, you might want to check out e.g. [Building FORTH](git@github.com:Immortalin/Building_Forth.git) and [lbForth](https://github.com/larsbrinkhoff/forth-metacompiler) or take a peek at Jones FORTH ([32 bit](git://git.annexia.org/git/jonesforth.git) or [64 bit](http://subvert-the-dominant-paradigm.net/repos/jonesforth64)) – vonbrand Aug 16 '19 at 17:34

8 Answers8

25

This thread covers your exact question. Here is a soup-to-nuts implementation with complete documentation.

I wrote a subroutine threaded Forth targeting 68K when I was in college. I defined the runtime environment and dictionary format, then wrote some C code that boot strapped a Macintosh application that loaded a default dictionary, populated some I/O vectors and got the code running. Then I took the Leo Brodie book Starting Forth and started implementing the basic dictionary in 68K assembly language. I started with arithmetic/logic words, then did control structures then word definition/manipulation words. My understanding is that at a minimum you need @, !, +, -, * and /. The rest can be implemented in terms of those, but that's like trying to write an entire graphics library based on SetPixel and GetPixel: it will work, but yikes, why?

I enjoyed the process as there were some really interesting puzzles, like getting DOES> exactly right (and once I had a solid DOES> implementation, I was creating closures that turned into tiny, tiny amounts of code).

Milo Fultz
  • 17
  • 1
  • 5
plinth
  • 48,267
  • 11
  • 78
  • 120
13

A long time ago, I had a book called "Threaded Interpretive Languages", published I think by Byte, that discussed how to implement a Forth-like language (I don't think they ever called it Forth) in Z80 assembly.

You may not have a Z80 handy, or want one, but the book might be instructive.

David Thornley
  • 56,304
  • 9
  • 91
  • 158
8

This post at comp.lang.forth lists a few "minimal Forths".

http://groups.google.com/group/comp.lang.forth/msg/10872cb68edcb526

Why do I know this? My brother, Mikael, wrote #3 and he also wrote a paper about making a "minimal Forth" (in Swedish, though). If I remember correctly he wanted to get a minimal set of operators that could be built in silicon.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
epatel
  • 45,805
  • 17
  • 110
  • 144
4

I'm still not convinced the question is well-formed. For example, Plinth's instructions can be reduced; after all, * and / can be implemented in terms of + and -, but then '+' can be implemented in terms of a successor function (see the Peano axioms.) Which puts you into the neighborhood of a Turing machine. How do you know where to stop?

Charlie Martin
  • 110,348
  • 25
  • 193
  • 263
  • 1
    I'd stop where it's educationally-instructive to do so. For example, I can certainly see implementing multiplication in terms of addition, but implementing addition as a primitive. Then we have incentive to weigh the pros and cons of implementing '*' and others as primitives. – Barry Brown Jan 03 '09 at 04:31
  • That's a different question, at least to a math geek. "Primitive" means "minimal, least, atomic" in this context. – Charlie Martin Apr 10 '09 at 15:26
  • 2
    You stop when the cost of implementing it as a primitive is higher than the cost of implementing it as a forth word. For the * example, it's certainly more burdensome to implement it in forth than using the CPU's native capability (assuming the CPU has a multiply instruction). – zvrba Mar 31 '11 at 05:25
4

You might also want to take a look at Hans Bezemer's 4tH compiler.

bugmagnet
  • 7,631
  • 8
  • 69
  • 131
3

Which Forth implementation are you using that doesn't provide this information in the documentation? Given the nature of Forth, it might be implementation-dependent. There's a standard set of words in the dictionary, but whether they got there by assembly/C/whatever or by Forth shouldn't matter, since Forth is by definition a self-extensible language.

dkretz
  • 37,399
  • 13
  • 80
  • 138
  • I've looked at pforth, which predefines hundreds of operators in its C source. – Barry Brown Jan 02 '09 at 21:07
  • By definition, the instruction set of your processor must be sufficient, so just expose that; because that's what all compilers build from. And even that instruction set is probably not entirely be necessary, because some instructions can be expressed in terms of the others. – dkretz Jan 03 '09 at 01:29
1

Contrary to what you say, generally DROP SWAP etc are considered basic Forth operations. The reason is that if you implement them using memory operations like you suggest, the overall system becomes more, not less complicated. Also there is no clear distinction in Forth between what is basic and what not. In the 80's a dictionary search would be basic, and coded in assembler for speed, while a modern linux hosted can afford to code that in so called high level. Also Forthers tend to routinely recode assembler words in high level, and high level words in assembler. I'm the author of ciforth and yourforth. It is possible to define <= as "> not" as I did in ciforth . But in yourforth I decided that having all of < <= > >= as similar, uniformly looking, small assembler routines was effectively simpler. That is a judgement call, and a matter of taste, certainly not a matter of principle.

In the context I interpret the question as: "What is a reasonable size for the number of primitive operations to arrive at a reasonable powerful Forth with a reasonable speed?" Clearly you are not interested in clever tricks to get rid of one assembler word at the expense of tremendous overhead as found in some of the threads discussing this subject.

Now you can look at a number of small Forth's like jonesforth yourforth eforth and conclude that mostly one arrives at around 50 to 100 primitives. Those Forth's are defined in assembler. If you want to define your primitives in c, python or Java , the situation is again different. Now for e.g. the above dictionary search you have a choice between c and Forth. Considerations that have nothing to do with language design come into play. You may be a prolific c-programmer, or you may insist on coding it in Forth because it is a learning project.

1
  1. One of my favorites is the three-instruction forth of the MSDOS Pygmy Forth by Frank Sergeant. He uses a tethered Forth, I believe, a more fully functioning forth on a PC, a serial link to the target, and peek, poke, execute (basic language terminology) on the target, i.e., read, write, and run.

  2. If you want the most up-to-date, technologically advanced answer, take a look at the 5-bit (32) instruction set forth (page 5 of the PDF, figure 3) in the 144-core forth cpu developed by Charles Moore the father of Forth. Basically, Mr. Moore gave us Forth, allowed us to make a Fork of what he then had, but he continued optimizing it for the rest of his life until now, eventually crystallizing it down to the cpu level (also making a VLSI chip CAD design tool to design his own chips, also designed bottom-to-top in his own ColorForth. That isn't a low-level language, or a high-level language -- that's an omni-level-language!)

  3. The Factor Programming language I consider to be very similar, and has at its core a virtual machine coded in the C (or c++) language

  4. Finally, there is a public domain Forth called pForth, which has its kernel written in C.