How to create a language these days?

Question

I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?

Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?

For the record, us 'kids' still use the command line and quibble over which shell is superior. Or I do at least. C is dead though. I must now flee the horde of C programmers, so I'll see you around! — Matthew Scharley, Oct 11 '09 at 06:43
interpreted or compiled? hmmm good question. I'll assume it makes a difference, so I'll say both just to be on the safe side. — Mike, Oct 11 '09 at 06:47
@Kinopiko All successful interpreted languages will eventually be compiled for speed. Might as well assume it'll be compiled, just so you don't make that unnecessarily difficult. @Matthew C dead? Since when? — Kevin Montrose, Oct 11 '09 at 06:50
@Matthew : Do not anger C programmers. Or they'll unleash their horde of SEGFAULTs and SIGFPEs on you. — aviraldg, Oct 11 '09 at 06:54
@Kevin so Perl, Python, Ruby, and PHP are all failures? OK then! — , Oct 11 '09 at 06:58
@kinopiko: most interpreted languages are compiled to native code on the fly just before execution rather than literally interpreted line by line. @Kevin: Just some humour. C isn't (quite) dead. Just like COBOL isn't. — Matthew Scharley, Oct 11 '09 at 07:10
@Matthew which of Perl, Python, Ruby or PHP is compiled to native code on the fly just before execution? And does the fly know that he is going to be executed? — , Oct 11 '09 at 07:17
Perl at the very least is. PHP can be with various extensions that go by the name of 'optimiser's. — Matthew Scharley, Oct 11 '09 at 07:17
@Matthew - Can you provide a source for Perl being JIT compiled? I've never heard anything about this in all my Perl experience. It'd be rather difficult to do, since Perl has such weak typing. Also, C won't be dead until there's a better, faster language to write compilers/interpreters in. — Chris Lutz, Oct 11 '09 at 07:40
C is used all over the place in embedded systems, not to mention kernels, drivers, etc... — GManNickG, Oct 11 '09 at 08:00
@Chris: Google for "perl jit parrot" and you'll get a tonne of them. — Matthew Scharley, Oct 11 '09 at 08:10
Iron Python (compiled), Zend [PHP] (compiled), Iron Ruby [upcoming] (compiled). I don't know about Perl; perhaps Perl 6 on Parrot counts? Anyway, my point was not that all interpreted languages are turned into purely compiled ones. Rather, that all interpreted languages will be compiled at some point in the future; either in alternative implementations, or in an option on the main branch. — Kevin Montrose, Oct 11 '09 at 08:11
Or just leave off parrot, since it's a bit misleading (and only applicable to Perl 6). But the short version is that there's plenty of evidence there. — Matthew Scharley, Oct 11 '09 at 08:15
IronPython is slightly different from regular Python, IronRuby is still somewhat experimental, and Perl 6 is still a ways off (for some reason). — Chris Lutz, Oct 11 '09 at 08:49
There are several extant compiler-compiler questions. But they are a pain to search for... — dmckee --- ex-moderator kitten, Oct 11 '09 at 22:45
Actually there is a [compiler-compiler] tag http://stackoverflow.com/questions/tagged/compiler-compiler , it just doesn't have many entries... — dmckee --- ex-moderator kitten, Oct 11 '09 at 22:48
possible duplicate of [Creating your own language](http://stackoverflow.com/questions/365602/creating-your-own-language) — nawfal, May 18 '13 at 20:46

score 7 · Answer 1 · answered Oct 11 '09 at 08:38

One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).

On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facility. (An example is that every language and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables an incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible even if in an ugly way.

If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.

Not true! Since the libraries for the JVM and .NET platforms don't have to worry about idiosyncrasies of the platforms they're on, they are free to explore API design aspects that would otherwise be left untouched. — RCIX, Oct 16 '09 at 02:01

David Crawshaw · Answer 2 · 2009-10-11T09:20:02.337

I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
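To make "an interpreter over a data structure" concrete, here is a minimal sketch. It is in Python rather than Haskell, and the node types (Num, Var, BinOp, Let) and the evaluate function are hypothetical names chosen for illustration, not anything from the answer or from parsec. The idea is the same in either language: the parser produces a tree of plain data, and the interpreter is one recursive function over it.

```python
from dataclasses import dataclass

# Hypothetical AST node types for a tiny expression language.
@dataclass
class Num:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Let:
    name: str      # variable being introduced
    bound: object  # expression whose value it receives
    body: object   # expression evaluated with the variable in scope

BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

def evaluate(expr, env=None):
    """Walk the tree recursively; env maps variable names to values."""
    env = {} if env is None else env
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, BinOp):
        left = evaluate(expr.left, env)
        right = evaluate(expr.right, env)
        return BINARY_OPS[expr.op](left, right)
    if isinstance(expr, Let):
        inner = dict(env)  # copy so the binding does not leak outwards
        inner[expr.name] = evaluate(expr.bound, env)
        return evaluate(expr.body, inner)
    raise TypeError(f"unknown node: {expr!r}")

# let x = 3 in x * (x + 1)
program = Let("x", Num(3), BinOp("*", Var("x"), BinOp("+", Var("x"), Num(1))))
print(evaluate(program))
```

Adding a language feature then means adding one node type and one branch in evaluate, which is a large part of why this style is quick to experiment with.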

If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.

You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there is a set of Haskell-LLVM bindings.

peterchen · Answer 3 · 2010-09-16T07:29:58.567

What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:

Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.

Also, libraries have become much more powerful, and no one wants to implement all that in yet another language. Try to borrow: make it easy to call existing code, and make it easy to be called by other code.

Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out to be a problem, it can still be replaced by a native compiler.

"the old 'compile .x -> .y' executable is pretty much dead" - hahahaha! Tell me another one. Go on that was great! — alex tingle, Oct 11 '09 at 15:49
alex: as in "all you need for a new language is...", I guess I should clarify that. — peterchen, Oct 11 '09 at 17:13

score 2 · Answer 4 · answered Oct 11 '09 at 06:39

I'm pretty sure you do what's always been done.

Write some code, and show your results to the world.

As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?

Kevin Montrose


score 2 · Answer 5 · 2009-10-11T08:45:48.893

You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.

Also, what is the proposed name of the language?

I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.

edited Oct 11 '09 at 08:45

answered Oct 11 '09 at 07:10

BASIC? I heard someone was doing research based on whether chimps hammering away at a keyboard were tidier than production level BASIC code. Guess... – aviraldg Oct 11 '09 at 07:45
Cool idea. I had picked out "Complicity" several years ago but I like the idea of an ALLCAPS language! MIKTRAN, MOBOL, MIKEBASIC, MALEVOLENT, MALT, MARKV, MINGLE, MING, UNILANG... – Mike Oct 11 '09 at 07:51

RCIX · Answer 6 · 2009-10-11T07:53:00.827

Speaking as someone who just built a very simple assembly-like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# + the backing of the entire .NET community when attempting to write most things. From here I designed a simple bytecode format and assembly syntax and proceeded to write my interpreter + assembler.

Like I said, it was a very simple language.
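For a sense of scale, an assembler plus bytecode interpreter of this kind can be sketched in a few dozen lines. The sketch below is in Python rather than C#, and the instruction set (PUSH, ADD, MUL, PRINT, HALT), the opcode numbers, and the function names are all made up for illustration; they are not the format the answer describes.

```python
# Hypothetical instruction set: mnemonic -> opcode number.
OPCODES = {"PUSH": 0, "ADD": 1, "MUL": 2, "PRINT": 3, "HALT": 4}
HAS_OPERAND = {"PUSH"}  # instructions followed by one integer operand

def assemble(source):
    """Turn assembly text into a flat list of integers (the bytecode)."""
    code = []
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ';' comments
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        code.append(OPCODES[mnemonic])
        if mnemonic in HAS_OPERAND:
            code.append(int(parts[1]))
    return code

def run(code):
    """Execute bytecode on a stack machine; returns everything PRINTed."""
    stack, output, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OPCODES["PUSH"]:
            stack.append(code[pc])
            pc += 1
        elif op == OPCODES["ADD"]:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == OPCODES["MUL"]:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == OPCODES["PRINT"]:
            output.append(stack.pop())
        elif op == OPCODES["HALT"]:
            return output
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")

program = """
    PUSH 2
    PUSH 3
    ADD        ; 2 + 3
    PUSH 4
    MUL        ; (2 + 3) * 4
    PRINT
    HALT
"""
print(run(assemble(program)))
```

Jumps, labels and a call stack are the usual next additions; they mostly change the assembler (label resolution) rather than the shape of the interpreter loop.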

edited Oct 11 '09 at 07:53

answered Oct 11 '09 at 07:47


powerful syntax? c#? you're kidding me. But the .net framework and the community are nice though. – Thomas Danecker Jan 07 '10 at 00:47

score 2 · Answer 7 · answered Oct 11 '09 at 09:14

Not so much an implementation but a design decision which affects implementation - if you make every statement of your language have a unique parse tree without context, you'll get something for which it's easy to hand-code a parser, and that doesn't require large amounts of work to provide syntax highlighting for. Similarly, simple things like using a different symbol for module namespaces and object namespaces (unlike Java, which uses . for both package and class namespaces) mean you can parse the code without loading every module that it refers to.

Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.

Building as much of the language in the language as possible is an option, but projects which start out doing so either give up (Rubinius moved to using C++ for parts of its standard library) or are only for research purposes (Mozilla Narcissus).

blwy10 · Answer 8 · 2009-10-11T09:38:50.147

I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).

It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
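As a rough illustration of the "EBNF on paper, then recursive descent" route, here is a sketch in Python (not C++) for a small arithmetic grammar. The grammar and all names (tokenize_arith, Parser, parse) are assumptions made for the example. Each grammar rule becomes one method, and the method body mirrors the rule's right-hand side.

```python
import re

# Grammar (EBNF), written down first:
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize_arith(text):
    """Split the input into number and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(tokenize_arith(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return tree

print(parse("1 + 2 * (3 - 4)"))
```

The result is a nested tuple tree such as ("+", 1, ("*", 2, ...)). Because the error sites are ordinary code, this style makes it comparatively easy to produce specific error messages.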

Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl 6. If it's just an interpreter, I'll walk the syntax tree.
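The "generate C and let GCC compile it" option can be sketched very compactly. The example below is Python and assumes a hypothetical tree shape (integers, or tuples of operator, left, right); emit_expr and emit_program are names invented for illustration. A real backend would also need statements, variables and types, but the principle is just printing text.

```python
def emit_expr(node):
    """Translate an expression tree into a fully parenthesised C expression."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({emit_expr(left)} {op} {emit_expr(right)})"

def emit_program(node):
    """Wrap the expression in a C main() that prints its value."""
    return (
        "#include <stdio.h>\n"
        "\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {emit_expr(node)});\n'
        "    return 0;\n"
        "}\n"
    )

tree = ("+", 1, ("*", 2, 3))
print(emit_program(tree))
```

Saving the output as, say, out.c and running `gcc out.c -o out` would then hand optimisation and native code generation over to the C compiler.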

A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).

All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ helps to make them portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.

score 1 · Answer 9 · answered Oct 16 '09 at 21:30

On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.

Pete Kirkham


score 1 · Answer 10 · answered Oct 31 '09 at 14:12

Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)

Just a hint: Look at some quite different languages first, before designing a new language (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind. Though you should also know Prolog and Scheme. A year ago I also was like "hey, let's design a language that behaves exactly as I want", but fortunately I looked at those other languages first (or you could also say unfortunately, because now I don't know how I want a language to behave anymore...).

score 1 · Answer 11 · answered Mar 09 '10 at 03:28

Before you start creating a language you should read this:

Hanspeter Moessenboeck, The Art of Niklaus Wirth

ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf

Jim Barker


Niklaus Wirth was terrible at compiler design. He violated Einstein's Law: Make everything as easy as possible but not easier. His languages were all way too easy to be productive. By the way, I like Modula3, which was not designed by him. – Lothar Sep 15 '10 at 23:32

score 1 · Answer 12 · answered Aug 10 '11 at 21:59

There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (i.e. Forward Polish or Reverse Polish) you don't need a parser at all! With reverse polish, the dependencies go right-to-left so you simply execute each token as it's scanned. With forward polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.

To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i.e. a parser) has to perform. Only the in-order notation "requires" a recursive descent to re-construct the expression tree. With the other two, you can get away with just a stack.
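A small sketch of the "just a stack" claim, in Python, with hypothetical function names and whitespace-separated tokens assumed for simplicity. Reverse Polish is evaluated by scanning left to right; forward Polish is evaluated by scanning the same kind of token list from the end.

```python
RPN_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

def eval_rpn(source):
    """Reverse Polish: execute each token as it is scanned, left to right."""
    stack = []
    for token in source.split():
        if token in RPN_OPS:
            right = stack.pop()  # operands were pushed left first
            left = stack.pop()
            stack.append(RPN_OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

def eval_prefix(source):
    """Forward Polish: scan the tokens backwards, using the same stack."""
    stack = []
    for token in reversed(source.split()):
        if token in RPN_OPS:
            left = stack.pop()   # scanning backwards, the left operand is on top
            right = stack.pop()
            stack.append(RPN_OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

print(eval_rpn("3 4 + 2 *"))     # (3 + 4) * 2
print(eval_prefix("* + 3 4 2"))  # the same expression in prefix form
```

This only works without lookahead because every operator here has a fixed number of operands; variable-arity operators would need some delimiter or arity marker.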

This may require more "thinking" and less "implementing".

BTW, if you've already found an answer (this question is a year old), you can post that and accept it.

score 0 · Answer 13 · answered Oct 11 '09 at 06:39

Real coders still code in C. Just that it's a little sharper.
Hmmm... language design? or writing a compiler? If you want to write a compiler, you'd use Flex + Bison. (google)

aviraldg


If you want to write a good compiler, you'll hand-roll your own recursive-descent parser, because with a moderately complex Bison parser you'll soon run into issues (if not getting the language to work, then getting the compiler/interpreter to report errors). – Chris Lutz Oct 11 '09 at 07:44
@chris Yeah, maybe, but only for LISP (ASM, Scheme...) Manually writing a proper full blown compiler is the last thing you want to do ... just because of the complexity involved. – aviraldg Oct 11 '09 at 07:49
Not really. It's not terribly complicated, especially because there are tons of books and tutorials on the subject. And all major programming languages are written with hand-rolled parsers/lexers. – Chris Lutz Oct 11 '09 at 07:56
I believe that is because they are not regular grammars (I remember reading something about C++ and C# not being regular grammars??) – aviraldg Oct 11 '09 at 08:05
You mean "context-free" grammars, and this is partially true - Flex/Bison can only handle context-free grammars, and attempts can be made to force them to handle grammars with context, but it's very limiting. But the main reason GCC switched its C compiler from Flex/Bison to hand-rolled was for better error reporting. (Regular grammars are parsed by regular expressions, and are rather too limited for full language design, if perhaps useful for many tasks.) – Chris Lutz Oct 11 '09 at 08:41

score 0 · Answer 14 · answered Oct 11 '09 at 07:11

Not an easy answer, but..

You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.

http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/

People can spend years on this. The above article talks about using two tools (Flex and Bison) that can be used to turn text into code you can feed to a compiler.
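To show what "a set of rules" for tokens might look like in practice, here is a hedged sketch of the lexing half in Python, using the standard re module instead of Flex. The rule names (NUMBER, IDENT, OP and so on) and the lex function are assumptions for the example. The structure is similar in spirit to a Flex specification: an ordered list of patterns, each tagged with a token kind.

```python
import re

# Ordered token rules: earlier rules win when several could match.
TOKEN_RULES = [
    ("NUMBER",   r"\d+"),
    ("IDENT",    r"[A-Za-z_]\w*"),
    ("OP",       r"[-+*/=()]"),
    ("SKIP",     r"\s+"),   # whitespace is matched but discarded
    ("MISMATCH", r"."),     # anything else is an error
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)

def lex(text):
    """Return a list of (kind, text) pairs for the input string."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected {match.group()!r} at offset {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens

print(lex("x = 40 + 2"))
```

The resulting token list is what a parser (generated by Bison or written by hand) would then check against the grammar rules.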

score 0 · Answer 15 · answered Oct 11 '09 at 09:06

First I spent a year or so actually thinking about what the language should look like. At the same time I helped in developing Ioke (www.ioke.org) to learn language internals.

I have chosen Objective-C as the implementation platform as it's a fast (enough), simple and rich language. It also provides a test framework, so an agile approach is a go. It also has a rich standard library I can build upon.

Since my language is simple on the syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building the scanner. It's fast as hell and simple to use.

Now I have a working object model, scanner and simple operator shuffling plus standard library bootstrap code. I can even run simple programs - as long as they fit in one file that is :)

score 0 · Answer 16 · answered Oct 24 '09 at 15:23

While older techniques are still common (e.g. using Flex and Bison), many newer language implementations combine the lexing and parsing phases by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or memoizing Packrat parsers. Many compilers are also built using the ANTLR framework.
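A minimal sketch of the PEG idea in Python, under stated assumptions: the combinator names (token, seq, choice, many, apply, lazy) are made up for this example and do not come from any particular library, and memoization is done with functools.lru_cache keyed on (text, position), which is the essence of a Packrat parser. Note that there is no separate lexer: the rules match characters directly.

```python
import operator
import re
from functools import lru_cache

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def token(pattern):
    compiled = re.compile(r"\s*(" + pattern + ")")  # skip leading whitespace
    def parser(text, pos):
        match = compiled.match(text, pos)
        return (match.group(1), match.end()) if match else None
    return parser

def seq(*parsers):
    def parser(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parser

def choice(*parsers):
    def parser(text, pos):  # PEG ordered choice: the first success wins
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parser

def many(p):
    def parser(text, pos):  # zero or more; p must consume input
        values = []
        while True:
            result = p(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parser

def apply(p, fn):
    def parser(text, pos):
        result = p(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return parser

def lazy(get_parser):  # allows rules to refer to rules defined later
    return lambda text, pos: get_parser()(text, pos)

memo = lru_cache(maxsize=None)  # Packrat-style cache per (text, pos)
BINARY = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}

def fold_left(values):
    result, rest = values
    for op, operand in rest:
        result = BINARY[op](result, operand)
    return result

# expr    <- product (("+" / "-") product)*
# product <- atom (("*" / "/") atom)*
# atom    <- number / "(" expr ")"
number = apply(token(r"\d+"), int)
atom = memo(choice(number, apply(
    seq(token(r"\("), lazy(lambda: expr), token(r"\)")), lambda v: v[1])))
product = memo(apply(seq(atom, many(seq(token(r"[*/]"), atom))), fold_left))
expr = memo(apply(seq(product, many(seq(token(r"[-+]"), product))), fold_left))

def parse(text):
    result = expr(text, 0)
    if result is None or text[result[1]:].strip():
        raise SyntaxError("could not parse input")
    return result[0]

print(parse("2 * (3 + 4) - 5"))
```

For a grammar this small the cache makes little difference; it matters when ordered choice causes the same rule to be retried at the same position many times.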

score 0 · Answer 17 · answered Nov 07 '10 at 10:21

Use Bison/Flex, which are the GNU versions of yacc/lex. This book is extremely helpful.

The reason to use Bison is that it catches any conflicts in the language. I used it and it made my life many years easier (OK, so I'm on my 2nd year, but the first 6 months was a few years ago writing it in C++ and the parsing/conflicts/results were terrible! :(.)

score 0 · Answer 18 · answered Nov 12 '10 at 09:58

If you want to write a compiler obviously you need to read the Dragon Book ;)

Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:

http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0

score -1 · Answer 19 · answered Oct 15 '09 at 02:03

Mike --

If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa@pobox.com
