I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it) but this was given to us with no instruction or feedback (beyond the mark) so I didn't really learn much from it.

After searching for this topic, I can only find fairly advanced write-ups that focus on areas a few steps ahead of where I am. I want a discussion of the basics of writing a lexer for a very simple language, which I can use as a basis for investigating how to tokenise more complex languages.

At this stage I'm not really interested in best practices or optimisation techniques; I would prefer a focus on the essentials. What are some good resources to get me started?

vitaut
Rupert Madden-Abbott

2 Answers

Basically there are two main approaches to writing a lexer:

  1. Writing one by hand, in which case I recommend this small tutorial.
  2. Using a lexer generator such as lex, in which case I recommend reading the tutorials for your tool of choice.
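
To give a feel for the hand-written approach, here is a minimal sketch in Python. It is not taken from any of the tutorials mentioned here; the `Token` type, the `tokenize` function, the token kinds, and the tiny language it accepts (integers, identifiers, and a few single-character operators) are all assumptions made up for illustration. The idea is simply to look at the current character, decide which kind of token starts there, and consume characters until that token ends.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical token kinds: "NUMBER", "IDENT", "OP"
    text: str  # the exact characters that were matched


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source by scanning it one character at a time."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Whitespace only separates tokens, so skip it.
            i += 1
        elif ch.isdigit():
            # A number is a maximal run of digits.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            yield Token("NUMBER", source[start:i])
        elif ch.isalpha() or ch == "_":
            # An identifier starts with a letter or underscore and
            # continues with letters, digits, or underscores.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            yield Token("IDENT", source[start:i])
        elif ch in "+-*/=()":
            # Every operator in this toy language is a single character.
            i += 1
            yield Token("OP", ch)
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")


# Example usage: print each token of a small expression.
for token in tokenize("x = 3 + 42"):
    print(token.kind, token.text)
```

A lexer generator does roughly the same job, except that you describe each token as a pattern and the tool produces the scanning loop for you.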

I would also recommend the Kaleidoscope tutorial from the LLVM documentation. It walks through the implementation of a simple language and, in particular, demonstrates how to write a small lexer. There is a C++ version and an OCaml (Objective Caml) version of the tutorial.

The classic textbook on the subject is Compilers: Principles, Techniques, and Tools, also known as the Dragon Book. However, it probably falls under the category of "fairly advanced write ups".

Drazisil
vitaut
  • The Kaleidoscope tutorial was the part that really answered this question for me. – Robert Byers Jul 12 '15 at 19:50
  • For more info on writing an LL(1) parser by hand see [this answer](https://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769). – jchook Jun 17 '19 at 21:43

The Dragon Book is probably the definitive guide on the subject, although it can be a bit overwhelming. Language Implementation Patterns and Programming Language Pragmatics are great resources as well.

Brandon Moretz
  • +1 on the Dragon book. Learned A LOT from it when in university. Yeah, there's a lot there, but if you're really interested in compiler design and implementation, it's a great resource. – DarinH Jun 02 '11 at 15:32