I've been given a job of 'translating' one language into another. The source is too flexible (complex) for a simple line by line approach with regex. Where can I go to learn more about lexical analysis and parsers?
12 Answers
If you want to get "emotional" about the subject, pick up a copy of "The Dragon Book." It is usually the text in a compiler design course. It will definitely meet your need to "learn more about lexical analysis and parsers" and covers a bunch of other fun stuff as well!
IMH(umble)O, save yourself an arm and/or leg and buy an older edition - it will fill your information desires.

Matt, there are 3 editions, so please add the ISBN for the one you're suggesting, or improve your comments with all the books' ISBNs and say a word or two about each. – Ostati Dec 17 '18 at 16:26
really @Ostati? I clearly state to save yourself some money and buy an older edition... or don't save money, and buy the current one. – Matt Cummings Jan 11 '19 at 15:44
Matt, it took me a while to find out which version was which. It would have helped had your answer (which I upvoted, btw) included the ISBN... anyway, I got the book and started my journey. Thx. – Ostati Jan 11 '19 at 19:48
Niklaus Wirth's book "Compiler Construction" (available as a free PDF) http://www.google.com/search?q=wirth+compiler+construction

Lots of people have recommended books. For many people, books are much more useful in a structured environment with assignments, due dates, and so forth. Even if not, having the material presented in a different way can help greatly.
(a) Have you considered going to a school with a decent CS curriculum?
(b) There are lots of online lectures, such as MIT's OpenCourseWare. Their EE/CS section has many courses that touch on parsing, though I can't see any on parsing per se. Parsing is typically introduced in one of the first theory courses, since language classification and automata are at the heart of much of CS theory.
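The automata theory mentioned above is exactly what a lexer runs on. As a small illustration (my own sketch, not taken from any MIT course), here is a table-driven deterministic finite automaton in Python that accepts unsigned decimal numbers like 42 or 3.14. The state numbers and character classes are arbitrary choices for the example.

```python
# States: 0 = start, 1 = in integer part, 2 = just read '.', 3 = in fraction part.
def char_class(ch):
    """Map a single character to the input class the transition table uses."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}
ACCEPTING = {1, 3}  # we may stop after an integer part or after fraction digits


def accepts(text):
    state = 0
    for ch in text:
        key = (state, char_class(ch))
        if key not in TRANSITIONS:
            return False  # no transition defined: the DFA rejects
        state = TRANSITIONS[key]
    return state in ACCEPTING


print(accepts("3.14"))  # True
print(accepts("3."))    # False: a fraction needs at least one digit
```

Lexer generators such as lex build tables like this automatically from regular expressions; writing one by hand once makes it much clearer what they produce.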

+1 for MIT's OCW; I use it all the time for math. For some reason, going to class on MY schedule is so much better than getting up at 6:30. – Shawn Apr 21 '11 at 03:24
I've recently been working with PLY which is an implementation of lex and yacc in Python. It's quite easy to get started with it and there are some simple examples in the documentation.
Parsing can quickly become a very technical topic and you'll find that you probably won't need to know all the details of the parsing algorithm if you're using a parser builder like PLY.
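To give a feel for what a lex-style tool does under the hood, here is a hand-written Python sketch using only the standard re module. It is not PLY's API: it just combines an ordered list of hypothetical (name, regex) token rules into one pattern and walks the input, which is similar in spirit to the regex rules you write for lex or PLY.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# Ordered (name, regex) rules. Python's regex alternation is leftmost-first,
# so earlier rules win when two could match at the same position.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),   # whitespace, including newlines, is dropped
    ("ERROR", r"."),    # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_RULES))


def tokenize(source: str) -> Iterator[Token]:
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the rule that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r} at {match.start()}")
        yield Token(kind, match.group(), match.start())


for token in tokenize("x = 3 * (y + 42)"):
    print(token.kind, token.text)
```

Because the alternation is leftmost-first rather than longest-match, more specific rules (keywords, multi-character operators) need to come before general ones; a real lex tool handles that ordering for you.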

I found this site helpful:
The first time I used lex/yacc was for a relatively simple project. This tutorial was all I really needed. When I approached more complex projects later, the familiarity I had from this tutorial and a simple project allowed me to build something fancier.

After taking (quite) a few compilers classes, I've used both The Dragon Book and C&T (Cooper and Torczon's Engineering a Compiler). I think C&T does a far better job of making compiler construction digestible. Not to take anything away from The Dragon Book, but I think C&T is a far more practical book.
Also, if you like writing in Java, I recommend using JFlex and BYACC/J for your lexing and parsing needs.

Parsing Techniques: A Practical Guide, by Dick Grune and Ceriel J.H. Jacobs
This book (freely available as PDF) gives an extensive overview of different parsing techniques/algorithms. If you really want to understand the different parsing algorithms, this IMO is a better reference than the Dragon Book (as Parsing Techniques focuses entirely on parsing, while the Dragon Book covers parsing only as one - although important - part of the compiler construction process).
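As one small example of a parsing algorithm beyond plain recursive descent, here is a precedence-climbing parser in Python for integer arithmetic with + - * / and parentheses. It is my own sketch, not code from the book; the operator table and the nested-tuple tree shape are illustrative assumptions.

```python
import re

# Binding power of each binary operator: higher binds tighter. All are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


class Parser:
    def __init__(self, text):
        # Characters matching neither alternative (e.g. spaces) are skipped by findall.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def primary(self):
        tok = self.advance()
        if tok == "(":
            node = self.expression(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expression(self, min_prec):
        left = self.primary()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.advance()
            # Parsing the right operand at PRECEDENCE[op] + 1 gives left associativity.
            right = self.expression(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left


def parse(text):
    parser = Parser(text)
    tree = parser.expression(1)
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree


print(parse("2 + 3 * 4"))  # ('+', 2, ('*', 3, 4))
print(parse("1 - 2 - 3"))  # ('-', ('-', 1, 2), 3)
```

Precedence climbing handles a whole family of binary operators with one loop and a table, instead of one grammar rule per precedence level.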

I've fixed the link: the actual PDF (for the first edition) can be downloaded here: http://dickgrune.com/Books/PTAPG_1st_Edition/BookBody.pdf ; a new, more extensive edition of the book is now also available on Amazon. – Gio Dec 01 '12 at 11:31
If you prefer Java-based tools, the Java Compiler Compiler, JavaCC, is a nice parser/scanner generator. It's driven by a specification file and generates Java code that you can include in your program. I haven't used it in a couple of years, though, so I'm not sure what the current version is like. You can find out more here: https://javacc.dev.java.net/

flex and bison are the new lex and yacc though. The syntax for BNF is often derided for being a bit obtuse. Some have moved to ANTLR and Ragel for this reason.
If you're not doing much translation, you may want to pull off a one-off translation using multiline regexes in Perl or Ruby. Writing a compatible BNF grammar for an existing language is not a task to be taken lightly.
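To show both the appeal and the limit of the regex approach, here is a sketch in Python's re rather than Perl or Ruby, using a made-up IF ... THEN / END IF source syntax translated to C-style braces. A multiline pattern handles flat blocks fine, but a lazy match pairs an outer IF with the first END IF it sees, so nesting breaks it.

```python
import re

# Hypothetical source syntax:   IF <cond> THEN / ... / END IF   (each on its own line)
# Target syntax (C-like):       if (<cond>) { ... }
IF_BLOCK = re.compile(r"^IF ([^\n]+?) THEN\n(.*?)^END IF$", re.MULTILINE | re.DOTALL)


def translate(source):
    # The lazy (.*?) stops at the FIRST "END IF" line, which is fine for flat code only.
    return IF_BLOCK.sub(lambda m: f"if ({m.group(1)}) {{\n{m.group(2)}}}", source)


flat = "IF x > 1 THEN\n  y = 2\nEND IF\n"
nested = "IF a THEN\nIF b THEN\n  z = 1\nEND IF\nEND IF\n"

print(translate(flat))    # translated correctly
print(translate(nested))  # inner IF left untouched, and a stray END IF survives
```

If your source never nests the constructs you are rewriting, this kind of one-off can be enough; once it does, you need something that counts nesting, i.e. a parser.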
On the other hand, it is entirely possible to leverage any given language's .l and .y files if they are available as open source. Then, you could construct new code from an existing parse tree.
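Once you do have a parse tree (from bison, ANTLR, or a hand-written parser), generating the target code is a recursive walk. A minimal Python sketch, assuming a hypothetical tree of nested (operator, left, right) tuples and a source language where ^ means exponentiation, which a C-like target spells pow(a, b):

```python
def emit_c(node):
    """Turn a tuple-based expression tree into C-like source text."""
    if isinstance(node, int):
        return str(node)
    if isinstance(node, str):
        return node  # a variable name
    op, left, right = node
    lhs, rhs = emit_c(left), emit_c(right)
    if op == "^":
        # The source's power operator has no C operator equivalent.
        return f"pow({lhs}, {rhs})"
    # Fully parenthesize so the target's precedence rules cannot change the meaning.
    return f"({lhs} {op} {rhs})"


tree = ("+", "x", ("^", ("*", 2, "y"), 3))
print(emit_c(tree))  # (x + pow((2 * y), 3))
```

Fully parenthesizing the output is a deliberate simplification: it avoids reasoning about the target language's precedence rules, at the cost of noisier generated code.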
Lexing/parsing + type checking + code generation is a great CS exercise. I would recommend it to anyone wanting a solid foundation, so I'm all for the Dragon Book.
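For the type-checking step, a sketch in Python over a hypothetical expression tree (nested tuples with int and bool literals, +, *, < and a three-part if) might look like this; the node shapes and type names are assumptions for illustration.

```python
class TypeCheckError(Exception):
    pass


def type_of(node):
    """Return "int" or "bool" for a tuple-based expression, or raise TypeCheckError."""
    # bool must be tested before int, because in Python bool is a subclass of int.
    if isinstance(node, bool):
        return "bool"
    if isinstance(node, int):
        return "int"
    op = node[0]
    if op in ("+", "*"):
        expect(node[1], "int")
        expect(node[2], "int")
        return "int"
    if op == "<":
        expect(node[1], "int")
        expect(node[2], "int")
        return "bool"
    if op == "if":
        _, cond, then_branch, else_branch = node
        expect(cond, "bool")
        then_type = type_of(then_branch)
        expect(else_branch, then_type)  # both branches must agree
        return then_type
    raise TypeCheckError(f"unknown node {op!r}")


def expect(node, wanted):
    actual = type_of(node)
    if actual != wanted:
        raise TypeCheckError(f"expected {wanted}, got {actual} in {node!r}")


print(type_of(("if", ("<", 1, 2), 10, 20)))  # int
```

Running this pass between parsing and code generation means the generator can assume well-typed input.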

Yet another textbook to consider is Programming Language Pragmatics. I prefer it over the Dragon book, but YMMV.
If you're using Perl, yet another tool to consider is Parse::RecDescent.
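Parse::RecDescent builds recursive-descent parsers from a grammar in Perl. The same idea written by hand in Python, with one method per grammar rule, also makes a tiny translator: this hypothetical example turns infix arithmetic into postfix (RPN), emitting each operator after its operands. It is a sketch, not Parse::RecDescent's API.

```python
import re


class Translator:
    """Recursive-descent translator from infix arithmetic to postfix.

    Grammar, one method per rule:
        expr   : term (('+' | '-') term)*
        term   : factor (('*' | '/') factor)*
        factor : NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0
        self.output = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            self.term()
            self.output.append(op)  # emit the operator after both operands

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            self.factor()
            self.output.append(op)

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.eat("(")
            self.expr()
            self.eat(")")
        elif tok is not None and tok.isdigit():
            self.output.append(self.eat())
        else:
            raise SyntaxError(f"unexpected token {tok!r}")


def to_postfix(text):
    translator = Translator(text)
    translator.expr()
    if translator.peek() is not None:
        raise SyntaxError(f"trailing token {translator.peek()!r}")
    return " ".join(translator.output)


print(to_postfix("9 - 5 + 2"))    # 9 5 - 2 +
print(to_postfix("2 * (3 + 4)"))  # 2 3 4 + *
```

Precedence comes from the rule structure itself: term sits below expr, so * and / bind tighter than + and -.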
If you just need to do this translation once and don't know anything about compiler technology, I would suggest that you get as far as you can with some fairly simplistic translations and then fix it up by hand. Yes, it is a lot of work. But it is less work than learning a complex subject and coding up the right solution for one job. That said, you should still learn the subject, but don't let not knowing it be a roadblock to finishing your current project.
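A rough first pass can still be automated in a way that makes the hand fix-up easier. This Python sketch, assuming a made-up source language with PRINT and LET statements, applies the first matching line rule and marks every unmatched line with a TODO comment, so you know exactly what is left to translate by hand.

```python
import re

# Hypothetical line-level rules for a one-off translation: (source pattern, replacement).
RULES = [
    (re.compile(r"^(\s*)PRINT (.+)$"), r"\1print(\2)"),
    (re.compile(r"^(\s*)LET (\w+) = (.+)$"), r"\1\2 = \3"),
]


def rough_translate(lines):
    """Apply the first matching rule to each line; flag the rest for manual review."""
    out, todo = [], []
    for number, line in enumerate(lines, start=1):
        for pattern, replacement in RULES:
            if pattern.match(line):
                out.append(pattern.sub(replacement, line))
                break
        else:  # no rule matched this line
            out.append(f"# TODO(manual): {line}")
            todo.append(number)
    return out, todo


translated, todo = rough_translate(["LET x = 1", "PRINT x + 1", "GOSUB 100"])
print("\n".join(translated))
print("lines to fix by hand:", todo)  # lines to fix by hand: [3]
```

The list of flagged line numbers doubles as a to-do list and a rough measure of how much manual work the rules leave behind.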
