13

Is it possible to get clang or gcc to display the result of the lexing phase?

Benno
  • Modern compilers don't really have a separate lexing phase; instead, lexing is coupled with parsing. – Some programmer dude Dec 31 '14 at 01:13
  • Joachim: While I have to admit that I didn't read the source in detail, clang certainly has separate `Lex` and `Parse` directories with lexer- and parser-specific classes, and gcc has `c-lex.c` which at a glance also seems to implement a lexing pass. – Benno Dec 31 '14 at 01:21
  • While there are separate source files for lexing, the actual lexing is done "on the fly": when the parser needs a token, it asks the lexer, which extracts the next token and returns it to the parser. A separate lexing phase was really only needed when computers didn't have much memory, so the lexer ran as a separate program to save memory for the parser. It's now a thing of computing history. – Some programmer dude Dec 31 '14 at 01:23
  • What do you mean by displaying the result? – didierc Dec 31 '14 at 01:52
  • possible duplicate of [How can I see parse tree, intermediate code, optimization code and assembly code during COMPILATION?](http://stackoverflow.com/questions/1496497/how-can-i-see-parse-tree-intermediate-code-optimization-code-and-assembly-code) – Jonathon Reinhart Dec 31 '14 at 08:07
  • It should be pretty trivial to write a "parser simulator" that simply calls the lexer repeatedly, thus producing a sequence of lexemes. – Ira Baxter Dec 31 '14 at 13:49
  • @JoachimPileborg: I've been in computing for 40 years and have never once found a compiler that ran a separate lexing pass, and that includes building a variety of compilers on late-60s minicomputers and early-1970s microprocessors. Lexers just aren't that big compared to the rest of a compiler, especially when restricted to legacy 7-bit character sets. Do you have specific examples? – Ira Baxter Dec 31 '14 at 13:51
  • @IraBaxter I haven't been in the industry as long as you, "only" 20 years so far, but I "heard" and "read" that having a separate lexing phase was not uncommon in the punch-card and magnetic-tape era. YMMV etc. :) – Some programmer dude Dec 31 '14 at 14:19
  • I heard the IBM 1401 computer (vintage mid-60s; yes, I actually coded a bit for one of these) had a 17-pass COBOL compiler because it did not have a lot of memory. I did not specifically hear that it had a separate lexer, but then I didn't ask. So, perhaps. But yes, the bit about separate lexers, if real, is pretty old. – Ira Baxter Dec 31 '14 at 17:10
  • I'm of a similar vintage to @IraBaxter and I would agree with him, but here are two odd data points: (1) there was a persistent rumour of an IBM compiler that had 37 or so passes 'because there were 37 guys on the project', and (2) John J Donovan, *Systems Programming,* 1972, talks about the lexical analyser outputting something called a 'uniform symbol table'. I could never make head or tail of this description, and I have certainly never written or encountered a compiler that identifiably used one. Unclear whether Donovan had ever really written a compiler. – user207421 Oct 24 '16 at 02:43

1 Answer

16

Although the parser polls the lexer on demand rather than running a proper "lexing phase", you can still dump the tokens as they are lexed. With clang, this is done with the command:

    clang -fsyntax-only -Xclang -dump-tokens code.c

OlivierLi