Is it possible to get clang or gcc to display the result of the lexing phase?
-
Modern compilers don't really have a separate lexing phase; instead, lexing is coupled with the parsing. – Some programmer dude Dec 31 '14 at 01:13
-
Joachim: While I have to admit that I didn't read the source in detail, clang certainly has separate `Lex` and `Parse` directories with lexer- and parser-specific classes, and gcc has `c-lex.c` which at a glance also seems to implement a lexing pass. – Benno Dec 31 '14 at 01:21
-
While there are special source files for lexing, the actual lexing is done "on the fly". When the parser needs a token it asks the lexer for it, and then the lexer actually extracts the token and returns it to the parser. A separate lexing phase was really only needed when computers didn't have much memory, so the lexer ran as a separate program to save memory for the parser. It's now a thing of computing history. – Some programmer dude Dec 31 '14 at 01:23
-
What do you mean by displaying the result? – didierc Dec 31 '14 at 01:52
-
possible duplicate of [How can I see parse tree, intermediate code, optimization code and assembly code during COMPILATION?](http://stackoverflow.com/questions/1496497/how-can-i-see-parse-tree-intermediate-code-optimization-code-and-assembly-code) – Jonathon Reinhart Dec 31 '14 at 08:07
-
It should be pretty trivial to write a "parser simulator" that simply calls the lexer repeatedly, thus producing a sequence of lexemes. – Ira Baxter Dec 31 '14 at 13:49
-
@JoachimPileborg: I've been in computing for 40 years, and never once found a compiler that ran a separate lexing pass, including building a variety of these on late-60s minicomputers and early-1970s microprocessors. Lexers just aren't that big compared to the rest of a compiler, especially when restricted to legacy 7-bit character sets. Do you have specific examples? – Ira Baxter Dec 31 '14 at 13:51
-
@IraBaxter I haven't been in the industry as long as you, "only" 20 years so far, but I "heard" and "read" that having a separate lexing phase was not uncommon in the punch-card and magnetic tape era. YMMV etc. :) – Some programmer dude Dec 31 '14 at 14:19
-
I heard the IBM 1401 computer (vintage mid-60s; yes, I actually coded a bit for one of these) had a 17-pass COBOL compiler because it did not have a lot of memory. I did not specifically hear that it had a separate lexer, but then I didn't ask. So, perhaps. But yes, the bit about separate lexers, if real, is pretty old. – Ira Baxter Dec 31 '14 at 17:10
-
I'm of a similar vintage to @IraBaxter and I would agree with him, but here are two odd data points: (1) there was a persistent rumour of an IBM compiler that had 37 or so passes 'because there were 37 guys on the project', and (2) John J Donovan, *Systems Programming,* 1972, talks about the lexical analyser outputting something called a 'uniform symbol table'. I could never make head or tail of this description, and I have certainly never written or encountered a compiler that identifiably used one. Unclear whether Donovan had ever really written a compiler. – user207421 Oct 24 '16 at 02:43
1 Answer
Although the parser polls the lexer on demand rather than running a proper "lexing phase", this does not mean that you cannot dump the tokens as they are lexed. With clang, this is done with the command:
clang -fsyntax-only -Xclang -dump-tokens code.c
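
To illustrate the on-demand lexing described in the comments, here is a minimal sketch of a "parser simulator" in C: a driver that just keeps asking a lexer for the next token and prints each one. This is a toy, not clang's or gcc's lexer. The names `Token`, `Lexer`, `next_token` and `dump_tokens` are hypothetical, and the token rules (identifiers, decimal numbers, single-character punctuation) are an assumption made only to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token kinds for a toy language; not clang's or gcc's. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* points into the source buffer */
    size_t len;        /* length of the lexeme in bytes */
} Token;

typedef struct {
    const char *cur;   /* next unread character */
} Lexer;

/* Extract exactly one token on demand, the way a parser would request it. */
Token next_token(Lexer *lx)
{
    assert(lx != NULL && lx->cur != NULL);

    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)*lx->cur))
        lx->cur++;

    Token t = { TOK_EOF, lx->cur, 0 };
    if (*lx->cur == '\0')
        return t;

    if (isalpha((unsigned char)*lx->cur) || *lx->cur == '_') {
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*lx->cur) || *lx->cur == '_')
            lx->cur++;
    } else if (isdigit((unsigned char)*lx->cur)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*lx->cur))
            lx->cur++;
    } else {
        /* Anything else is treated as one-character punctuation. */
        t.kind = TOK_PUNCT;
        lx->cur++;
    }
    t.len = (size_t)(lx->cur - t.start);
    return t;
}

/* The "parser simulator": pull tokens until end of input, print each one,
   and return how many tokens were seen. */
int dump_tokens(const char *src, FILE *out)
{
    static const char *const names[] = { "ident", "number", "punct", "eof" };
    Lexer lx = { src };
    int count = 0;

    for (;;) {
        Token t = next_token(&lx);
        if (t.kind == TOK_EOF)
            break;
        fprintf(out, "%s '%.*s'\n", names[t.kind], (int)t.len, t.start);
        count++;
    }
    return count;
}
```

Calling `dump_tokens("int x = 42;", stdout)` from a `main` function prints five lines, one per token, and returns 5. There is never a stored list of tokens: each one exists only when the loop asks for it, which is the sense in which there is no separate lexing phase.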

OlivierLi
-
To avoid `error: linker command failed` in the output, the command should be `clang -fsyntax-only -Xclang -dump-tokens code.c` – codeman48 May 02 '20 at 08:13