Questions tagged [lexer]

A program converting a sequence of characters into a sequence of tokens

A lexer is a program that converts a sequence of characters into a sequence of tokens. It is also often called a scanner. A lexer often exists as a single function that is called by a parser or another function.
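
As a rough illustration of the description above, here is a minimal sketch of a lexer written as a single function that a parser could call. The token kinds (`NUMBER`, `IDENT`, `OP`, `SKIP`), the `Token` type, and the `tokenize` name are assumptions chosen for this example; they do not come from any particular tool or question on this page.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER"
    text: str  # the exact characters that were matched


# Hypothetical token set for a tiny arithmetic language (illustration only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# One combined pattern; each alternative is a named group, so the name of
# the group that matched tells us the token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens one at a time, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
```

Because `tokenize` is a generator, a parser can pull tokens on demand with `next(...)`, or collect them all up front with `list(tokenize("x = 3.5 * (y + 2)"))`. Either style fits the "single function called by a parser" arrangement described above.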

1050 questions
360
votes
6 answers

lexers vs parsers

Are lexers and parsers really that different in theory? It seems fashionable to hate regular expressions: coding horror, another blog post. However, popular lexing-based tools such as pygments, geshi, or prettify all use regular expressions. They seem…
Naveen
  • 5,910
  • 5
  • 30
  • 38
191
votes
4 answers

Looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other and used?

I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa)? I need to create a program that will go through c/h source files to extract…
lordhog
  • 3,427
  • 5
  • 32
  • 43
94
votes
2 answers

Where can I learn the basics of writing a lexer?

I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it) but this was given to us with no instruction or feedback (beyond the mark) so I didn't really learn much…
Rupert Madden-Abbott
  • 12,899
  • 14
  • 59
  • 74
48
votes
1 answer

Should I use a lexer when using a parser combinator library like Parsec?

When writing a parser in a parser combinator library like Haskell's Parsec, you usually have 2 choices: (1) write a lexer to split your String input into tokens, then perform parsing on [Token]; (2) directly write parser combinators on String. The first…
nh2
  • 24,526
  • 11
  • 79
  • 128
38
votes
5 answers

Communication between lexer and parser

Every time I write a simple lexer and parser, I stumble upon the same question: how should the lexer and the parser communicate? I see four different approaches: The lexer eagerly converts the entire input string into a vector of tokens. Once this…
fredoverflow
  • 256,549
  • 94
  • 388
  • 662
37
votes
3 answers

Is it a Lexer's Job to Parse Numbers and Strings?

Is it a lexer's job to parse numbers and strings? This may or may not sound dumb, given the fact that I'm asking whether a lexer should parse input. However, I'm not sure whether that's in fact the lexer's job or the parser's job, because in order…
user541686
  • 205,094
  • 128
  • 528
  • 886
36
votes
10 answers

Lexer written in JavaScript?

I have a project where a user needs to define a set of instructions for a UI that is completely written in JavaScript. I need to have the ability to parse a string of instructions and then translate them into instructions. Are there any libraries out…
Phobis
  • 7,524
  • 10
  • 47
  • 76
35
votes
7 answers

Poor man's "lexer" for C#

I'm trying to write a very simple parser in C#. I need a lexer -- something that lets me associate regular expressions with tokens, so it reads in regexes and gives me back symbols. It seems like I ought to be able to use Regex to do the actual heavy…
Paul Hollingsworth
  • 13,124
  • 12
  • 51
  • 68
35
votes
1 answer

What does an escaped ampersand mean in Haskell?

I looked at the Haskell 2010 report and noticed a weird escape sequence with an ampersand: \&. I couldn't find an explanation of what this escape sequence stands for. It also might only be located in strings. I tried print "\&" in GHCi, and it…
Nolan
  • 1,060
  • 1
  • 11
  • 34
35
votes
5 answers

When parsing JavaScript, what determines the meaning of a slash?

JavaScript has a tricky grammar to parse. Forward slashes can mean a number of different things: division operator, regular expression literal, comment introducer, or line-comment introducer. The last two are easy to distinguish: if the slash is…
Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
28
votes
5 answers

Where can I find a formal grammar for MATLAB?

I would like to write a lexer generator to convert a basic subset of the MATLAB language to C#, C++, etc. To help me do this, I would like to find a document containing the formal grammar for MATLAB. Having spent a bit of time investigating this, it…
Dave Maff
  • 798
  • 8
  • 12
27
votes
5 answers

hand coding a parser

For all you compiler gurus, I wanna write a recursive descent parser and I wanna do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the dragon book, I'll come around to that eventually. I wanna…
John Leidegren
  • 59,920
  • 20
  • 131
  • 152
27
votes
2 answers

ANTLR4 visitor pattern on simple arithmetic example

I am a complete ANTLR4 newbie, so please forgive my ignorance. I ran into this presentation where a very simple arithmetic expression grammar is defined. It looks like: grammar Expressions; start : expr ; expr : left=expr op=('*'|'/') right=expr…
Giovanni Botta
  • 9,626
  • 5
  • 51
  • 94
24
votes
2 answers

What do \v and \r mean? Are they white spaces?

I'm taking a course on lexical analysis, and \t\v\r is used in the lexer token definitions to represent white spaces. What are \v and \r exactly??
user1297061
  • 1,531
  • 2
  • 13
  • 15
24
votes
2 answers

How can we get the Syntax Tree of TypeScript?

Is there a process for getting the syntax tree out of a compiler? We have been assigned a project that needs to access TypeScript's syntax tree (which is open source, so we could see the whole compiler's code). But we don't know how to get it. I've been…
Daj
  • 241
  • 2
  • 3