Questions tagged [lexical-analysis]

Process of converting a sequence of characters into a sequence of tokens.

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
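
As a rough illustration of the character-to-token conversion described above, here is a minimal regex-based lexer sketch in Python. The token classes (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the `tokenize` function are hypothetical names chosen for this example and do not come from any particular tool; a real lexer would typically also track line and column positions.

```python
import re

# Hypothetical token classes for a tiny arithmetic language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("IDENT",  r"[A-Za-z_]\w*"),    # identifier
    ("OP",     r"[+\-*/=()]"),      # single-character operators and parentheses
    ("SKIP",   r"\s+"),             # whitespace, discarded below
]

# Combine all patterns into one alternation of named groups.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on an unrecognized character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 3.5 * (y + 42)"))
```

Each token kind here is described by a regular expression, which reflects the point that lexical syntax is usually a regular language; a parser would then consume the resulting `(kind, lexeme)` pairs as its atoms.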

843 questions
149
votes
5 answers

Can we write comments within variable names?

int main() { i/*nt*/a = 10; return 0; } If I have the above code and I want to count the tokens, will it be 14 or 13 tokens? Is it valid to write a comment within a variable name? You can assume that the int i, int a, int ia are globally…
Vinita
  • 1,834
  • 2
  • 8
  • 20
37
votes
2 answers

Python regular expressions - how to capture multiple groups from a wildcard expression?

I have a Python regular expression that contains a group which can occur zero or many times - but when I retrieve the list of groups afterwards, only the last one is present. Example: re.search("(\w)*", "abcdefg").groups() this returns the list…
John B
  • 3,391
  • 5
  • 33
  • 29
33
votes
12 answers

Does an algorithm exist to help detect the "primary topic" of an English sentence?

I'm trying to find out if there is a known algorithm that can detect the "key concept" of a sentence. The use case is as follows: User enters a sentence as a query (Does chicken taste like turkey?) Our system identifies the concepts of the sentence…
rockit
  • 339
  • 1
  • 3
  • 4
31
votes
2 answers

Practical difference between parser rules and lexer rules in ANTLR?

I understand the theory behind separating parser rules and lexer rules, but what are the practical differences between these two statements in ANTLR: my_rule: ... ; MY_RULE: ... ; Do they result in different AST trees? Different…
Tony the Pony
  • 40,327
  • 71
  • 187
  • 281
24
votes
2 answers

yylval and union

What is the purpose of union in the yacc file? Is it directly related to yylval in the flex file? If you don't use yylval, then you don't need to use union?
neuromancer
  • 53,769
  • 78
  • 166
  • 223
21
votes
1 answer

Ignore whitespace with PEG.js

I want to ignore whitespaces and new lines with my grammar so they are missing in the PEG.js output. Also, a literal within brackets should be returned in a new array. Grammar start = 'a'? sep+ ('cat'/'dog') sep* '(' sep* stmt_list sep*…
Matthias
  • 7,432
  • 6
  • 55
  • 88
21
votes
7 answers

Algorithms or libraries for textual analysis, specifically: dominant words, phrases across text, and collection of text

I'm working on a project where I need to analyze a page of text and collections of pages of text to determine dominant words. I'd like to know if there is a library (preferably C# or Java) that will handle the heavy lifting for me. If not, is there…
Michael Julson
19
votes
1 answer

Managing position information with Alex and Happy

I'm learning to use Alex and Happy to write a small compiler. I want to maintain line and column information for my AST nodes so that I can provide meaningful error messages to the user. To illustrate how I plan to do it, I wrote a small example…
gnuvince
  • 2,357
  • 20
  • 27
18
votes
2 answers

What's the difference between a parser and a scanner?

I already made a scanner, now I'm supposed to make a parser. What's the difference?
neuromancer
  • 53,769
  • 78
  • 166
  • 223
17
votes
3 answers

Find out the position where a regular expression failed

I'm trying to write a lexer in JavaScript for finding tokens of a simple domain-specific language. I started with a simple implementation which just tries to match subsequent regexps from the current position in a line to find out whether it matches…
SasQ
  • 14,009
  • 7
  • 43
  • 43
17
votes
6 answers

Efficiently match multiple regexes in Python

Lexical analyzers are quite easy to write when you have regexes. Today I wanted to write a simple general analyzer in Python, and came up with: import re import sys class Token(object): """ A simple Token structure. Contains the token…
Eli Bendersky
  • 263,248
  • 89
  • 350
  • 412
16
votes
10 answers

Have you ever effectively used lexer/parser in real world application?

Recently, I've started learning ANTLR. I know that lexers/parsers together can be used to construct programming languages. Other than DSLs or programming languages, have you ever directly or indirectly used lexer/parser tools (and knowledge) to…
16
votes
1 answer

How to implement Lexical Analysis in Javascript

Hey folks, thanks for reading. I am currently attempting to build a Google-style calculator. You input a string, it determines if it can be calculated and returns the result. I began slowly with the basics: + - / * and parenthesis handling. I am…
Gabriel S.
  • 1,961
  • 2
  • 20
  • 30
16
votes
3 answers

What profilers and analyzers are there for Erlang/OTP?

Are there any good code profilers/analyzers for Erlang? I need something that can build a call graph (e.g., like gprof) for my code.
Sushant
  • 1,013
  • 1
  • 11
  • 20
16
votes
4 answers

Tips for creating "Context Free Grammar"

I am new to CFGs. Can someone give me tips on creating a CFG that generates a given language? For example, L = {a^m b^n | m >= n}. What I have is: S0 -> a | aS0 | aS1 | e, S1 -> b | bS1 | e, but I think this area is wrong, because there is a chance…
user1988365