Questions tagged [lexical-analysis]

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.

The lexical syntax is usually a regular language, whose atoms are individual characters, while the phrase syntax is usually a context-free language, whose atoms are words (tokens produced by the lexer). This separation is common but not required: in scannerless parsing, the lexer is combined with the parser.
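
As a rough illustration of the character-to-token conversion described above, here is a minimal regex-based lexer sketch in Python. The token classes (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the `Token` and `tokenize` names are assumptions made up for this example; they do not come from any particular library or from the questions listed below.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # name of the token class, e.g. "NUMBER"
    text: str  # the matched characters (the lexeme)


# Hypothetical token classes for a toy expression language.
# Each class is a regular expression, matching the idea that
# lexical syntax is usually a regular language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all classes into one pattern with a named group per class.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        pos = match.end()
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
```

Calling `list(tokenize("x = 3 + 42"))` produces five tokens: an identifier, an operator, a number, an operator, and a number. A parser would then consume this token stream rather than raw characters.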

843 questions

149

votes

5 answers

Can we write comments within variable names?

int main() { i/*nt*/a = 10; return 0; } If I have the above code and I want to count the tokens, will it be 14 or 13 tokens? Is it valid to write a comment within a variable name? You can assume that the int i, int a, int ia are globally…

c lexical-analysis

asked Aug 27 '20 at 04:14

Vinita

1,834
2
8
20

votes

2 answers

Python regular expressions - how to capture multiple groups from a wildcard expression?

I have a Python regular expression that contains a group which can occur zero or many times - but when I retrieve the list of groups afterwards, only the last one is present. Example: re.search("(\w)*", "abcdefg").groups() this returns the list…

python regex lexical-analysis

asked Jan 21 '09 at 10:29

John B

3,391
5
33
29

votes

12 answers

Does an algorithm exist to help detect the "primary topic" of an English sentence?

I'm trying to find out if there is a known algorithm that can detect the "key concept" of a sentence. The use case is as follows: User enters a sentence as a query (Does chicken taste like turkey?) Our system identifies the concepts of the sentence…

algorithm nlp semantics lexical-analysis

asked Apr 04 '11 at 21:19

rockit

votes

2 answers

Practical difference between parser rules and lexer rules in ANTLR?

I understand the theory behind separating parser rules and lexer rules, but what are the practical differences between these two statements in ANTLR: my_rule: ... ; MY_RULE: ... ; Do they result in different AST trees? Different…

parsing antlr lexical-analysis

asked Nov 28 '10 at 16:30

Tony the Pony

40,327
71
187
281

votes

2 answers

yylval and union

What is the purpose of union in the yacc file? Is it directly related to yylval in the flex file? If you don't use yylval, then you don't need to use union?

parsing yacc bison lexical-analysis

asked Dec 05 '09 at 19:35

neuromancer

53,769
78
166
223

votes

1 answer

Ignore whitespace with PEG.js

I want to ignore whitespaces and new lines with my grammar so they are missing in the PEG.js output. Also, a literal within brackets should be returned in a new array. Grammar start = 'a'? sep+ ('cat'/'dog') sep* '(' sep* stmt_list sep*…

javascript parsing lexical-analysis peg

asked Nov 24 '11 at 12:37

Matthias

7,432
6
55
88

votes

7 answers

Algorithms or libraries for textual analysis, specifically: dominant words, phrases across text, and collection of text

I'm working on a project where I need to analyze a page of text and collections of pages of text to determine dominant words. I'd like to know if there is a library (prefer c# or java) that will handle the heavy lifting for me. If not, is there…

algorithm text nlp analysis lexical-analysis

asked Oct 20 '08 at 22:38

Michael Julson

votes

1 answer

Managing position information with Alex and Happy

I'm learning to use Alex and Happy to write a small compiler. I want to maintain line and column information for my AST nodes so that I can provide meaningful error messages to the user. To illustrate how I plan to do it, I wrote a small example…

parsing haskell lexical-analysis happy alex

asked Dec 15 '13 at 01:39

gnuvince

2,357
20
27

votes

2 answers

What's the difference between a parser and a scanner?

I already made a scanner, now I'm supposed to make a parser. What's the difference?

parsing yacc lexical-analysis

asked Nov 15 '09 at 22:55

neuromancer

53,769
78
166
223

votes

3 answers

Find out the position where a regular expression failed

I'm trying to write a lexer in JavaScript for finding tokens of a simple domain-specific language. I started with a simple implementation which just tries to match subsequent regexps from the current position in a line to find out whether it matches…

javascript regex lexical-analysis

asked May 23 '14 at 22:59

SasQ

14,009
7
43
43

votes

6 answers

Efficiently match multiple regexes in Python

Lexical analyzers are quite easy to write when you have regexes. Today I wanted to write a simple general analyzer in Python, and came up with: import re import sys class Token(object): """ A simple Token structure. Contains the token…

python regex lexical-analysis

asked Sep 25 '08 at 15:10

Eli Bendersky

263,248
89
350
412

votes

10 answers

Have you ever effectively used lexer/parser in real world application?

Recently, I've started learning ANTLR. I know that lexers/parsers together can be used to construct programming languages. Other than DSLs or programming languages, have you ever directly or indirectly used lexer/parser tools (and knowledge) to…

parsing compiler-construction tokenize lexical-analysis

asked Mar 14 '09 at 05:46

Mohan Narayanaswamy

2,149
6
33
40

votes

1 answer

How to implement Lexical Analysis in Javascript

Hey folks, thanks for reading. I am currently attempting to build a Google-style calculator. You input a string, it determines if it can be calculated and returns the result. I began slowly with the basics: + - / * and parenthesis handling. I am…

javascript regex pattern-matching lexical-analysis

asked Jan 18 '11 at 16:38

Gabriel S.

1,961
2
20
30

votes

3 answers

What profilers and analyzers are there for Erlang/OTP?

Are there any good code profilers/analyzers for Erlang? I need something that can build a call graph (eg gprof) for my code.

erlang profiler lexical-analysis

asked Oct 15 '08 at 14:28

Sushant

1,013
1
11
20

votes

4 answers

Tips for creating "Context Free Grammar"

I am new to CFGs. Can someone give me tips on creating a CFG that generates a given language? For example, L = {a^m b^n | m >= n}. What I got is: S0 -> a | aS0 | aS1 | e S1 -> b | bS1 | e but I think this area is wrong, because there is a chance…

grammar context-free-grammar lexical-analysis formal-languages

asked Feb 28 '13 at 03:20

user1988365
