declaration: type instance SEMICOLON
{
//I want to get string here
}
type:...
instance:...
I want to get the currently matched declaration string in the declaration action. Is there any way to do this in yacc?
If you want the input string that corresponds to a grammar rule, and the original whitespace does not matter, one possibility, as mentioned in the comments, is for the scanner to supply the lexeme of each token in addition to its value.
That alone is not sufficient: the parser also has to concatenate the lexemes of the tokens involved in its semantic actions.
To avoid memory leaks, strings that have already been processed must then be freed.
Scanner adaptions
Let's assume you have a scanner rule for a NUMBER token:
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
The scanner now has to provide the lexeme as well. A union of the different value types is therefore no longer enough; what is needed is a struct that combines the lexeme with the value union.
A simple struct with only one int value could look as follows:
struct my_token {
char *text;
union {
int ival;
} u;
};
Then the lexer rule could look like this:
[0-9]+ { yylval->u.ival = atoi(yytext); yylval->text = string_dup(yytext); return NUMBER; }
Parser adaptions
A simple Yacc parser grammar rule that performs only some numerical computations usually looks something like this:
EXPR: NUMBER { $$ = $1; }
| EXPR '+' EXPR { $$ = $1 + $3; }
...
The semantic actions now become more complex, because the value has to be accessed through the struct. A string_merge function concatenates the lexemes of the individual tokens, and it should also free the memory of those lexemes. Most likely, one accepts this additional complexity only if it is really needed; if the string is only wanted for debugging, for example, there are simpler options.
EXPR: NUMBER { $$ = $1; }
| EXPR '-' EXPR { $$.u.ival = $1.u.ival - $3.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
| EXPR '+' EXPR { $$.u.ival = $1.u.ival + $3.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
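To see the ownership rule in isolation, outside of any generated parser, here is a minimal C sketch of what reducing EXPR '-' EXPR effectively does for the input 5 - 4. The names dup_lexeme, drop, merge3, reduce_example and the counter live are hypothetical and exist only for this illustration; they loosely mirror the helpers in string_util.c shown further down and are not part of flex or Bison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter for this sketch: number of live heap strings. */
int live = 0;

/* Copy a lexeme to the heap, as the scanner would do with yytext. */
char *dup_lexeme(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p == NULL) abort();
    memcpy(p, s, n);
    live++;
    return p;
}

/* Release one heap string and update the counter. */
void drop(char *s) {
    free(s);
    live--;
}

/* Concatenate three lexemes and take ownership of (free) the inputs. */
char *merge3(char *a, char *op, char *b) {
    assert(a != NULL && op != NULL && b != NULL);
    char *out = malloc(strlen(a) + strlen(op) + strlen(b) + 1);
    if (out == NULL) abort();
    live++;
    strcpy(out, a);
    strcat(out, op);
    strcat(out, b);
    drop(a);
    drop(op);
    drop(b);
    return out;
}

/* Mimics the reduction of EXPR '-' EXPR for the input "5 - 4". */
char *reduce_example(void) {
    char *lhs = dup_lexeme("5");   /* lexeme of the left NUMBER  */
    char *op  = dup_lexeme("-");   /* lexeme of the operator     */
    char *rhs = dup_lexeme("4");   /* lexeme of the right NUMBER */
    return merge3(lhs, op, rhs);   /* only the merged string stays alive */
}
```

Calling reduce_example() returns the heap string 5-4 with exactly one live allocation. Whoever consumes the result, in the grammar that would be the rule that prints the expression, is responsible for the final drop, after which the counter is back at 0.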
Freeing memory in case of parse errors
Small side note: If you use Bison and also need to free memory on parse errors, e.g. because you have a long-lived parser such as a shell, have a look at the %destructor declaration: https://www.gnu.org/software/bison/manual/html_node/Destructor-Decl.html.
Complete self contained example
The following example illustrates the above points using flex and Bison.
calc.l
%{
#include "calc.tab.h"
#include "string_util.h"
%}
%option noyywrap noinput nounput bison-locations
%%
[\t \n]+ { }
[0-9]+ { yylval->u.ival = atoi(yytext); yylval->text = string_dup(yytext); return NUMBER; }
";" { return SEMICOLON; }
. { yylval->text = string_dup(yytext); return yytext[0]; }
%%
calc.y
To use the struct defined above, one can use the following line in Bison:
%define api.value.type {struct my_token}
See https://www.gnu.org/software/bison/manual/html_node/Value-Type.html for details.
%locations
%define api.pure full
%define parse.error detailed
%code top {
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
}
%code requires {
struct my_token {
char *text;
union {
int ival;
} u;
};
}
%code {
#include "calc.tab.h"
#include "string_util.h"
void yyerror(YYLTYPE* yyllocp, const char* msg);
int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp);
}
%define api.value.type {struct my_token}
%token NUMBER
%token SEMICOLON
%left '-' '+'
%left '*' '/'
%left '(' ')'
%%
EXPR_LIST: { $$.u.ival = 0; }
| EXPR_LIST EXPR SEMICOLON { printf("%d >>%s<<\n", $2.u.ival, $2.text); string_free($2.text); }
| EXPR_LIST SEMICOLON
;
EXPR: NUMBER { $$ = $1; }
| EXPR '-' EXPR { $$.u.ival = $1.u.ival - $3.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
| EXPR '+' EXPR { $$.u.ival = $1.u.ival + $3.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
| EXPR '*' EXPR { $$.u.ival = $1.u.ival * $3.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
| EXPR '/' EXPR { $$.u.ival = $1.u.ival / $3.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
| '(' EXPR ')' { $$.u.ival = $2.u.ival; $$.text = string_merge($1.text, $2.text, $3.text); }
;
%%
void yyerror(YYLTYPE *yyllocp, const char *str) {
fprintf(stderr, "error: %s in line %d, column %d\n", str, yyllocp->first_line, yyllocp->first_column);
}
int main(void)
{
yyparse();
if(allocations != 0) {
fprintf(stderr, "%d allocations were not freed\n", allocations);
}
return 0;
}
string_util.c
The variable allocations is incremented on each allocation and decremented when the memory is freed. When the end of the program is reached without parsing errors, allocations should be 0.
#include <string.h>
#include <stdlib.h>
#include "string_util.h"
int allocations = 0;
char *string_dup(char *str) {
allocations++;
return strdup(str);
}
void string_free(const char *ptr) {
allocations--;
free((void *)ptr);
}
char *string_merge(const char *str1, const char *str2, const char *str3) {
size_t len = strlen(str1) + strlen(str2) + strlen(str3) + 1;
char *ptr = malloc(len);
allocations++;
strcpy(ptr, str1);
strcat(ptr, str2);
strcat(ptr, str3);
string_free(str1);
string_free(str2);
string_free(str3);
return ptr;
}
string_util.h
#ifndef STRING_UTIL_H
#define STRING_UTIL_H
extern int allocations;
char *string_dup(char *str);
void string_free(const char *ptr);
char *string_merge(const char *str1, const char *str2, const char *str3);
#endif //STRING_UTIL_H
Test
With the input
5 + 3 + 2 + 1;
5 + 2 + 1 + 1; (5 - 4) * (3 + 1) - 2;
5 - 4 + (1 - 2);
the program prints the following on the debug console:
11 >>5+3+2+1<<
9 >>5+2+1+1<<
2 >>(5-4)*(3+1)-2<<
0 >>5-4+(1-2)<<
So the matching string is displayed between >> and <<.