In theory, it's possible to find a regular expression which will match a string not containing a pattern, but except in the case of very simple patterns, it is neither easy nor legible.
If all you want to do is search for (and react to) specific patterns, you could use a default rule which matches one character and does nothing:
{Pattern1} { /* Do something with the pattern */ }
{Pattern2} { /* Do something with the pattern */ }
.|\n /* Default rule does nothing */
If, on the other hand, you wanted to do something with the not-otherwise-matched strings (as in your example), you'll need to default rule to accumulate the strings, and the pattern rules to "send" (return) the accumulated token before acting on the token which they matched. That means that some actions will need to send two tokens, which is a bit awkward with the standard parser calls scanner for a token
architecture, because it requires the scanner to maintain some state.
If you have a no-too-ancient version of bison
, you could use a "push parser" instead, which allows the scanner to call the parser. That makes it easy to send two tokens in a single action. Otherwise, you need to build a kind of state machine into your scanner.
Below is a simple example (which needs pattern definitions, among other things) using a push-parser.
%{
#include <stdlib.h>
#include <string.h>
#include "parser.tab.h"
/* Since the lexer calls the parser and we call the lexer,
* we pass through a parser (state) to the lexer. This is
* how you change the `yylex` prototype:
*/
#define YY_DECL static int yylex(yypstate* parser)
%}
pattern1 ...
pattern2 ...
/* Standard "avoid warnings" options */
%option noyywrap noinput nounput nodefault
%%
/* Indented code before the first pattern is inserted at the beginning
* of yylex, perfect for local variables.
*/
size_t vhdl_length = 0;
/* These are macros because they do out-of-sequence return on error. */
/* If you don't like macros, please accept my apologies for the offense. */
#define SEND_(toke, leng) do { \
size_t leng_ = leng; \
char* text = memmove(malloc(leng_ + 1), yytext, leng_); \
text[leng_] = 0; \
int status = yypush_parse(parser, toke, &text); \
if (status != YYPUSH_MORE) return status; \
} while(0);
#define SEND_TOKEN(toke) SEND_(toke, yyleng)
#define SEND_TEXT do if(vhdl_length){ \
SEND_(TEXT, vhdl_length); \
yytext += vhdl_length; yyleng -= vhdl_length; vhdl_length = 0; \
} while(0);
{pattern1} { SEND_TEXT; SEND_TOKEN(TOK_1); }
{pattern2} { SEND_TEXT; SEND_TOKEN(TOK_2); }
/* Default action just registers that we have one more char
* calls yymore() to keep accumulating the token.
*/
.|\n { ++vhdl_length; yymore(); }
/* In the push model, we're responsible for sending EOF to the parser */
<<EOF>> { SEND_TEXT; return yypush_parse(parser, 0, 0); }
%%
/* In this model, the lexer drives everything, so we provide the
* top-level interface here.
*/
int parse_vhdl(FILE* in) {
yyin = in;
/* Create a new pure push parser */
yypstate* parser = yypstate_new();
int status = yylex(parser);
yypstate_delete(parser);
return status;
}
To actually get that to work with bison, you need to provide a couple of extra options:
parser.y
%code requires {
/* requires blocks get copied into the tab.h file */
/* Don't do this if you prefer a %union declaration, of course */
#define YYSTYPE char*
}
%code {
#include <stdio.h>
void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
}
%define api.pure full
%define api.push-pull push