
I would like to use the code generated by lex from another program of mine, but all the examples I have seen embed the main function inside the lex file, not the other way around.

Is it possible to use (include) the C file generated by lex from other code, to get something like this (not necessarily identical)?

#include<something>
int main(){
    Lexer l = Lexer("some string or input file");
    while (l.has_next()){
        Token * token = l.get_next_token();
        //somecode
    }
    //where token is just a simple object to hold the token type and lexeme
    return 0;
}
Ahmed Kotb
  • Why can't you use lex as usual, and code the rest of your application in another file, and link the files together? – Basile Starynkevitch Nov 04 '11 at 18:02
  • I am asking how to link the two files together. If you can provide an example, that would be great. – Ahmed Kotb Nov 04 '11 at 18:06
  • Very short answer: you can do exactly this with [RE/flex](https://sourceforge.net/projects/re-flex/) without any ugly code plumbing to add classes and deal with buffers (remember, C++ in Flex is still experimental). With RE/flex, the default lexer is `Lexer` and takes a string as input or a file as input. You can use method `lex()` to retrieve tokens. There is also a Flex-compatibility option to generate a Flex lexer `yyFlexLexer` and `yylex()` method. – Dr. Alex RE Nov 28 '16 at 03:40

3 Answers


This is what I would start with.
Note: this is an example of using the C interface.
To use the C++ interface, add %option c++ (see below).

test.lex

IdentPart1      [A-Za-z_]
Identifier      {IdentPart1}[A-Za-z_0-9]*
WHITESPACE      [ \t\r\n]

%option noyywrap

%%

{Identifier}      {return 257;}
{WHITESPACE}      {/* Ignore */}
.                 {return 258;}

%%

// This is the bit you want.
// It is best just to put this at the bottom of the lex file.
// Functions have external linkage by default, so you can declare them in a
// header file and include that header in your code (see Lexer.h).
void* setUpBuffer(char const* text)
{
    YY_BUFFER_STATE buffer  = yy_scan_string(text);
    yy_switch_to_buffer(buffer);

    return buffer;
}

void tearDownBuffer(void* buffer)
{
    yy_delete_buffer((YY_BUFFER_STATE)buffer);
}

Lexer.h

#ifndef LOKI_A_LEXER_H
#define LOKI_A_LEXER_H

#include <string>

extern int   yylex();
extern char* yytext;
extern int   yyleng;

// Here is the interface to the lexer you set up above
extern void* setUpBuffer(char const* text);
extern void  tearDownBuffer(void* buffer);


class Lexer
{
    std::string         token;
    std::string         text;
    void*               buffer;
    public:
    Lexer(std::string const& t)
        : text(t)
    {
        // Use the interface to set up the buffer
        buffer  = setUpBuffer(text.c_str());
    }
    ~Lexer()
    {
        // Tear down your interface
        tearDownBuffer(buffer);
    }
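    // Don't use raw pointers in real code.
    // This is only a quick and dirty example.
    bool  nextToken()
    {
        int val = yylex();
        if (val != 0)
        {
            // Copy the match now: yytext is overwritten by the next yylex() call
            token = std::string(yytext, yyleng);
        }
        // yylex() returns 0 at end of input
        return val != 0;
    }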
    std::string const& theToken() const {return token;}
};

#endif

main.cpp

#include "Lexer.h"
#include <iostream>

int main()
{
    Lexer l("some string or input file");


    // Did not like your has_next() interface.
    // Just call nextToken() until it fails.
    while (l.nextToken())
    {
        std::cout << l.theToken() << "\n";
    }
    return 0;
}

Build

> flex test.lex
> g++ main.cpp  lex.yy.c
> ./a.out
some
string
or
input
file
>
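
The question asked for a token object that holds the token type and the lexeme. The following is a minimal sketch of how the C interface above could be wrapped to return such objects. It assumes the same yylex()/yytext/yyleng interface as Lexer.h. The names Token, TokenStream, tokenize and setInput are hypothetical, and the "stand-in" section is a hand-written replacement for lex.yy.c (not flex output) that mimics the three rules of test.lex so that the sketch compiles without flex.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ---- Stand-in for lex.yy.c (hypothetical, NOT flex output) ----------------
// In a real build, yylex/yytext/yyleng come from the generated scanner.
static std::string g_input;    // text being scanned
static std::size_t g_pos = 0;  // read position in g_input
char* yytext = nullptr;        // start of the latest match
int   yyleng = 0;              // length of the latest match

static bool isSpace(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }
static bool isIdentStart(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_'; }
static bool isIdentPart(char c) { return isIdentStart(c) || (c >= '0' && c <= '9'); }

// Plays the role of setUpBuffer(): selects the text to scan.
void setInput(const char* text)
{
    assert(text != nullptr);
    g_input = text;
    g_pos = 0;
}

// Mimics the rules of test.lex: 257 = identifier, 258 = any other character,
// whitespace is skipped, 0 = end of input.
int yylex()
{
    while (g_pos < g_input.size() && isSpace(g_input[g_pos])) ++g_pos;
    if (g_pos == g_input.size()) return 0;
    const std::size_t start = g_pos;
    int type = 258;
    if (isIdentStart(g_input[g_pos])) {
        type = 257;
        while (g_pos < g_input.size() && isIdentPart(g_input[g_pos])) ++g_pos;
    } else {
        ++g_pos;
    }
    yytext = &g_input[start];
    yyleng = static_cast<int>(g_pos - start);
    return type;
}
// ---- End of stand-in ------------------------------------------------------

// A simple value object: the token type plus a copy of the lexeme.
struct Token {
    int         type;
    std::string lexeme;
};

class TokenStream {
public:
    explicit TokenStream(const std::string& text) { setInput(text.c_str()); }

    // Fills `out` and returns true, or returns false at end of input.
    bool next(Token& out)
    {
        const int type = yylex();
        if (type == 0) return false;
        out.type = type;
        out.lexeme.assign(yytext, static_cast<std::size_t>(yyleng));
        return true;
    }
};

// Convenience helper: scan a whole string into a vector of tokens.
std::vector<Token> tokenize(const std::string& text)
{
    TokenStream stream(text);
    std::vector<Token> tokens;
    Token t{};
    while (stream.next(t)) tokens.push_back(t);
    return tokens;
}
```

Returning tokens by value avoids the raw Token* from the question, so nothing has to be deleted by the caller. With a real flex scanner, the stand-in section would be dropped and the three names declared extern as in Lexer.h.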

Alternatively, you can use the C++ interface to flex (it's experimental).

test.lex

%option c++


IdentPart1      [A-Za-z_]
Identifier      {IdentPart1}[A-Za-z_0-9]*
WHITESPACE      [ \t\r\n]

%%

{Identifier}      {return 257;}
{WHITESPACE}      {/* Ignore */}
.                 {return 258;}

%%

// Note this needs to be here
// If you define no yywrap() in the options it gets added to the header file
// which leads to multiple definitions if you are not careful.
int yyFlexLexer::yywrap()   { return 1;}

main.cpp

#include "MyLexer.h"
#include <iostream>
#include <sstream>

int main()
{
    std::istringstream  data("some string or input file");
    yyFlexLexer l(&data, &std::cout);


    while (l.yylex())
    {
        std::cout << std::string(l.YYText(), l.YYLeng()) << "\n";
    }
    //where token is just a simple object to hold the token type and lexeme
    return 0;
}

Build

> flex --header-file=MyLexer.h test.lex
> g++ main.cpp lex.yy.cc
> ./a.out
some
string
or
input
file
>
Martin York
  • Sorry, couldn’t help myself: [Scumbag C++ programmer](http://memegenerator.net/cache/instances/400x/10/10910/11172073.jpg) … /EDIT Ah damn, didn’t see your caveat. But still … ;-) – Konrad Rudolph Nov 04 '11 at 20:41
  • @Konrad Rudolph: Fixed [Gift](http://worshippingchristian.org/images/blog/light-of-the-world.jpg) – Martin York Nov 04 '11 at 23:21

Sure. I'm not sure about the generated class; we use the C generated parsers, and call them from C++. Or you can insert any sort of wrapper code you want in the lex file, and call anything there from outside of the generated file.

James Kanze
  • If you could provide an example, that would be great. – Ahmed Kotb Nov 04 '11 at 18:13
  • @AhmedKotb I'm not sure what you're looking for in terms of example. After the second `%%`, you can write any C++ code you like; before the first `%%`, you can put any code you like in `%{...%}`. I have a couple of small utilities, for example, that define a class which derives from `yyFlexLexer` in the `%{...%}` section in the `.l` file; in this case, the `main` is in the `.l`, but it could just as easily be in another file. – James Kanze Nov 07 '11 at 09:34

The keywords are %option reentrant or %option c++.

As an example here's the ncr2a scanner:

/** ncr2a_lex.l: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
  /** `+2` skips '&#', `atoi()` ignores ';' at the end */
  fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}

The scanner code can be left unchanged.

Here is the program that uses it:

/** ncr2a.c */
#include "ncr2a_lex.h"  

typedef struct {
  int i,j; /** put here whatever you need to keep extra state */
} State; 

int main () {
  yyscan_t scanner;
  State my_custom_data = {0,0};

  yylex_init(&scanner);
  yyset_extra(&my_custom_data, scanner);

  yylex(scanner);

  yylex_destroy(scanner);
  return 0;
}

To build the ncr2a executable:

flex -R -oncr2a_lex.c --header-file=ncr2a_lex.h ncr2a_lex.l 
cc -c -o ncr2a_lex.o ncr2a_lex.c
cc -o ncr2a ncr2a_lex.o ncr2a.c -lfl

Example

$ echo 'three colons &#58;&#58;&#58;' | ./ncr2a
three colons :::

This example uses stdin/stdout as input/output and it calls yylex() once.

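To read from a file (with a reentrant scanner, code outside the scanner file sets the input through yyset_in() rather than assigning to yyin):

yyset_in(fopen("input.txt", "r"), scanner);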

@Loki Astari's answer shows how to read from a string (buffer = yy_scan_string(text, scanner); yy_switch_to_buffer(buffer, scanner)).

To have yylex() return once per token, add a return statement to each rule in the *.l file that matches a complete token.

jfs