I have a wonderful parser written in flex and bison that successfully parses a tortuous, obfuscated program in a vintage language (without a single shift/shift or shift/reduce conflict). Next step: building the AST.

Now, I'd like to use the wonderful C++11 resource-managing non-POD types like std::string to pass tokens from flex to bison. Problem is, the YYSTYPE is a union.

Let's say that I may pass either std::string or int for tokens. I could use a boost::variant or a hand-crafted version thereof, but is there a way to tell bison and flex not to use a union?
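For reference, the hand-crafted alternative could look roughly like the sketch below: a C++11 union wrapped in a class that tracks which member is alive, using placement `new` and an explicit destructor call. The name `SemanticValue` and its interface are hypothetical, chosen for illustration only. Whether a bison-generated C parser would construct and destroy such objects correctly on its stack is a separate question that this sketch does not address.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical semantic value type: holds either an int or a std::string.
class SemanticValue {
public:
    enum class Kind { None, Int, String };

    SemanticValue() : kind_(Kind::None) {}
    explicit SemanticValue(int v) : kind_(Kind::Int) { data_.i = v; }
    explicit SemanticValue(std::string v) : kind_(Kind::String) {
        new (&data_.s) String(std::move(v));  // placement new into the union
    }
    SemanticValue(const SemanticValue& other) : kind_(Kind::None) { copyFrom(other); }
    SemanticValue& operator=(const SemanticValue& other) {
        if (this != &other) { destroy(); copyFrom(other); }
        return *this;
    }
    ~SemanticValue() { destroy(); }

    Kind kind() const { return kind_; }
    int asInt() const { return data_.i; }                    // valid only if kind() == Kind::Int
    const std::string& asString() const { return data_.s; }  // valid only if kind() == Kind::String

private:
    using String = std::string;

    union Data {
        int i;
        String s;
        Data() {}   // members are constructed manually
        ~Data() {}  // members are destroyed manually
    };

    // Ends the lifetime of the active member, if it needs a destructor call.
    void destroy() {
        if (kind_ == Kind::String) data_.s.~String();
        kind_ = Kind::None;
    }
    // Assumes no member is currently alive (kind_ == Kind::None).
    void copyFrom(const SemanticValue& other) {
        if (other.kind_ == Kind::Int) data_.i = other.data_.i;
        else if (other.kind_ == Kind::String) new (&data_.s) String(other.data_.s);
        kind_ = other.kind_;
    }

    Kind kind_;
    Data data_;
};
```

The discriminant is what makes assignment safe: the old member is destroyed before the new one is constructed, which a bare `union` cannot do on its own.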

Laurent LA RIZZA
  • c++11 non-POD unions are for classes and structs with default ctors and dtors only. This is b/c there is no call to a class's ctor/dtor when changing the union type. – JayInNyc Mar 01 '14 at 16:27
  • @JayInNyc: You may want to read this: http://stackoverflow.com/questions/19764150/questions-regarding-c-non-pod-unions. I'm not afraid of placement `new`s and explicit destructor calls. That's why I talked about "juggling code". – Laurent LA RIZZA Mar 01 '14 at 16:29
  • Yes, got it. You should, then, not use vague language in your questions when seeking concrete answers. – JayInNyc Mar 01 '14 at 17:45
  • Are you telling bison to generate a C++ parser, or just using the C parsing framework and compiling it with C++? (Both are possible, but the answer depends a bit on which way you're going.) – rici Mar 01 '14 at 19:03
  • @rici: Good question. For now, I'm generating a C parser, but if generating a C++ parser removes roadblocks, I'm ready to go for it. I have no other code than the parsing grammar for now. – Laurent LA RIZZA Mar 01 '14 at 19:08
  • @JayInNyc: Sorry for this. Edited the question to remove the offending part, and rephrased it. – Laurent LA RIZZA Mar 01 '14 at 19:13
  • @LaurentLARIZZA: If you're using bison 3, you should be able to just `%define api.value.type {my_variant_type}`. Does that not work for you? (I'll add a longer, somewhat different answer later when I have more time, unless that's good enough for you.) – rici Mar 01 '14 at 19:43
  • @rici: No. I'm using version 2.7.12-4996. – Laurent LA RIZZA Mar 01 '14 at 22:16
  • Weird that no one noticed that I wrote "shift/shift" instead of "reduce/reduce"... – Laurent LA RIZZA Mar 01 '14 at 22:17
  • Not good. I looked at the generated code, and the stack is allocated by `malloc`. So long, constructor calls... – Laurent LA RIZZA Mar 01 '14 at 22:29

1 Answer

I've had exactly the same problem recently. I solved it the following way: I used char* in the union (or rather, in the struct I used for improved type safety), and then converted to std::string as soon as I assigned the string to my data structure.

So I have (code shortened significantly)

 struct parserType
 {
     double number;
     char* string;
     int stringLength;
     // ...
 };

And in the parser.y file

 %define api.value.type {struct parserType}
 %token <string> STRING

 // and maybe...
 %type <string> LatitudeFile
 %type <string> LongitudeFile
 %type <string> HeightFile


 // A simple non-terminal:
 LatitudeFile:
 /* Empty */
 {
      $$ = NULL;
 }
 | LATITUDE '=' STRING
 {
      $$ = ($3);
 }
 ;
 // A structure-building rule (one alternative of another non-terminal, head omitted):
 | KEYWORD LatitudeFile LongitudeFile HeightFile GridBaseDatum
 {
      ss = new DataObject();
      ss->rs.latShiftFile = ToStdString($2);
      ss->rs.lonShiftFile = ToStdString($3);
      ss->rs.heightShiftFile = ToStdString($4);
      ss->rs.gridBaseDatum = ToStdString($5);
      $$ = ss;
 }

with

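#include <cstring>
#include <string>

// Converts a lexer-allocated, quoted token to std::string and frees the buffer.
std::string ToStdString(char* ptr)
{
    std::string ret;
    const std::size_t len = (ptr != NULL) ? std::strlen(ptr) : 0;
    if (len >= 2)
        ret.assign(ptr + 1, len - 2); // Strip the surrounding quote characters.
    delete[] ptr; // Delete memory allocated by lexer.
    return ret;
}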

with the following in the lexer:

 {STRING}  {
      // Copy the matched text (quotes included) into a buffer that the parser frees later.
      char* buf = new char[yyleng + 1];
      memcpy(buf, yytext, yyleng);
      buf[yyleng] = '\0';
      yylval->string = buf;
      yylval->stringLength = yyleng;
      return STRING;
 }

This may not be the most elegant solution, but it seems to work flawlessly so far. If anybody knows how to circumvent the "std::string must not be part of a union" problem, that would probably make for a nicer solution.
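A variation on the same idea, sketched under the assumption that the semantic value stores a `std::string*` instead of a `char*`: the lexer allocates the string, and the grammar action takes ownership of it in one call. The helper names `MakeString` and `TakeString` are hypothetical, and unlike `ToStdString` this sketch does not strip quote characters.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper for the lexer: copies the matched text into a new std::string.
// A raw pointer is trivially copyable, so it can live in a plain union or struct.
inline std::string* MakeString(const char* text, std::size_t length)
{
    return new std::string(text, length);
}

// Hypothetical helper for grammar actions: takes ownership of the pointer,
// moves the contents out, and frees the heap object. A null pointer yields "".
inline std::string TakeString(std::string* raw)
{
    std::unique_ptr<std::string> owner(raw);  // deleted when this function returns
    return owner ? std::move(*owner) : std::string();
}
```

In the lexer this would read something like `yylval->string = MakeString(yytext, yyleng);`, and in an action `ss->rs.latShiftFile = TakeString($2);`. As with the `char*` version, any value that is discarded without reaching such an action (for example during error recovery) still has to be freed separately.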

PMF
  • That's not what I'm looking for. C++11 allows non-POD types in `union`s, but a `union` with non-POD types is useless without a surrounding `struct` that has a field identifying the currently constructed object in the `union` (so that it can be properly destroyed on assignment). Isn't there a way to tell flex to use a plain `struct` instead of a `union`? – Laurent LA RIZZA Mar 01 '14 at 16:21
  • I had this included in my example above: Use the line `%define api.value.type {struct parserType}` in the parser.y file to define your data exchange object between flex and bison. You can use any definition you like for it. Unfortunately, the given struct is used inside a union in bison itself, so this might not work either. – PMF Mar 01 '14 at 17:08