
I'm working on a parser for RDDL and, as I've done before, I use pointers for the types in the %union declaration. E.g.

%union {
    double d;
    int i;
    std::string *str;
    std::vector<std::string> *vectorStr;

    RDDLBlock *rddlBlock;
    Domain *domain;
    DefineType *defineType;
    std::vector<DefineType*> *vectorDefineType;
    DomainList *domainList;
    std::vector<PvarDefinition*> *vectorPvarDefinition;
    PvarDefinition *pVarDefinition;
    CpfDefinition *cpfDefinition;
    std::vector<CpfDefinition*> *vectorCpfDefinition;
    PvarExpression *pVarExpression;
    LogicalExpression *logicalExpression;
    std::vector<LogicalExpression*> *vectorLogicalExpression;
    LConstCaseList *lConstCaseList;
    CaseDefine *caseDefine;
    std::vector<CaseDefine*> *vectorCaseList;
    Parameter *parameter;
    ParameterList *parameterList;

    ObjectDefine *objectDefine;
    std::vector<ObjectDefine*> *objectsList;
    PvariablesInstanceDefine* pvariablesInstanceDefine;
    std::vector<PvariablesInstanceDefine*> *pvariablesInstanceList;

    Instance *instance;
    NonFluentBlock *nonFluentBlock;
}

This is how I have seen most people implement multiple token types in parsers. While searching the web for an answer, all I found were examples, with no explanation of why we have to use pointers. One of my tasks now is to 'clean pointers' wherever that is possible. So my question is: why do we (have to) use pointers in unions in this case?

EDIT: Added full list of types defined in union.

timrau
Đorđe Relić

2 Answers

4

You don't have to use pointers. As you can see, neither double nor int is a pointer.

As for the "why do we use them" part, recall some properties of a union.

  1. sizeof union_t must be at least as big as its biggest member. So you don't want a union that holds a single-word int alongside some 1KB class by value. A pointer, by contrast, almost always has a small, fixed size.

  2. In the C++ world, many classes (in your example, std::string and std::vector) have non-trivial copy constructors and destructors.

For such classes, it is unsafe to put them in a union. C++11 provides a "solution" to this, known as unrestricted unions. But even then, it won't work as is: for every assignment and destruction of a union_t object, you must explicitly destroy the currently active member and construct the new one.
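Both points can be illustrated with a small C++11 sketch. The names `BigNode`, `ByValue`, `ByPointer`, `Value`, and `roundTrip` are hypothetical and are not part of Bison or of the question's code; the sketch only shows what a plain union asks of you, not how Bison's generated parser handles its values.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical stand-in for a large semantic value stored by value.
struct BigNode {
    char payload[1024];
};

// Point 1: a union is at least as large as its largest member.
union ByValue {
    int i;
    BigNode node;    // every ByValue now occupies at least 1024 bytes
};

union ByPointer {
    int i;
    BigNode *node;   // only a pointer's worth of storage
};

static_assert(sizeof(ByValue) >= sizeof(BigNode), "union holds its largest member");
static_assert(sizeof(ByPointer) < sizeof(ByValue), "pointer member keeps the union small");

// Point 2: a C++11 unrestricted union with a non-trivial member.
// The implicit default constructor and destructor are deleted because of
// std::string, so we declare them and manage the string's lifetime by hand.
union Value {
    int i;
    std::string str;

    Value() : i(0) {}  // start with the int member active
    ~Value() {}        // the union cannot know which member is active
};

// Stores text in the union, copies it out, and cleans up manually.
std::string roundTrip(const std::string &text) {
    Value v;                          // int member is active
    new (&v.str) std::string(text);   // activate str with placement new
    assert(v.str == text);
    std::string copy = v.str;
    v.str.~basic_string();            // explicit destructor call, or the buffer leaks
    v.i = 7;                          // int is trivial, plain assignment is enough
    return copy;
}
```

Calling `roundTrip("hello")` returns the stored string. Forgetting either the placement new or the explicit destructor call is the kind of mistake that makes pointers the less error-prone choice inside a plain union.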

  • Thanks for your answer! Before I mark it as correct, I'd like to ask a sub-question. Since I have both std::string and std::vector types in my union (I'll update the question with the full list of types), would this depend on the compiler, so that not everyone could compile it with their default compiler? The project is meant to be open source when it is finished, and it is important that it compiles and runs without many additional settings. – Đorđe Relić Aug 02 '16 at 10:53
  • @ĐorđeRelić, I was wrong, it is now standard with [C++11 unrestricted union](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2544.pdf). However, there are many caveats: you need to carefully consider every copy, assignment, and destruction of the `union` object. IMO it isn't worth the effort; better to leave the pointers in place. –  Aug 02 '16 at 11:24
1

Non-trivial objects cannot be stored in unions, so as long as you use %union, you have to use pointers. However, Bison 3 provides a variant-based alternative which frees (eh eh) you from having to use pointers.

So instead of

%union
{
  int ival;
  std::string* sval;
}
%token <ival> NUMBER;
%token <sval> STRING;

you would write

%define api.value.type variant
%token <int> NUMBER;
%token <std::string> STRING;

See "A Complete C++ Example" in Bison's documentation for more details.

akim