13

I'm working on a message parser/generator subsystem. I'm creating an auto-generator that uses a database that contains all of the information about this protocol, including enum lists, to generate the code. One thing I came across is the need for hierarchical enumerations.

updated

(I was trying to simplify things by not describing the full problem, but the comments below make it obvious that I erred by simplifying too much.)

The Database being used will store things as simplified strings (customer decision), but the protocol only speaks "byte triplets" (aka Hierarchical Enum). The full problem could be described as such:

Given a set of unique strings that each correspond with a unique triplet, 1) find the triplet for any given string, and 2) find the string for any given triplet. Make sure to account for "Undefined" and "No Statement" enumerations (which do not have strings associated with them). [As one poster noted, yes it is insane.]

(Caveat: I've been doing C++ for well over a decade, but I've been doing Java this last year -- my C++ is probably "corrupted".)

So, to use an admittedly contrived example, given:

// There is only one category
// POP= "P", COUNTRY= "K", CLASSICAL= "C"
enum Category {POP, COUNTRY, CLASSICAL};

// There is one Type enum for each Category.
// ROCK= "R", BIG_BAND = "B", COUNTRY_POP= "C" 
enum PopType {ROCK, BIG_BAND, COUNTRY_POP};
enum CountryType {CLASSICAL_COUNTRY, MODERN_COUNTRY, BLUEGRASS, COUNTRY_AND_WESTERN};
// ...

// There is one Subtype for each Type
// EIGHTIES= "E", HEAVY_METAL= "H", SOFT_ROCK= "S"
enum RockSubType { EIGHTIES, HEAVY_METAL, SOFT_ROCK};
// ...

When I get 0, 0, 0 (Pop, Rock, Eighties), I need to translate that to "PRE". Conversely, if I see "PC" in the Database, that needs to be sent out the wire as 0, 2 (Pop, Country, NULL).

I'm blatantly ignoring "Undefined" and No Statement" at this point. Generating a triplet from a string seems straight forward (use an unordered map, string to triple). Generating a string from a triplet (that may contain a NULL in the last entry) ... not so much. Most of the "enum tricks" that I know won't work: for instance, Types repeat values -- each Type enum starts at zero -- so I can't index an array based on the Enum value to grab the string.

What's got me is the relationship. At first glance it appears to be a fairly straight forward "is-a" relationship, but that doesn't work because this case is bidirectional. The leaf -> root navigation is very straight forward, and would be appropriate for a class hierarchy; unfortunately, going the other way is not so straight forward.

I cannot "hand roll" this -- I have to generate the code -- so that probably eliminates any XML based solutions. It also has to be "reasonably fast". The "Java Solution" involves using protected static variables, initialized on construction, and abstract base classes; however, I do not believe this would work in C++ (order of initialization, etc.). Plus, aesthetically, I feel this should be ... more "const". Other code I've seen that tackles this problem uses unions, explicitly listing all of the enum types in the union.

The only other thing I can come up with is using Template Specialization and explicit specialization, but I'm at a loss. I did a web search on this, but I found nothing that would tell me if it would even work. Still, if it can be done with a union, can't it be done with Template Specialization?

Is it possible to do something like this using templates, specialization, explicit specialization? Is there another, more obvious, solution (i.e. a design pattern that I've forgotten) that I'm missing?

Oh, before I forget -- the solution must be portable. More specifically, it must work on Windows (Visual Studio 2010) and Redhat Enterprise 6/Centos 6 (GCC 4.4.4 IIRC).

And, lest I forget, this protocol is huge. The theoretical max on this is about 133,000 entries; once I include "Undefined" and "No Statement" I'll probably have that many entries.

Thanks.

John Price
  • 556
  • 1
  • 6
  • 18
  • It isn't clear what problems you are trying to solve with the representation. I gather that types need to query their subtypes, and vice versa. Is that it? – Tom Kerr Nov 14 '11 at 18:24
  • Basically, yes. In essence, I need to be able to select an enum list based on the previous enum list in the hierarchy. The numbers involved make it too difficult to force the end user of this code to "just know" which enum listing is valid (i.e. 32 Type enums, one for each Category; probably over a thousand enum classes for the subtypes, but less than the 2048 max). The numbers themselves come and go as a double or triple; making them meaningful is the problem. Does that help? – John Price Nov 14 '11 at 18:40
  • Do you require the enumerations to be type-safe? In other words, do they have to actually be enums? – Ben Collins Nov 14 '11 at 18:56
  • No. The basic requirement is for the end users (Software Engineers reasonably proficient in C++) to easily translate the byte triple into something more meaningful than raw numbers. – John Price Nov 14 '11 at 19:15
  • Hopefully my update is more clear than the original question. I detailed the full problem rather than just the part I'm trying to solve. – John Price Nov 15 '11 at 20:44

6 Answers6

9

Effectively, you are in a bit of a pinch here.

My proposal would imply first using 3 enums:

  • Category
  • Type
  • SubType

With no distinction (at first) between the various types or subtypes (we just throw them all in the same basket).

Then, I would simply use a structure:

struct MusicType {
  Category category;
  Type type;
  SubType subtype;
};

And define a simple set of valid types:

struct MusicTypeLess {
  bool operator()(MusicType const& left, MusicType const& right) const {
    if (left.category < right.category) { return true; }
    if (left.category > right.category) { return false; }

    if (left.type < right.type) { return true; }
    if (left.type > right.type) { return false; }

    return left.subtype < right.subtype;
  }
};

MusicType MusicTypes[] = {
  { Category::Pop, Type::Rock, SubType::EightiesRock },
  ...
};

// Sort it on initialization or define in sorted during generation

Then you can define simple queries:

typedef std::pair<MusicType const*, MusicType const*> MusicTypeRange;

MusicTypeRange listAll() {
  return MusicTypeRange(MusicTypes, MusicTypes + size(MusicTypes));
}

namespace {
  struct MusicTypeCategorySearch {
    bool operator()(MusicType const& left, MusicType const& right) const {
      return left.category < right.category;
    }
  };
}

MusicTypeRange searchByCategory(Category cat) {
  MusicType const search = { cat, /* doesn't matter */ };
  return std::equal_range(MusicTypes,
                          MusicTypes + size(MusicTypes),
                          search,
                          MusicTypeCategorySearch());
}

namespace {
  struct MusicTypeTypeSearch {
    bool operator()(MusicType const& left, MusicType const& right) const {
      if (left.category < right.category) { return true; }
      if (left.category > right.category) { return false; }

      return left.type < right.type;
    }
  };
}

MusicTypeRange searchByType(Category cat, Type type) {
  MusicType const search = { cat, type, /* doesn't matter */ };
  return std::equal_range(MusicTypes,
                          MusicTypes + size(MusicTypes),
                          search,
                          MusicTypeTypeSearch ());
}

// little supplement :)
bool exists(MusicType const& mt) {
  return std::binary_search(MusicTypes, MusicTypes + size(MusicTypes), mt);
}

Because the array is sorted, the operations are fast (log N), so it should go smoothly.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • I _love_ this solution. I'm going to try this, but I don't know if it will work for me because of size (see my above edit). 133,000+ entries in the array, plus two small string constants for each entry may be too much for a single compile unit. – John Price Nov 14 '11 at 18:31
  • +1, I was looking for a solution initially with 3 classes but landed up folding the classes into 1 class tracking all 3 enums. You solution was much better than mine so I didn't bother posting. Well written. – Joel Nov 14 '11 at 18:35
1

I think the Music class should contain the sub genres...(has-a) also called aggregation.

Steve Wellens
  • 20,506
  • 2
  • 28
  • 69
  • Thanks for this response, I'll edit my original to make it more clear. I'm afraid the protocol is too large to consider that, or at least to consider the naive approach to aggregation. There are hundreds of subclasses to deal with in this particular protocol ,and a given sub-class may have more than a thousand enum entries. I guess that's why I was hoping to do a sort of "faux aggregation". – John Price Nov 14 '11 at 18:08
1

The leaf -> root navigation is very straight forward, and would be appropriate for a class hierarchy; unfortunately, going the other way is not so straight forward.

I'm not really sure what value you're getting by using enums in the first place. Are there compelling reasons not just invent a Category class, and then connect together instances of them to model what you're trying to achieve? (I'm reminded of the Qt State Machine Framework...)

In my mind, the good thing about it is how simple it is, and easy to adapt as your needs change. It's boring code. You're not really pushing the compile-time features of the language much. But you say this is generated code, so don't really have to worry about someone introducing bugs with a cyclic category heirarchy. Just make sure such things aren't generated.

UPDATE Okay I read your scenario updates and it really sounds like you're looking at a database task here. The word "enum" doesn't even come to mind for this. Have you considered SQLite?

http://en.wikipedia.org/wiki/SQLite

Still, putting aside the question of where you're getting this insane list of 133,000 music genres, I have modified my code to give you a concrete performance metric for how C++ can handle runtime objects of that scale. You'll max things out eventually, but on most machines it can still be fairly snappy...try it:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
#include <cstdlib>
using namespace std;

class Category {
private:
    string name;
    Category* parent;
    set<Category*> children;
private:
    static set<Category*> allCategories;
    static vector<Category*>* allCategoriesVector;
public:
    Category (string name, Category* parent) :
        name (name), parent (NULL)
    {
        resetParent(parent);
    }
    void resetParent(Category* newParent) {
        if (parent) {
            parent->children.erase(this);
            if (newParent == NULL) {
                allCategories.erase(this);
                if (allCategoriesVector != NULL) {
                    delete allCategoriesVector;
                    allCategoriesVector = NULL;
                }
            }
        } else {
            if (newParent != NULL) {
                allCategories.insert(this);
                if (allCategoriesVector != NULL) {
                    allCategoriesVector->push_back(this);
                }
            }
        }
        set<Category*>::iterator i = children.begin();
        while (i != children.end()) {
            (*i)->parent = NULL;
            i++;
        } 

        if (newParent) {
            newParent->children.insert(this);
        }

        parent = newParent;
    }
    Category* getRoot() {
       Category* result = this;
       while (result->parent != NULL) {
           result = result->parent;
       }
       return result;
    }
    const string& getNamePart() const {
        return name;
    }
    string getNamePath() const {
        if (parent) {
            return parent->getNamePath() + ":" + getNamePart();
        } else {
            return getNamePart();
        }
    }
    static const vector<Category*>& getAllCategoriesVector() {
        if (allCategoriesVector == NULL) {
           allCategoriesVector = new vector<Category*> (
               allCategories.begin(), allCategories.end()
           );
        }
        return *allCategoriesVector;
    }
    static Category* randomCategory() {
        if (allCategories.empty())
            return NULL;

        // kids: don't try this at home if you want a uniform distribution
        // http://stackoverflow.com/questions/5008804/generating-random-integer-from-a-range
        return getAllCategoriesVector()[rand() % allCategories.size()];
    }
    virtual ~Category() {
        resetParent(NULL);
    }
};
set<Category*> Category::allCategories;
vector<Category*>* Category::allCategoriesVector = NULL;

class CategoryManager {
public:
    Category Root;
        Category Pop;
            Category Rock;
                Category EightiesRock;
                Category HeavyMetal;
                Category SoftRock;
            Category CountryPop;
            Category BigBand;
        Category Country;
        Category Classical;
        Category Jazz;

private:
    set<Category*> moreCategories;
public:
    CategoryManager (int numRandomCategories = 0) :
        Root ("Category", NULL),
            Pop ("Pop", &Root),
                Rock ("Rock", &Pop),
                    EightiesRock ("EightiesRock", &Rock),
                    HeavyMetal ("HeavyMetal", &Rock),
                    SoftRock ("SoftRock", &Rock),
                CountryPop ("CountryPop", &Pop),
                BigBand ("BigBand", &Pop),
            Country ("Country", &Root),
            Classical ("Classical", &Root),
            Jazz ("Jazz", &Root)
    {
        // claim is that there are "hundreds" of these
        // lets make a bunch of them starting with no parent
        for (int i = 0; i < numRandomCategories; i++) {
            stringstream nameStream;
            nameStream << "RandomCategory" << i;
            moreCategories.insert(new Category(nameStream.str(), NULL));
        }

        // now that we have all the categories created, let's
        // reset their parents to something chosen randomly but
        // keep looking until we find one whose path goes up to Root
        set<Category*>::iterator i (moreCategories.begin());
        while (i != moreCategories.end()) {
            (*i)->resetParent(Category::randomCategory());
            i++;
        }
    }
    virtual ~CategoryManager () {
        set<Category*>::iterator i = moreCategories.begin();
        while (i != moreCategories.end()) {
            delete *i;
            i++;
        }
    }
};

int main() {
    CategoryManager cm (133000);

    // how to get to a named category
    cout << cm.EightiesRock.getNamePath() << "\n" << "\n";

    // pick some random categories to output
    for (int i = 0; i < 5; i++) {
        cout << Category::randomCategory()->getNamePath() << "\n";
    }

    return 0;
}

On my machine this rather promptly spat out:

Category:Pop:Rock:EightiesRock

Category:Pop:Rock:HeavyMetal:RandomCategory0:RandomCategory6:RandomCategory12:RandomCategory95:RandomCategory116:RandomCategory320:RandomCategory358:RandomCategory1728:RandomCategory6206:RandomCategory126075
Category:Country:RandomCategory80:RandomCategory766:RandomCategory2174
Category:Country:RandomCategory22:RandomCategory45:RandomCategory52:RandomCategory83:RandomCategory430:RandomCategory790:RandomCategory860:RandomCategory1628:RandomCategory1774:RandomCategory4136:RandomCategory10710:RandomCategory13124:RandomCategory19856:RandomCategory20810:RandomCategory43133
Category:Pop:Rock:HeavyMetal:RandomCategory0:RandomCategory5:RandomCategory138:RandomCategory142:RandomCategory752:RandomCategory2914:RandomCategory9516:RandomCategory13211:RandomCategory97800
Category:Pop:CountryPop:RandomCategory25:RandomCategory63:RandomCategory89:RandomCategory2895:RandomCategory3842:RandomCategory5735:RandomCategory48119:RandomCategory76663

I'll still say a database is the answer you're looking for here, but at the same time you'd be surprised how much abuse a compiler will take these days. 133K file with each line being an object declaration is more tractable than it sounds.

  • @JohnPrice I've updated my answer to respond to some of the updates to your question. This does sound to me like a case where you probably want to take a very close look at the requirements and not over-engineer a solution. Doing generative programming so you can produce C++ objects from a table (that people are only going to do light querying of in a GUI app) doesn't sound very sensible. See: http://www.sqlite.org/famous.html – HostileFork says dont trust SE Nov 14 '11 at 21:15
  • Like I said, the example is contrived; but "insane" describes this protocol rather well. I mean, when you have an "enumeration" that describes multiple values (e.g. UNDEFINED 2-63) you _know_ you're going to have problems coding it up. – John Price Nov 15 '11 at 13:40
0

Your lookups are runtime, so I don't really think a lot of static typing will help you much. I believe you could write them on top of the below if you really wanted them as well.

I don't assume that programmers will be directly specifying these in their day to day coding. They will be taking in runtime generated values and transforming it?

Given that assumption, I would denormalize the enum. This may have some trade offs for getting warnings about when a switch statement is missing one of the values.

struct MusicType {
  enum EnumValue {
    ROOT = 0
    ,Pop
    ,Pop_Rock
    ,Pop_Rock_EightiesRock
    ,Pop_Rock_HeavyMetal
    ,Pop_Rock_SoftRock
    ,Pop_CountryPop
    ,Pop_BigBand
    ,Country
    ,Classical
    ,Jazz
  };
  std::string getLeafString(EnumValue ev) {
    case (ev) {
      case Pop:         return "Pop";
      case Pop_Rock:    return "Rock";
      // ...
      default:
        throw std::runtime_error("Invalid MusicType (getLeafString)");
    }
  }
  // you could write code to do this easily without generating it too
  std::string getFullString(EnumValue ev) {
    case (ev) {
      case Pop:         return "Pop";
      case Pop_Rock:    return "Pop::Rock";
      // ...
      default:
        throw std::runtime_error("Invalid MusicType (getFullString)");
    }
  }

};

So then you need to map your relationships. It sounds like the number of levels is firm, but when that sort of assumption breaks it's really expensive to fix.

There are a few ways to go about this. I think a data structure is the most straight forward to implement, though you could do a big huge switch. I think that would be more trouble for similar performance. Really, a switch statement is just a map in the code segment though, pick your poison.

I like to solve problems like this that only resolve one level at a time. This lets you have any number of levels. It makes this lowest level of abstraction simpler. It does make you write more "middleware," but that should be simpler to implement.

void getChildren(MusicType::EnumValue ev, std::vector<MusicType::EnumValue> &children) {
  typedef std::multimap<MusicType::EnumValue, MusicType::EnumValue> relationships_t;
  typedef std::pair<MusicType::EnumValue, MusicType::EnumValue> mpair_t;
  static relationships_t relationships;
  static bool loaded = false;
  if (!loaded) {
    relationships.insert(mpair_t(MusicType::Pop, MusicType::Pop_Rock));
    relationships.insert(mpair_t(MusicType::Pop_Rock, MusicType::Pop_Rock_EightiesRock));
    // ..
  }
  // returning these iterators as a pair might be a more general interface
  relationships::iterator cur = relationships.lower_bound(ev);
  relationships::iterator end = relationships.upper_bound(ev);
  for (; cur != end; cur++) {
    children.push_back(cur->second);
  }
} 

MusicType::EnumValue getParent(MusicType::EnumValue ev) {
  case (ev) {
    case Pop:         return MusicType::ROOT;
    case Pop_Rock:    return MusicType::Pop;
    // ...
    default:
      throw std::runtime_error("Invalid MusicType (getParent)");
    }
}

The great part about separating it like this is that you can write any sort of combinatorial helpers you want for these without having to worry about structure too much.

For GUI feedback, this should be fast enough. If you needed it faster, then you may be able to do some inversion of control to avoid a few copies. I don't think I'd start there though.

You can add extra functionality without changing too much internally, which is my main concern with generated code usually. The open/closed principle is extremely important with generated code.

Tom Kerr
  • 10,444
  • 2
  • 30
  • 46
0

I'm having trouble understanding your intent, but here's a random shot in the dark. MusicCategory is a class that holds the value in Enum value. PopTypes inherits publicly from MusicCategory, and so does RockTypes from PopTypes. As long as the program only stores/passes MusicCategory types, you can assign all types to it, from any of the derived class types. Thus you can have MusicCategory Cat = RockTypes::SoftRock;, and if the enums are defined carefully, it would even set Pop/Rock appropriately.

struct MusicCategory{
   enum Enum {
              NoCategory = 0 | (0<<12),  //"0 |" isn't needed, but shows pattern
              Pop        = 0 | (1<<12), 
              Country    = 0 | (2<<12), 
              Classical  = 0 | (3<<12), 
              Jazz       = 0 | (4<<12),
              All        = INT_MAX} value; 
  //"ALL" forces enum to be big enough for subtypes
   MusicCategory(Enum e) :value(e) {} //this makes the magic work
   operator Enum&() {return value;}
   operator const Enum&() const {return value;}
   operator const int() const {return value;}
   const std::string & getString(MusicCategory::Enum category);
};

// Begin types
// This one is a subtype of MusicCategory::Pop
struct PopTypes : public MusicCategory {
   enum Enum { 
       NoType     = MusicCategory::Pop | (0<<6), 
       Rock       = MusicCategory::Pop | (1<<6), 
       CountryPop = MusicCategory::Pop | (2<<6), 
       BigBand    = MusicCategory::Pop | (3<<6),
       All        = INT_MAX};
   const std::string & getString(PopTypes::Enum category);
};
// ...

// Begin subtypes
struct RockTypes : public PopType {
   enum Enum { 
       NoSubType    = PopTypes::Rock | (0<<0),  //"<<0)" isn't needed, but shows pattern
       EightiesRock = PopTypes::Rock | (1<<0),
       HeavyMetal   = PopTypes::Rock | (2<<0), 
       SoftRock     = PopTypes::Rock | (3<<0),
       All          = INT_MAX};
   const std::string & getString(RockTypes::Enum category);
};

int main() {
    MusicCategory Cat; 
    // convertable to and from an int
    Cat = RockTypes::HeavyMetal;
    //automatically sets MusicCategory::Pop and PopTypes::Rock
    bool is_pop = (Cat & MusicCategory::Pop == MusicCategory::Pop);
    //returns true
    std:string str = MusicCategory::getString(Cat);
    //returns Pop
    str = PopTypes::getString(Cat);
    //returns Rock
    str = RockTypes::getString(Cat);
    //returns HeavyMetal
}
Mooing Duck
  • 64,318
  • 19
  • 100
  • 158
0

First, thanks to everyone for their help. I wasn't actually able to use any of the answers "as is" because of the nature of this problem:

  • enums repeat their values (every enum can have the same numerical values as it's siblings, but with a different label and "meaning")
  • strings associated with the enum can be repeated as well (a given enum can have the same string as a sibling, but with a different meaning).

I eventually found Boost bimaps and it turns out that a bimap hierarchy works well for this problem. For those that haven't seen them, Boost `bimap' is a bidirectional container that uses either of the pair as key and the other as value.

I can make a bimap of "integer, string" (uint8_t in this case, since the enums here are all guaranteed to be small) and add the, errr, "sub-enum", as information associated with the bimap using with_info.

The hierarchy code looks something like this:

// Tags
struct category_enum_value {};
struct type_enum_value {};
struct subtype_enum_value {};
struct category_string {};
struct music_type_string {};
struct music_subtype_string {};
struct music_type_info {};
struct music_subtype_info {};

// Typedefs
typedef bimap<
    unordered_set_of< tagged<uint8_t, subtype_enum_value> >,
    unordered_set_of< tagged<std::string, music_subtype_string> >
> music_subtype;
typedef music_subtype::value_type music_subtype_value;

typedef bimap<
    unordered_set_of< tagged<uint8_t, type_enum_value> >,
    unordered_set_of< tagged<std::string, music_type_string> >,
    with_info< tagged<music_subtype, music_subtype_info> >
> music_type_type;
typedef music_type_type::value_type music_type_value;

typedef bimap<
    unordered_set_of< tagged<uint8_t, category_enum_value> >,
    unordered_set_of< tagged<std::string, category_string> >,
    with_info< tagged<music_type_type, music_type_info> > 
> category_type;
typedef category_type::value_type category_value;

I chose unordered_set for performance reasons. Since this is strictly a "constant" hierarchy, I don't have to worry about insertion and deletion times. And because I'll never be comparing order, I don't have to worry about sorting.

To get the Category information by enum value (get string values when given the enum), I use the category_enum_value tag:

    category_type::map_by<category_enum_value>::iterator cat_it = categories.by<category_enum_value>().find(category);
if(cat_it != categories.by<category_enum_value>().end())
{
    const std::string &categoryString = cat_it->get_right();
            // ...

I get the appropriate Type information from this by doing this, using the type_enum_value tag (subtype is nearly identical):

    music_type_type &music_type_reference = cat_it->get<music_type_info>();
    music_type_type::map_by<type_enum_value>::iterator type_it = music_type_reference.by<type_enum_value>().find(type);
    if(type_it != music_type_reference.by<type_enum_value>().end())
    {
               // ... second verse, same as the first ...

To get the enum values given the string, change the tag to category_string and use similar methods as before:

    std::string charToFind = stringToFind.substr(0, 1);
    category_type::map_by<category_string>::iterator cat_it = categories.by<category_string>().find(charToFind);
    if(cat_it != categories.by<category_string>().end())
    {
        retval.first = cat_it->get_left();
                    // ... and the beat goes on ...

Any additional information that I need for any given level (say, menu item strings) can be added by changing the info type from a bimap to a struct containing a bimap and whatever information I might need.

Since this is all constant values, I can do all the hard work "up front" and design simple look-up functions -- O(1) -- to get what I need.

John Price
  • 556
  • 1
  • 6
  • 18