Create rows of struture mapped to names externally defined

Question

We have a class like below:

NamesOfData GetNamesOfData()
{
    NamesOfData names =
    {
        "AAA",
        "BBB"
    };
  return names;
}

Now I have a structure that has to have names as mentioned above along with other fields:

struct A
{
    std::string name;
    ...other fields...;
};

I can have the name easily duplicated to create rows of strut values:

struct A Data [] =
{
     "AAA",.......;
     "BBB", ......;
};

But the issue is that I need to ensure the integrity of names within struct and externally defined - which can be broken any time if the names are modified at any place.

Is there a way design that the issue mentioned above can be overcome or rather having names centrally but easily mapped by both the places?

I first saw this when learning about [`XAtom`](https://tronche.com/gui/x/xlib/window-information/properties-and-atoms.html)s but symbol tables in compilers might do this similar: You may introduce a `std::vector` for unique storage of names bundled with a `std::map` for reverse mapping. (`const char*` instead of `std::string` as resize of vector may not reallocate them.) Hence, the `struct A` would only refer to an index in this table and it's on your own whether or not duplicates are allowed. — Scheff's Cat, Nov 30 '18 at 07:43
Thanks - can you please provide an implementation example - it would be helpful to better understand your concept. — Programmer, Nov 30 '18 at 08:00

Scheff's Cat · Accepted Answer · 2018-11-30T12:07:19.820

What I previously wrote in a comment:

I first saw this when learning about XAtoms (Yepp. That's decades ago but I still like the concept with the atom table and use it when it's appropriate) but symbol tables in compilers might do this similar:

You may introduce a std::vector<const char*> for unique storage of names bundled with a std::map<const char *, size_t> for reverse mapping of indices to names. I use const char* instead of std::string because I'm afraid of that std::vector::resize() may copy the strings and invalidate the C-string addresses which I use for reverse mapping (to prevent duplicated storage of strings).

Hence, OP's struct A could refer to an index in this table. Whether or not duplicated names are allowed is a design decision beyond this atom table.

A sample program (sketch) to demonstrate how it could look like:

#include <cstring>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// atom table to store unique strings
class AtomTable {
  private:
    // storage of strings (mapping of indices to strings)
    std::vector<const char*> _names;
    // predicate for lexicographical order of names
    struct LessName {
      bool operator()(const char *name1, const char *name2) const
      {
        return strcmp(name1, name2) < 0;
      }
    };
    typedef std::map<const char*, size_t, LessName> Map;
    // mapping of strings to indices
    Map _map;

    static const char* strdup(const std::string &name)
    {
      char *name2 = new char[name.size() + 1];
      strcpy(name2, name.c_str());
      return name2;
    }

  public:
    AtomTable() = default;
    ~AtomTable()
    {
      _map.clear();
      for (const char *&name : _names) delete[] name;
    }
    AtomTable(const AtomTable&) = delete; // important to disable this
    AtomTable& operator=(const AtomTable&) = delete; // important to disable this

    size_t find(const std::string &name) const
    {
      Map::const_iterator iter = _map.find(name.c_str());
      return iter != _map.end()
        ? iter->second // index of name in table
        : (size_t)-1; // invalid index
    }

    size_t get(const std::string &name)
    {
      size_t i = find(name);
      if (i < _names.size()) return i;
      // not yet in table -> add
      i = _names.size();
      _names.push_back(strdup(name));
      _map[_names.back()] = i;
      return i;
    }

    const char* get(size_t i) const
    {
      return _names[i];
    }

    size_t size() const { return _names.size(); }
};

// a singleton atom table
static AtomTable& atomTable()
{
  // like learnt from Meyer:
  // https://stackoverflow.com/q/1661529/7478597
  static AtomTable atomTable;
  return atomTable;
}

// a sample structure
struct A {
  size_t iName;
  int payload;

  A(const std::string &name, int payload):
    iName(atomTable().get(name)), payload(payload)
  { }

  ~A() = default;

  A(const A&) = default;
  A& operator=(const A&) = default;
};

std::ostream& operator<<(std::ostream &out, const A &a)
{
  return out << "A { name: '" << atomTable().get(a.iName)
    << "' (" << a.iName << "), payload: " << a.payload << " }";
}

int main()
{
  std::vector<A> aTable = {
    { "first", 1 },
    { "second", 2 },
    { "third", 3 },
    { "first", 4 }
  };
  size_t i = 0; for (const A &a : aTable) {
    std::cout << i << ".: " << a << '\n'; ++i;
  }
  std::cout
    << "About Atom Table:\n"
    << "Number of entries: " << atomTable().size() << '\n'
    << "'first' in table?: "
    << (atomTable().find("first") < atomTable().size() ? "yes" : "no")
    << '\n'
    << "'other' in table?: "
    << (atomTable().find("other") < atomTable().size() ? "yes" : "no")
    << '\n';
  return 0;
}

Output:

0.: A { name: 'first' (0), payload: 1 }
1.: A { name: 'second' (1), payload: 2 }
2.: A { name: 'third' (2), payload: 3 }
3.: A { name: 'first' (0), payload: 4 }
About Atom Table:
Number of entries: 3
'first' in table?: yes
'other' in table?: no

Live Demon on coliru

Note:

Please, bear with me that I didn't do any error checking. This is surely something which has to be added to productive code.
A std::unordered_map might be used instead of a std::map. In any case, it's important to provide a predicate which compares the contents of the C-strings not the addresses as how it would be done with e.g. std::less in std::map.
As it is, names (atoms) are added only – they are intentionally never removed before destruction of AtomTable. If removal would be required, I would add some kind of reference counting as well. One of the advantage of Atoms (beside of the fact that comparing names is very cheap – an integer comparison) is that they can be shared with minimal footprint (an index). Due to this, removing them would be dangerous without proper reference counting. (Not to mention, that managing the unused indices in name storage would become a topic as well.)

Rather than having owning raw `char` pointers, you could use a sequence container that doesn't invalidate references to elements when it is expanded, such as `std::deque` — Caleth, Nov 30 '18 at 12:02
@Caleth Nice idea. Aside from this, I was thinking whether the addresses of `const char*` could be abused as index (i.e. a `std::set` instead of `std::map` and no `std::vector` anymore). They should be unique as well. There is probably much room for improvement/optimization. For my luck, I called it a _sketch_. ;-) — Scheff's Cat, Nov 30 '18 at 12:05

Create rows of struture mapped to names externally defined

1 Answers1