4

Conceptually my data represents a table like this one

+-------+------+------+------+------+  
|  ID   | Val1 | Val2 | Val3 | Val4 |  
+-------+------+------+------+------+  
| Name1 | a    | b    | c    | d    |  
| Name2 | a    | b    | c    | e    |  
| Name3 | h    | b    | c    | d    |  
| Name4 | i    | j    | k    | l    |  
+-------+------+------+------+------+  

Currently I am free to choose how that data will be stored. Essentially I have IDs with some values, which are assigned to the IDs. A set of values is unique to the ID (no set will be repeated for another ID). Individual values can be repeated across IDs.

Now, what I need to do in C++ is to find the name by using a set of Values. So essentially something like this:

std::string findID(char val1, char val2, char val3, char val4)
{
    //Do something to find the corresponding Name

    return result;
}

I am currently a bit bumbed on how to go about this. I thought about using a map with tuples, but providing a hash function for the tuples seems like overcomplicating the problem. So basically like described here: Using tuple in unordered_map

Is there a simpler solution that I just seem to overlook?

FreddyKay
  • 41
  • 4
  • Are values actually of type `char`? – Bartek Banachewicz Jan 24 '18 at 15:08
  • how do you store the data? Some code would really help to understand the problem – 463035818_is_not_an_ai Jan 24 '18 at 15:08
  • 2
    What do you mean by "table"? What do you *really* have? Please take some time to [read about how to ask good questions](http://stackoverflow.com/help/how-to-ask), and learn how to create a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve) of your current attempt. – Some programmer dude Jan 24 '18 at 15:08
  • [Boost Bimap](http://www.boost.org/doc/libs/1_66_0/libs/bimap/doc/html/index.html), maybe? – Fred Larson Jan 24 '18 at 15:09
  • @Someprogrammerdude I edited the question. As I am asking whether to use a map of tuples or something else, I do not have a solution of how to store the data yet. At least none, that seems suitable yet. – FreddyKay Jan 24 '18 at 15:14
  • 1
    @BartekBanachewicz If it makes a difference, it will be a primitive numeric type, currently I am working with chars, as all values I have seem so far fit into a char. – FreddyKay Jan 24 '18 at 15:16
  • 1
    This is the [XY Problem](http://xyproblem.info/). – Justin Randall Jan 24 '18 at 15:18
  • In that case I would say that the question is less about finding specific elements, but more about how to create such a "table". And that really depends on your use-cases... What are you going to use this "table" for? Where are you getting the data from? What is the real problem this table is supposed to solve? Once you know all this, and manage to get a suitable container (standard or custom) to work as you want, then finding specific elements should be pretty straightforward and probably very trivial. – Some programmer dude Jan 24 '18 at 15:18
  • @Someprogrammerdude That is why I am saying I am bumped on how to go about this. The matter of fact is that I am provided a function like above, which expects me to return the corresponding ID. Currently this table is in Excel and will not be available at runtime. I have a hard time coming up with a non convoluted way of storing and retrieving the data (like using tuples and maps). And I am asking whether I am just too blind to see the trivial solution. – FreddyKay Jan 24 '18 at 15:36

4 Answers4

5

If all values are characters, I think you'd better transform each line of your table into strings and provide a hash table for the couple ID/stringID. When you get 5 values as char to test you then just have to concat them and run your search.

Benjamin Barrois
  • 2,566
  • 13
  • 30
  • Even if not all vals are chars; I would go with hashing it ... the lookup gains of using std::map are significant enough that the cost of hashing will go away for large data sets. – UKMonkey Jan 24 '18 at 15:21
  • @YSC I don't understand this comment, `unordered_map` doesn't need a user provided hash either and it's way faster. – Nir Friedman Jan 24 '18 at 15:23
  • I have not talked about user-provided hash function, ofc it is better to use map than reinventing fire. – Benjamin Barrois Jan 24 '18 at 15:25
  • @YSC `map` never uses a hash function though. I just don't understand your comment at all then. – Nir Friedman Jan 24 '18 at 15:25
  • @YSC No, it's not unneeded, because he said "hash table" not "hash" or "hash function". `unordered_map` is a hash table. That comes with its own hash function, for common types, including string. – Nir Friedman Jan 24 '18 at 15:30
2

Yes, the easiest way is to store a mapping from a tuple of values to a name. If your values are chars, however, it's the easiest to simply use string for that, and pack everything into std::unordered_map<std::string, std::string>, the key being all values packed into the string, and the value your name.

If your values aren't chars, it's still not that complex. Instead of using a tuple, you might just use a structure:

struct Key {
    std::array<Value, n> vals;
};

And then provide std::hash instance that'd fold the array with boost::hash_combine:

class KeyHash  {
    std::size_t operator()(Key const& k)
    {
        std::size_t h = 42;
        for (Value const& v : k.vals) {
            boost::hash_combine(h, v);
        }
        return h;
    }
};

The question you've linked and the presented solution aren't bad either, though, and are generic enough to be put into some reusable library.

Bartek Banachewicz
  • 38,596
  • 7
  • 91
  • 135
  • As a side note, I'm a big fan of composite indices, [and wrote a blog post](http://blog.banachewicz.pl/2015/01/29/unorthodox-index-types.html) quite a while back that explains the issues and solutions, including the mentioned `hash_combine`. – Bartek Banachewicz Jan 24 '18 at 15:13
  • I don't think there's really any point mentioning `std::tuple` in your post; even if you don't use a string, all of the keys in the index are the same type, so `array` is enough and much easier to use, and then later you transition in `array` anyhow. – Nir Friedman Jan 24 '18 at 15:27
1

It appears that you have a 1 to many relationship in your data table, knowing that we can create a simple class that represents an arbitrary row. From there we can create another simple class that takes a row from its constructor and it also has an addRow method to add additional rows. There are no limits on the rows. The table is confined to the size of the first row passed in. So if row1 has 3 elements and is passed into table1's constructor, every other row that is added to this instantiated table must be of the same types and size. These sets of classes are template for both the ID & Types. Here are the basic declarations - definitions of these classes' structures.

template<class ID, class Type>
class TableRow {
private:
    ID id_;
    std::vector<Type> values_;

public:
    template<class... Params>
    TableRow( const ID& id, Params&&... valuePack ) :
        id_( id ),
        values_ { std::forward<Params>( valuePack )... }
    {}

    ID getID() const {
        return id_;
    }

    std::vector<Type> getValues() const {
        return values_;
    }

    std::size_t getSize() const {
        return values_.size();
    }

};

template<class ID, class Type>
class Table {
private:
    std::size_t rowSize_;
    std::vector<TableRow<ID, Type>> table_;

public:
    explicit Table( TableRow<ID, Type> row ) {
        table_.push_back( row );
        rowSize_ = row.getSize();
    }

    void addRow( TableRow<ID, Type> row ) {
        // Check to see if row's size == our table's first index size
        if ( row.getSize() == rowSize_ ) {
            // This row's size is a match and this row of data is compatabile with our current table
            table_.push_back( row );
        } else {
            std::ostringstream strStream;
            strStream << __FUNCTION__ << " row passed in does not match size of this table's row size." << std::endl;
            throw std::exception( strStream.str().c_str() );
        }
    }

    // methods to retrieve a row, an id, or a specific element to an id, 
    // comparison operators, ostream friend operators etc. here...
};

And here is an example of instantiating them...

int main() {

    try {
        std::string id = std::string( "John" );
        std::string val1 = std::string( "a" ), val2 = std::string( "b" );
        std::string val3 = std::string( "c" ), val4 = std::string( "d" );
        TableRow<std::string, std::string> tr1 { id, val1, val2, val3, val4 };

        Table<std::string, std::string> table( tr1 );

        TableRow<std::string, std::string> tr2( "Mike", "e", "f", "g", "h" );
        table.addRow( tr2 );

        TableRow<std::string, std::string> tr3( "Susan", "a", "b", "c" );
        //table.addRow( tr3 ); // exception thrown

        TableRow<unsigned, float> trf1( 0, 2.3f, 4.5f );
        TableRow<unsigned, float> trf2( 1, 4.5f, 7.8f );
        Table<unsigned, float> table2( trf1 );
        table2.addRow( trf2 );
        TableRow<unsigned, float> trf3( 2, 3.5f, 8.7f, 9.2f, 4.8f );

        //table2.addRow( trf3 ); // exception thrown

        //table.addRow( trf3 ); // will not compile - mismatched TYPES  

    } catch ( std::exception e ) {
        std::cout << e.what() << std::endl;

        std::cout << "\nPress any key and enter to quit." << std::endl;
        char q;
        std::cin >> q;

        return -1;
    }

    std::cout << "\nPress any key and enter to quit." << std::endl;
    char q;
    std::cin >> q;
    return 0;
}

If you uncomment either line that says an exception will be thrown you can see how the table catches the mismatch in sizes of different rows as they must be equal in length.

As for the functions to compare, find, display this table; I'll leave that as a continuing exercise.

However I can give you some guidance. Now as for finding an ID based on an entire row or set of data; this should be fairly simple. You would have to compare one vector with another and then return that ID.

However there is a problem with this... Here's an example of the problem...

IDs | val1, val1, val3, val4
 1  |  a     b      c     d
 2  |  e     f      g     h
 3  |  a     b      c     d

Here if we are searching this table that has a set with (a,b,c,d). There are 2 possible solutions. So you would not be able to directly return a single answer or a single ID as this is an X|Y problem, however because we have nice vectors instead of returning a single answer in this situation you can return a set of answers a container of valid rows or row - ids, but to do this you would have to search through the entire table and for each one that is a match you would have to push that into a temporary container until all matches are found, then return that container back, if there is only a single match you would still be returning a container but it would only contain a single element as it will only be a set of 1.

However you did mention this:

Currently I am free to choose how that data will be stored. Essentially I have IDs with some values, which are assigned to the IDs. A set of values is unique to the ID (no set will be repeated for another ID). Individual values can be repeated across IDs.

So if the data that already exists does not exhibit this problem, then it should not be of a concern.

So your search would then be something like this from the Table's class

template<class ID, class Type>
ID Table<ID, Type>::find( const std::vector<Type>& data ) {
    for ( auto& row : table_ ) {
        if ( data == row.getValues() ) {
            return row.getId();
        } else {
            std::ostringstream strStream;
            strStream << __FUNCTION__ << " could not find matching set." << std::endl;
            throw std::exception( strStream.str().c_str() );
        }
    }
}

As for the classes above, you can even expand on them to increase the table's size if needed.

You would have to add an addItem or addElement method to the TableRow class template, and then in the Table class template you would have to add an addColumn method, there would be two tricks with the addColumn method. The first is that it would be variadic like TableRow's constructor but the size of the elements would have to be equal to the amount of rows in the table. The other trick would be having to append each element in the parameter pack to each individual row within the table, but this function would also have to update the row's size which shouldn't be all that hard to do.

Another thing to consider would be is if each element in the list of data that belongs to an ID are not of the same type, these class's would have to be modified a little bit, instead of using std::vector<T> you could use std::tuple<...> and from there compare two tuples.

Francis Cugler
  • 7,788
  • 2
  • 28
  • 59
0

As an alternative, you can go with std::vector<std::tuple<std::string, char, char, char, char>>:

struct somestruct
{    
    std::vector<std::tuple<std::string, char, char, char, char>> _data;

    std::string findID(char val1, char val2, char val3, char val4)
    {
        auto it = std::find(begin(_data), end(_data) [](const auto& item) {
            return item.get<1>() == val1 && /*...*/;
        });
        return it->get<0>();
    }
};
YSC
  • 38,212
  • 9
  • 96
  • 149
  • Why is it downvoted? I appreciate feedback, both good and bad. – YSC Jan 24 '18 at 15:29
  • Because this not any simpler than other solutions and is O(N), mostly. And if you are already going to declare classes it would make way more sense to declare a class instead of where the tuple is used, making the code clearer. The outside class doesn't contribute anything to the problem at all. – Nir Friedman Jan 24 '18 at 15:32
  • @NirFriedman Don't forget than SO encourage multiple answers; the fact than other answers are better than this one (note the _"as an alternative"_ opening) should not be a reason to downvote, though this is your final prerogative. About the outside class... it let me make `_data` a member instead of a global. It is a subjective choice, it's mine though. – YSC Jan 24 '18 at 15:36
  • This is actually something I should've thought of too. This completely avoids hashing. Cheers. – FreddyKay Jan 24 '18 at 15:38