0

Hello :) I am implementing some graph where vertices are strings. I do many things with them, so using strings would be highly ineffective. That is why I am using indexes, simple ints. But although the rest of the class works pretty fast, I have trouble with the part I copied below. I've read somewhere that unordered_map needs some hash function, should I add it? If yes, how? The code below contains EVERYTHING that I am doing with the unordered_map.

Thank you in advance for help :)

class Graph
{
private:
    unordered_map <string, int> indexes_of_vertices;
    int number_of_vertices;
    int index_counter;

    int get_index(string vertex)
    {
        if (indexes_of_vertices.count(vertex) == 0) // they key is missing yet
        {
            indexes_of_vertices[vertex] = index_counter;
            return index_counter++;
        }
        else
            return indexes_of_vertices[vertex];
    }

public:
    Graph(int number_of_vertices)
    {
        this->number_of_vertices = number_of_vertices;
        index_counter = 0;
    }
};
  • What *exactly* is the problem? Just performance? Also, the assignment to `indexes_of_vertices` in the constructor is totally unnecessary. – John Zwinck Aug 26 '14 at 11:34
  • Yes, just the performance. This is the part where my program works very slow... :( –  Aug 26 '14 at 11:35
  • I'm pretty sure that there is already a hash function defined for std::string so there's no need to define a hash function – Simon Aug 26 '14 at 11:38
  • Hmm... so what could be so slow here? Maybe I shouldn't check that way if a vertex key exists? –  Aug 26 '14 at 11:39

1 Answers1

0

Here's a quick optimization for what seems to be the important function:

int get_index(const string& vertex)
{
    typedef unordered_map <string, int> map_t;
    pair<map_t::iterator, bool> inserted =
        indexes_of_vertices.insert(map_t::value_type(vertex, index_counter));
    if (inserted.second) // the key was missing until now
        return index_counter++;
    else // inserted.second is false, means vertex was already there
        return inserted.first->second; // this is the value
}

The optimizations are:

  1. Take argument by const-ref.
  2. Do a single map lookup instead of two: we speculatively insert() then see if it worked or not, which saves a redundant lookup in either case.

Please let us know how much difference that makes. Another idea, if your keys are usually small, is to use a self-contained string type like GCC's vstring which avoids ex-situ memory allocation for strings under one or two dozen characters. And then to consider whether your data are really large enough to benefit from a hash table, or if another data structure would be more efficient.

Community
  • 1
  • 1
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • These optimizations have changed like 2% of the speed, but probably the optimization about shorter strings would be significant, my strings are limited by 12 characters. But unfortunately I can't use vstring, because I can't provide additional files... Could I handle short strings in some other way? –  Aug 26 '14 at 12:01
  • Definitely. Use `char str[13]` and provide your own custom char* hashing function which for example could just return the first 8 bytes of the string cast to size_t (if there is good entropy at the beginning, otherwise use the middle etc.). How many elements are in the map eventually? – John Zwinck Aug 26 '14 at 12:02
  • I changed my variables from string to char x[13] and it's better. However I left all the arguments as strings (so it's converting now I believe), because I can't get it to work. It doesn't work properly when I am trying to change arguments from string to char *. Could you please modify the code from my post so that it works with char[]? I would be very grateful! And also I don't really know how to do that part with hash function :( PS: the map can contain max 10.000 elements –  Aug 26 '14 at 12:23
  • @Randolph: sorry but I won't straight-up rewrite your code, you can do this. :) You just need to write a custom hash function for char[13] (there are plenty of tutorials on this), and an equality comparison function (which can just use strcmp). You could probably get even better performance by storing a sorted vector of pairs if you don't need to add new values often during runtime (insert them all, sort once, then use the vector via binary search). That would avoid the dynamic memory stuff that's happening in unordered_map. – John Zwinck Aug 26 '14 at 12:40