12

I feel exhausted when trying to use the container unordered_map with char* as the key (on Windows, I am using VS 2010). I know that I have to define my own compare function for char*, which inherits from binary_function. The following is a sample program.

#include<unordered_map>
#include <iostream>
#include <string>
using namespace std;

template <class _Tp>  
struct my_equal_to : public binary_function<_Tp, _Tp, bool>  
{  
    bool operator()(const _Tp& __x, const _Tp& __y) const  
    { return strcmp( __x, __y ) == 0; }  
};

typedef unordered_map<char*, unsigned int, ::std::tr1::hash<char*>,  my_equal_to<char*> > my_unordered_map;
//typedef unordered_map<string, unsigned int > my_unordered_map;

my_unordered_map location_map;

int main(){
    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));
    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    printf("map size: %d\n", location_map.size());
    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end())
    {
        printf("found!\n");
    }

    return 0;
} 

I insert the same C string abc twice and look it up. The second insertion should fail and there will be only one abc in the unordered_map. However, the output size is 3. It seems that the compare function does not work properly here.

Moreover, I get another strange result about the find function, by running the program for many times, the finding result even changes! Sometimes the string abc is found, while the other times abc is not found!

Could anyone help me on this? Your help is very much appreciated!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Edit: After defining a hash function for char* by my own, the program works properly. The full program code is listed below. Thank you all.

#include<unordered_map>
#include <iostream>
using namespace std;

template <class _Tp>  
struct my_equal_to : public binary_function<_Tp, _Tp, bool>  
{  
    bool operator()(const _Tp& __x, const _Tp& __y) const  
    { return strcmp( __x, __y ) == 0; }  
};


struct Hash_Func{
    //BKDR hash algorithm
    int operator()(char * str)const
    {
        int seed = 131;//31  131 1313 13131131313 etc//
        int hash = 0;
        while(*str)
        {
            hash = (hash * seed) + (*str);
            str ++;
        }

        return hash & (0x7FFFFFFF);
    }
};

typedef unordered_map<char*, unsigned int, Hash_Func,  my_equal_to<char*> > my_unordered_map;


int main(){
    my_unordered_map location_map;

    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));
    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    printf("map size: %d\n", location_map.size());
    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end())
    {
        printf("found!\n");
    }

    return 0;
}

Note: Using char* as the key type for an unordered_map or other STL containers may be dangerous, a safe way (seems to be the only way) is: in the main function, new or malloc a block (e.g. an array of c strings) on heap and fill it with c strings. Insert these c strings into unordered_map. The allocated block of memory is freed at the end of of main function (by delete or free).

Bloodmoon
  • 1,308
  • 1
  • 19
  • 34
  • 1
    You do not need to inherit from `binary_function`. It might even be deprecated; I can't look it up just now. – Potatoswatter Dec 18 '13 at 11:28
  • This isn't the problem, but names that contain two consecutive underscores (`__x`, `__y`) and names that begin with an underscore followed by a capital letter (`_Tp`) are reserved to the implementation (the compiler and its library). Don't use them. – Pete Becker Dec 18 '13 at 14:39
  • Do you mean the the key string pointed `char*` could be changed? Because What I think is that you want key to be constant string pointed by `const char *`. Make every occurrence of `char *` to `const char *` to support this – rahul.deshmukhpatil Sep 18 '17 at 10:43
  • @rahul.deshmukhpatil There are deeper questions, pls take a look at the answers and the comments below them. – Bloodmoon Sep 20 '17 at 16:29
  • just a note that you'll want to `#include ` for `binary_function`. – MikeB Nov 05 '19 at 22:46
  • @MikeB why it works even if I didn't include ? The compiler includes it for me? – Bloodmoon Nov 06 '19 at 05:25
  • @Bloodmoon The compiler doesn't include anything for you; it only does what you ask of it. Maybe you're lucky -- maybe some other header caused it to be included. But you're much better off explicitly including the things that you use yourself. Because: what if the implementation changes and you're one day not so lucky? – MikeB Nov 07 '19 at 18:40

4 Answers4

3

You comparator is fine (although passing a nullptr is undefined and probably should be handled)

The hash, ::std::tr1::hash<char*> is hashing off pointers so each "abc" goes (usually) in a different bucket

You need to write your own hash function that guarantees that hash("abc") always gives the same answer

For now - performance will be terrible, but have a hash that returns 0 - and you should see the second "abc" match the first

As per comments - using std::string simplifies memory management and provides a library supported hash and comparator, so just std::unordered_map<std::string, X> will work. This also means that upon deletion of the unordered map all strings will be deallocated for you. You can even instantiate the std::strings from char arrays on the stack safely.

If you still want to use char * then you will still need your own comparator and hash, but you can use std::shared_ptr to manage the memory for you (do not use stack instances - do a new char[]) you will then have a std::unordered_map<shared_ptr<char *>, X> but have no complications later from memory leaks.

If you still want to use char * you are on the right track, but it is important that you use a memory leak tool like purify or valgrind to make sure that you truly have all the memory management under control. (This is generally a good idea for any project)

Finally, global variables should be avoided.

lynxoid
  • 509
  • 6
  • 14
Glenn Teitelbaum
  • 10,108
  • 3
  • 36
  • 80
  • 3
    using a pointer as an STL map key is a pretty risky choice that should probably be discouraged rather than patched up. Note that he uses a global map populated with pointers to automatic char arrays. This approach will almost certainly blow up in his face. – Jeff Dec 18 '13 at 06:46
  • 4
    @Jeff OP's question is how to use `char *` in `std::unordered_map` not how to avoid using `char *` by using string, as mentioned "The following is a sample program" - my answer addresses the question - your comment is a valid concern - but not a solution. – Glenn Teitelbaum Dec 18 '13 at 07:39
  • @GlennTeitelbaum Will a global `unordered_map` variable here be a risk? – Bloodmoon Dec 18 '13 at 08:02
  • @Glenn I strongly disagree. It is pretty clear that the question itself implies a misconception about object lifetimes and ownership, especially in regard to STL containers. I ask you to consider the following: in real usage, how are the c-strings being allocated, and how can they be freed without a length stored along with the pointer? If you think that he is declaring a fixed number of literal-initialized values on the stack (or doing explicit stack allocation), why would he need to make a map and pass it to a subroutine? – Jeff Dec 18 '13 at 08:37
  • "and how can they be freed without a length stored along with the pointer" you don't need the length to free them, and the example is fine as they are actually allocated on the stack but stay in scope during usage. Everything doesn't have to be std::string to work. – Glenn Teitelbaum Dec 18 '13 at 09:33
  • @Glenn It doesn't absolutely have to be `std::string` to work, but your answer really needs to stress that the whole idea is a hack that has usage constraints well beyond what STL container users expect. Even his trivial example can fail if a compiler ever tries to reclaim their space after insertion. His statement that "Sometimes the string abc is found, while the other times abc is not found!" should be a strong suggestion that this has already been happening. – Jeff Dec 18 '13 at 16:07
  • @Bloodmoon global variables should be avoioded. In addition, while the example uses the `unordered map` in the same scope as the `char *` all entries in the map should be explicitely allocated (not just char[]) – Glenn Teitelbaum Dec 18 '13 at 16:57
  • thanks for the edit, it brings me to a higher level when thinking about this problem. – Bloodmoon Dec 19 '13 at 01:25
0

Using a char pointer as a key like you are above is almost certainly not what you want to do.

STL containers deal with stored values, in the case of std::unordered_map<char *, unsigned int, ...>, you are dealing with pointers to c strings, which may not even be around on subsequent insertion/removal checks.

Note that your my_unordered_map is a global variable but you are trying to insert local char arrays a, b, and c. What do you expect your comparison function my_equal_to() to strcmp() when the inserted c strings fall out of scope? (You suddenly have keys pointing to random garbage that can be compared to newly inserted future values.)

It is important that STL map keys be copyable values that cannot have their meanings changed by external program behavior. You should almost certainly use std::string or similar for your key values, even if their construction seems wasteful to you at first glance.

The following will work exactly as you intend things to work above, and is vastly safer:

#include <unordered_map>
#include <iostream>
#include <string>

using namespace std;

// STL containers use copy semantics, so don't use pointers for keys!!
typedef unordered_map<std::string, unsigned int> my_unordered_map;

my_unordered_map location_map;

int main() {
    char a[10] = "ab";
    location_map.insert(my_unordered_map::value_type(a, 10));

    char b[10] = "abc";
    location_map.insert(my_unordered_map::value_type(b, 20));

    char c[10] = "abc";
    location_map.insert(my_unordered_map::value_type(c, 20));

    cout << "map size: " << location_map.size() << endl;

    my_unordered_map::iterator it;
    if ((it = location_map.find("abc")) != location_map.end()) {
        cout << "found \"" << it->first << "\": " << it->second << endl;
    }

    return 0;
}
Jeff
  • 3,475
  • 4
  • 26
  • 35
  • Thanks, but the reason why I want to use `char*` rather than `std::string` is the performance of the program. An anonymous object of `std::string` will be created each time I insert a character string. And I do not understand why a global `unordered_map` variable will be a problem. Actually the pointers passed to `my_equal_to()` are all virtual addresses in this program, and I do not understand how the inserted c strings fall out of scope. Could you please elaborate on this? – Bloodmoon Dec 18 '13 at 08:01
  • @Bloodmoon What you have in your fix works but will fail completely in real-use scenarios. The char a[10], b[10], and c[10] arrays are created on the stack and effectively go away once they fall out of scope. To illustrate this, make a void foo() and an int bar(), where foo() does the a/b/c definitions and map inserts. In bar(), make an int x[100] and populate it with values 1 to 100, then add them and return the value. In main(), call foo(), then bar() before looking for the values. It will fail, because you overwrote the same memory values that used to contain the char arrays... – Jeff Dec 18 '13 at 08:27
  • @Bloodmoon To summarize my longer comment above, your problem is not that the memory holding the character arrays will get unmapped, but that the stack address space holding the characters will be overwritten eventually. The only reason your example worked is that you defined a/b/c in the same function as EVERY map insertion. In fact, you could probably omit what I have about the bar() function, and the first "map size: " printing call will probably nuke the char arrays set in the foo() call... – Jeff Dec 18 '13 at 08:45
  • I got your point now. What if I use unordered_map with char* under strict constraints: in the `main` function, `new` a block (e.g. an array of c strings) on heap and fill it with c strings. Insert these c strings into unordered_map. These insertions may be mixed up by some lookups. The allocated block of memory is freed at the end of of `main` function (be `delete`). Will this usage be safe? – Bloodmoon Dec 18 '13 at 14:44
  • @Bloodmoon Yes, this usage would be safe and may actually be the only safe way to go about what you're trying to achieve. The important thing is just that the collection key pointers can never outlive data they point to. You will have fragmentation of your block if you need to delete things. To clarify, though, you have to `new`/`malloc` a block of characters (char block[xxx], -not- char *foo[xx]) into which you copy the c string characters. – Jeff Dec 18 '13 at 15:51
  • Thanks for your replies that point out a risk which I am not aware of when using unordered_map with char*. I think this risk also exist for other STL containers with char* as the key type. I have voted for your answer, but since it is not the direct solution to my question, I just cannot accept it as `the Answer`. – Bloodmoon Dec 18 '13 at 16:13
  • Don't worry about accepting my answer. Just please fix your posted solution and prominently state there the design considerations it requires so that nobody else tries using it in a less rigid (i.e., dangerous) manner unknowingly. Even the stack frame-local insertion method is at the compiler's mercy about keeping a/b/c alive until the find call, unless you reference all their values individually after the find. – Jeff Dec 18 '13 at 16:22
  • 2
    This is the kind of spirit why C++ will soon be as slow as Java. You get more memory, more indirection etc. It really feels bad to me. – Lothar Oct 23 '15 at 16:09
0

(Answer for modern C++, for people still stumbling upon this question)

These days, if you use C++17 or above, you can use std::string_view as a key in an unordered_map.

std::string_view only keeps a reference to the raw char* data instead of copying it, allowing you to avoid a copy when you're sure the raw char* data outlives the unordered_map.

However, unlike char*, std::string_view implements various methods and operators, like std::hash, making it useful in many more places.

std::unordered_map<std::string_view, unsigned int> my_map;
my_map["some literal"] = 123;
printf("%d\n", my_map["some literal"]);

In the above code, I only put string literals in the map, which is safe. Be careful when putting other things in a map with string_view keys - it's your responsibility to ensure they don't get destroyed before the map!

secondperson
  • 148
  • 1
  • 4
-3

When you define something such as "abc" it get assigned a const char*. Every time that you write "abc" within your program there is going to be a new memory alocated. So:

const char* x = "abc";
const char* y = "abc";
return x==y;

Will always return false because new memory is alocated each time "abc" is wrriten (sorry if I sound a bit repetitive).

Lucas S
  • 73
  • 2
  • 10