0

Possible Duplicate:
Count the number of times each word occurs in a file

Hi,
Its been a long time since I did my C++ programming.
This could be quite a dumb question.
I have found several programs on word count in this site.
But most of them are using std::string as their keys.
In my case, I needed to use char* as my key.
But it seems since each char* has different address values, the duplicate keys are getting inserted in my map.

char str[] = "This This";
typedef std::map<char*, int> word_count_t;
typedef word_count_t::iterator word_count_iter_t;

int _tmain(int argc, _TCHAR* argv[])
{
    char *token = strtok(str, " ");
    word_count_t word_count;
    word_count_iter_t itr = NULL;

    while(token) {
        ++word_count[token];
        token = strtok(NULL, " ");
    }

    for(itr = word_count.begin(); itr != word_count.end(); itr++) {
        std::cout << "Key: " << itr->first  << ", Value: " << itr->second << std::endl;
    }

    getchar();
}

The output I m getting for this program is

Key: This, Value: 1
Key: This, Value: 1

I wanted output like

Key: This, Value: 2

Can somebody tell me where did I miss what?

Thanks.

Community
  • 1
  • 1
shiva
  • 1
  • 1
  • 1

5 Answers5

2

You want a std::map<std::string, int> - your map of char* will be comparing pointers not the strings they point to.

1

std::map by default uses operator < on the key type for doing comparisons. operator < on char * compares pointer addresses, not the characters of the strings.

You want to use std::map<std::string, int> instead, as operator < on std::string does lexical string comparisons.

hammar
  • 138,522
  • 17
  • 304
  • 385
1

You must make your own compare class for const char *:

struct StrCmp {
    static bool operator() (const char *a, const char *b) {
        return strcmp(a, b)<0;
    }
};

Then you can use a map:

typedef std::map<char const *, int, StrCmp> word_count_t;
Juraj Blaho
  • 13,301
  • 7
  • 50
  • 96
0

First of all, you should really use some HashMap implementation. std::map is a TreeMap and will be slower when counting words in a huge text. This is due to the fact, that huge amounts of word occurrences in the text will be mapped to a moderate amount of distinct words. Even if you require the output to be sorted, sorting a hashmap afterwards can be better, because inserting into the tree will be #occurrences * log #words while sorting them will be O(#words * log #words)

Apart from that, most hashmap implementations (personally i'm usually using google::dense_hash_map or google::sparse_hash_map) can deal with char* without you having to write a hash function (they use an stl hash function). So it's basically plug & play for you.

b.buchhold
  • 3,837
  • 2
  • 24
  • 33
0

Keys are different because they are char*, that is, pointers pointing to a different position in memory. If you used an std::map<std::string, int> and you did a ++word_count[std::string(token)] you would get the expected output.

rturrado
  • 7,699
  • 6
  • 42
  • 62