Word count program in C++

Question

Possible Duplicate:
Count the number of times each word occurs in a file

Hi,
Its been a long time since I did my C++ programming.
This could be quite a dumb question.
I have found several programs on word count in this site.
But most of them are using std::string as their keys.
In my case, I needed to use char* as my key.
But it seems since each char* has different address values, the duplicate keys are getting inserted in my map.

char str[] = "This This";
typedef std::map<char*, int> word_count_t;
typedef word_count_t::iterator word_count_iter_t;

int _tmain(int argc, _TCHAR* argv[])
{
    char *token = strtok(str, " ");
    word_count_t word_count;
    word_count_iter_t itr = NULL;

    while(token) {
        ++word_count[token];
        token = strtok(NULL, " ");
    }

    for(itr = word_count.begin(); itr != word_count.end(); itr++) {
        std::cout << "Key: " << itr->first  << ", Value: " << itr->second << std::endl;
    }

    getchar();
}

The output I m getting for this program is

Key: This, Value: 1
Key: This, Value: 1

I wanted output like

Key: This, Value: 2

Can somebody tell me where did I miss what?

Thanks.

A Map cannot have duplicate keys, You should use multimap. – Alok Save May 26 '11 at 08:33 — Alok Save, May 26 '11 at 08:33
@Als He doesn't want duplicate keys. – May 26 '11 at 08:36 — , May 26 '11 at 08:36
@Nawaz Except that he isn't counting words from a file. – May 26 '11 at 08:38 — , May 26 '11 at 08:38

score 2 · Answer 1 · answered May 26 '11 at 08:32

2

You want a std::map<std::string, int> - your map of char* will be comparing pointers not the strings they point to.

answered May 26 '11 at 08:32

score 1 · Answer 2 · answered May 26 '11 at 08:37

std::map by default uses operator < on the key type for doing comparisons. operator < on char * compares pointer addresses, not the characters of the strings.

You want to use std::map<std::string, int> instead, as operator < on std::string does lexical string comparisons.

score 1 · Answer 3 · answered May 26 '11 at 08:37

You must make your own compare class for const char *:

struct StrCmp {
    static bool operator() (const char *a, const char *b) {
        return strcmp(a, b)<0;
    }
};

Then you can use a map:

typedef std::map<char const *, int, StrCmp> word_count_t;

score 0 · Answer 4 · answered May 26 '11 at 08:39

First of all, you should really use some HashMap implementation. std::map is a TreeMap and will be slower when counting words in a huge text. This is due to the fact, that huge amounts of word occurrences in the text will be mapped to a moderate amount of distinct words. Even if you require the output to be sorted, sorting a hashmap afterwards can be better, because inserting into the tree will be #occurrences * log #words while sorting them will be O(#words * log #words)

Apart from that, most hashmap implementations (personally i'm usually using google::dense_hash_map or google::sparse_hash_map) can deal with char* without you having to write a hash function (they use an stl hash function). So it's basically plug & play for you.

score 0 · Answer 5 · answered May 26 '11 at 08:40

0

Keys are different because they are char*, that is, pointers pointing to a different position in memory. If you used an std::map<std::string, int> and you did a ++word_count[std::string(token)] you would get the expected output.

answered May 26 '11 at 08:40

rturrado

7,699
6
42
62

Word count program in C++

5 Answers5

Linked