I am counting the number of times every word occurs in a text file. I would like to avoid cases and hence am doing tolower to my input and then counting. I have a map data structure having string and int to keep count. Now, when I output the word and its count, I don't want the word to be in lower case, but want it to maintain its original case. So, for counting all the words should change to lowercase but while giving output they all should be in their original case. Is there anyway to achieve this with using only one map?
-
3What should happen if the same word occurs both in lower and upper case in the text. Do you want to record all occurences or just the first one or last one? – ChrisWue Nov 09 '11 at 06:48
-
2If you have two words that are the same except the casing, will you output it as two different words with one as count, or as one word with two counts? – Some programmer dude Nov 09 '11 at 06:54
-
I would output it as one word with two counts – user1035927 Nov 09 '11 at 06:56
-
Can you give an example of how you want the output to be in case you have the following as your input "hello HELLO Hello". – Roee Gavirel Nov 09 '11 at 06:57
-
In that case I don't really see a reason you want to keep the casing. Or do you want to output the word, the count and then the list of all words? Otherwise how would you select which of the different casings of the word you should print? – Some programmer dude Nov 09 '11 at 07:01
-
Let me give more elaborated example output: "hello HELLO Hello World" should give: hello 3 World 1 (note the case of World is maintained) – user1035927 Nov 09 '11 at 07:01
-
currently what is happening is: "hello HELLO Hello World" gives: hello 3 world 1 (note case changed, as I do tolower to all char) – user1035927 Nov 09 '11 at 07:29
5 Answers
The third template parameter of std::map
is a comparator type. You can provide your own comparison operation, in your case a case-insensitive one.
struct CaseInsensitive {
bool operator()(std::string const& left, std::string const& right) const {
size_t const size = std::min(left.size(), right.size());
for (size_t i = 0; i != size; ++i) {
char const lowerLeft = std::tolower(left[i]);
char const lowerRight = std::tolower(right[i]);
if (lowerLeft < lowerRight) { return true; }
if (lowerLeft > lowerRight) { return false; }
// if equal? continue!
}
// same prefix? then we compare the length
return left.size() < right.size();
}
};
Then instanciate your map:
typedef std::map<std::string, unsigned, CaseInsensitive> MyWordCountingMap;
Note: only the first spelling is preserved (which seems okay with you)

- 287,565
- 48
- 449
- 722
-
bool operator()(std::string const& left, std::string const& right) const Is this line doing operator overloading? – user1035927 Nov 09 '11 at 15:54
-
1@user1035927: Not really. It is declaring the operator `bool CaseInsensitive::operator()(...) const`, functors usually declare an `operator()` something, and in particular for maps, the functor need an operator that compares two keys and return a boolean. Generally speaking, operator overloading concerns the free-function at namespace level: `operator+`, `operator*`, `operator<`, ... – Matthieu M. Nov 09 '11 at 16:17
-
sorry to bother you, one more query, why do we have the operator declared as const? any specific reason for that – user1035927 Nov 09 '11 at 16:21
-
2@user1035927: The `operator` here is a regular method of `CaseInsensitive` (indeed you could invoke it as `CaseInsensitive ci; ci.operator(left, right);`. Since it does not modify the state of `CaseInsensitive`, it is declared `const` to document this fact. As there is no state, it does not matter much, but I am a const evangelist :) – Matthieu M. Nov 09 '11 at 16:31
This should work. For multiple cases the first case will be inside the map and not lower case. Also the solution uses only one map as you wanted
using namespace std;
struct StrCaseInsensitive
{
bool operator() (const string& left , const string& right )
{
return _stricmp( left.c_str() , right.c_str() ) < 0;
}
};
int main(void)
{
char* input[] = { "Foo" , "bar" , "Bar" , "FOO" };
std::map<string, int , StrCaseInsensitive> CountMap;
for( int i = 0 ; i < 4; ++i )
{
CountMap[ input[i] ] += 1;
}
return 0;
}

- 24,045
- 1
- 55
- 85
You can use map<string, vector<string> >
.
The key is the lowercase word. The value is the vector of all the given cases of this word.
(you can also use multimap<string, string>
which is basically the same, but I usually prefer a map of vectors)
map<string, vector<string> > m;
m.size(); // number of lowercase words
m["abc"].size(); // number of the given cases of the word "abc"

- 26,650
- 27
- 89
- 114
-
-
Because the number of words to put in the vector is not known, which may cause lots of reallocations of the vector as words are added. – Some programmer dude Nov 09 '11 at 07:14
-
3A list will require the same reallocations. A list is usually useful only when you have to add elements in the beginning or in the middle of the container, but this is not the case. – Igor Nov 09 '11 at 07:25
What do you want to happen with different case variants of the same word?
One possibility is to use std::multiset with a caseless comparator as its Compare
template parameter. In this case, all variants of each word will be preserved in the set. Number of occurrences of each word can be obtained via count() member function of the set.
You can use a structure or std::pair
to keep both the original case and a number of occurrences. Your type would then look like this: map < string, pair <string, int> >

- 9,425
- 13
- 50
- 81
-
If I decide to use map
, where int holds the count, and one string holds the lowercase and other the original case, will this be an elegant solution – user1035927 Nov 09 '11 at 07:04 -
`map` is declared like this: `template < class Key, class T, class Compare = less
, class Allocator = allocator – Septagram Nov 09 '11 at 10:04> > class map;`, it does have three template arguments, but the third is comparator, not value type. The `map` only represents key-value pairs, and if you want to supply some additional data, you will have to make value compound somehow. If I were you, however, I'd just create a second `map ` to hold original casing for every item. IMHO it adds less complexity than structures or pairs. -