5

In an pre-interview, I am faced with a question like this:

Given a string consists of words separated by a single white space, print out the words in descending order sorted by the number of times they appear in the string.

For example an input string of “a b b” would generate the following output:

b : 2
a : 1

Firstly, I'd say it is not so clear that whether the input string is made up of single-letter words or multiple-letter words. If the former is the case, it could be simple.

Here is my thought:

int c[26] = {0};
char *pIn = strIn;

while (*pIn != 0 && *pIn != ' ')
{
    ++c[*pIn];
    ++pIn;
}

/* how to sort the array c[26] and remember the original index? */

I can get the statistics of the frequecy of every single-letter word in the input string, and I can get it sorted (using QuickSort or whatever). But after the count array is sorted, how to get the single-letter word associated with the count so that I can print them out in pair later?

If the input string is made of of multiple-letter word, I plan to use a map<const char *, int> to track the frequency. But again, how to sort the map's key-value pair?

The question is in C or C++, and any suggestion is welcome.

Thanks!

hippietrail
  • 15,848
  • 18
  • 99
  • 158
Qiang Xu
  • 4,353
  • 8
  • 36
  • 45
  • 2
    In a pre-interview, I wouldn't use code like that. And I wouldn't ask it on SO. – Tom van der Woerdt Dec 30 '11 at 15:50
  • 1
    @Qiang Xu: your coding style is very C-ish, but not in a good way. You should prefer longer more meaningful variable names, and I personally would drop hungarian notation all together. – Evan Teran Dec 30 '11 at 15:55
  • 9
    How about something like the following: http://code.google.com/p/strtk/source/browse/trunk/strtk_wordfreq.cpp – Jared Krumsie Dec 30 '11 at 20:25

4 Answers4

2

I would use a std::map<std::string, int> to store the words and their counts. Then I would use something this to get the words:

while(std::cin >> word) {
    // increment map's count for that word
}

finally, you just need to figure out how to print them in order of frequency, I'll leave that as an exercise for you.

Evan Teran
  • 87,561
  • 32
  • 179
  • 238
  • Thanks, Evan. I also feel a map is what I need here. But I am stuck with how to print the words out in order of frequency. A map's key is always ordered, but not its value. :-( – Qiang Xu Dec 30 '11 at 16:33
  • 1
    @QianqXu: what have you tried? Have you considered even what appears to be an inefficient solution? Work out how to make it work, **then** work out how to make it efficient. One thing that comes to mind is perhaps a second `std::map` (note key/value swap) which you can create based on the first `std::map<>`... – Evan Teran Dec 30 '11 at 16:54
  • 1
    scratch that, the second map solution wouldn't work (you'll have duplicate keys), but a `std::vector>` seems reasonable. – Evan Teran Dec 30 '11 at 17:01
  • Compiling seems to pass, but a lot of linking error: http://pastebin.com/QjnMK2sY – Qiang Xu Dec 30 '11 at 18:30
  • 1
    compile c++ code with `g++`, not `gcc`. `g++` will properly link in the c++ standard library... – Evan Teran Dec 30 '11 at 19:08
  • It works!!! Thanks a lot, Evan. Just another question: I am feeling a bit leery regarding the way I convert the string to a stringstream object, and use this stringstream to read each space separated word into the map. Is there a more elegant way to do it? – Qiang Xu Dec 30 '11 at 19:15
  • I would recommend that you post a separate question for that. – Evan Teran Dec 30 '11 at 19:17
  • Good suggestion, Evan. Here is the link: http://stackoverflow.com/questions/8683302/how-to-read-into-a-c-string-like-from-a-stream – Qiang Xu Dec 30 '11 at 19:58
1

All the answers prior to mine did not give really an answer.

Let us think on a potential solution.

There is a more or less standard approach for counting something in a container.

We can use an associative container like a std::map or a std::unordered_map. And here we associate a "key", in this case the word, to a count, with a value, in this case the count of the specific word.

And luckily the maps have a very nice index operator[]. This will look for the given key and, if found, return a reference to the value. If not found, then it will create a new entry with the key and return a reference to the new entry. So, in both cases, we will get a reference to the value used for counting. And then we can simply write:

std::unordered_map<char,int> counter{};
counter[word]++;

And that looks really intuitive.

After this operation, you have already the frequency table. Either sorted by the key (the word), by using a std::map or unsorted, but faster accessible with a std::unordered_map.

Now you want to sort according to the frequency/count. Unfortunately this is not possible with maps.

Therefore we need to use a second container, like a ```std::vector`````which we then can sort unsing std::sort for any given predicate, or, we can copy the values into a container, like a std::multiset that implicitely orders its elements.

For getting out the words of a std::string we simply use a std::istringstream and the standard extraction operator >>. No big deal at all.

And because writing all this long names for the std containers, we create alias names, with the using keyword.

After all this, we now write ultra compact code and fulfill the task with just a few lines of code:

#include <iostream>
#include <string>
#include <sstream>
#include <utility>
#include <set>
#include <unordered_map>
#include <type_traits>
#include <iomanip>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;

// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------

std::istringstream text{ " 4444 55555 1 22 4444 333 55555 333 333  4444  4444 55555  55555 55555 22 "};

int main() {

    Counter counter;
    
    // Count
    for (std::string word{}; text >> word; counter[word]++);

    // Sort
    Rank rank(counter.begin(), counter.end());

    // Output
    for (const auto& [word, count] : rank) std::cout << std::setw(15) << word << " : " << count << '\n';
}
A M
  • 14,694
  • 5
  • 19
  • 44
1

You're definitely wrong in assuming that you need only 26 options, 'cause your employer will want to allow multiple-character words as well (and maybe even numbers?).

This means you're going to need an array with a variable length. I strongly recommend using a vector or, even better, a map.

To find the character sequences in the string, find your current position (start at 0) and the position of the next space. Then that's the word. Set the current position to the space and do it again. Keep repeating this until you're at the end.

By using the map you'll already have the word/count available.

If the job you're applying for requires university skills, I strongly recommend optimizing the map by adding some kind of hashing function. However, judging by the difficulty of the question I assume that that is not the case.

Tom van der Woerdt
  • 29,532
  • 7
  • 72
  • 105
  • Thanks, Tom! With a map, the counting is almost trivial. But it seems the hardest part with a map is to print out its key-value pair in its value's decending order. Racking my brain... – Qiang Xu Dec 30 '11 at 16:35
  • Ah, well, that's the part where you may have to implement your own sorting algorithm or something. Can be done in ~15 lines. – Tom van der Woerdt Dec 30 '11 at 16:38
1

Taking the C-language case:

I like brute-force, straightforward algos so I would do it in this way:

  1. Tokenize the input string to give an unsorted array of words. I'll have to actually, physically move each word (because each is of variable length); and I think I'll need an array of char*, which I'll use as the arg to qsort( ).

  2. qsort( ) (descending) that array of words. (In the COMPAR function of qsort(), pretend that bigger words are smaller words so that the array acquires descending sort order.)

3.a. Go through the now-sorted array, looking for subarrays of identical words. The end of a subarray, and the beginning of the next, is signalled by the first non-identical word I see. 3.b. When I get to the end of a subarray (or to the end of the sorted array), I know (1) the word and (2) the number of identical words in the subarray.

EDIT new step 4: Save, in another array (call it array2), a char* to a word in the subarry and the count of identical words in the subarray.

  1. When no more words in sorted array, I'm done. it's time to print.

  2. qsort( ) array2 by word frequency.

  3. go through array2, printing each word and its frequency.

I'M DONE! Let's go to lunch.

Pete Wilson
  • 8,610
  • 6
  • 39
  • 51