2

first of all this is my first question on the site. I've done a lot of research and I don't think I found quite a specific issue as this, but if I'm wrong, feel free to correct me in an answer and link said topic to me.

Onto the problem itself, the assignment consists of a console application that will display each distinct word entered to it, as well as the number of occurrences for each unique word. I decided that the way to solve this would be through the use of a vector<string> and then use a nested loop structure where the outer loop would represent each unique word, and where the inner loop would be used to compare the word from the outer loop to each existing word in the vector.

However. I've ran across a problem.

With this basic setup:

//Sort vector into alphabetical order
sort(words.begin(), words.end()); //this only sorts them alphabetically, but equal strings are "next" to each other

//Find unique values
for(string::size_type i=0; i != words.size(); i++) {
    int count = 0;
    for(string::size_type j=0; j != words.size(); j++) {
        if(words[i] == words[j]){
            count++;
        }
    }
    cout << words[i] << " appeared: " << count << " times." << endl;
}

Everything works fine as far as functionality is concerned; 2+ instances of a word are correctly spotted but they are displayed 2+ times as their own rows, because the instance repeats itself whenever that duplicate element is encountered on the outer loop.

Here's a picture: Basic Result Promblem, Duplicate Output

I thought I'd solve it with the following code:

//Sort vector into alphabetical order
sort(words.begin(), words.end()); //this only sorts them alphabetically, but equal strings are "next" to each other

//Find unique values
for(string::size_type i=0; i != words.size(); i++) {
    int count = 0;
    for(string::size_type j=0; j != words.size(); j++) {
        if(words[i] == words[j]){
            count++;
            if(i != j){ //replacement: delete duplicate values from the vector (aka if the indexes don't match)
                words.erase(words.begin() + j); //delete element at index "j"
            }
        }
    }
    cout << words[i] << " appeared: " << count << " times." << endl;
}

A new problem arises with this: the word that appears 2+ times now throws an error. The index itself would work fine, i.e if I were to add cout << words[i] << endl; right after deleting the element it displays the correct word. However, the word that appears 2+ times does not show up at all, and returns an error.

Here's a picture: Updated problem, now duplicate values throw an error

Anyone would be nice enough to explain why this happens, and how to fix it?

  • 2
    Consider using instead: `std::unordered_map my_map;`. You can then add strings like so: `++my_map[my_string];`. The strings will be key, and the value will be the number of occurrences for said string key. – user2296177 May 19 '17 at 02:27
  • You erase element `j`, then immediately increment `j`, skipping the element that replaced the erased one and potentially accessing past the end of your vector (since use an equality comparison in your loop). Bad. – 1201ProgramAlarm May 19 '17 at 02:36
  • 1
    as @user2296177 pointed out, a hash map is an excellent data-structure to solve this problem. If you were to find out the count of some random string, you'll have to parse the entire vector (or up until that string occurs in the vector) which will be O(n), but it would be an O(1) operation in a hash map. – Vada Poché May 19 '17 at 04:02
  • Thanks for the input guys! VadaPoché and user2296177, I'll make sure to look that further on as I get more into the language, and 1201ProgramAlarm, you are right, I felt that was the problem as well, but I was unable to come up with a solution. Slightly embarrassed that it was just the loop condition but oh well. Thanks! – Matias Lago May 19 '17 at 13:09

5 Answers5

1

Your code is correct, you just need to check for < on the loops instead of !=.

Because reducing the size of the vector within the loop can cause an invalid index, which is beyond the size of the vector to occur, but the loop may still progress with != while < will always consider only valid indices.

Change only != to < in the loops and it works.

Here is the Output.

Edit:

You also need to reset j to check for next element at the same position from where you erase an element, because now the next element is in that position instead of j + 1.

Just add j--; after erasing the element and it works.

Here is the new Output.

Corrected code:

//Sort vector into alphabetical order
sort(words.begin(), words.end()); //this only sorts them alphabetically, but equal strings are "next" to each other

//Find unique values
for(string::size_type i=0; i < words.size(); i++) {
    int count = 0;
    for(string::size_type j=0; j < words.size(); j++) {
        if(words[i] == words[j]){
            count++;
            if(i != j){ //replacement: delete duplicate values from the vector (aka if the indexes don't match)
                words.erase(words.begin() + j); //delete element at index "j"
                j--; // Re-run iteration for j
            }
        }
    }
    cout << words[i] << " appeared: " << count << " times." << endl;
}
Ahmed Akhtar
  • 1,444
  • 1
  • 16
  • 28
1

Let's look at where your example case fails:

for(string::size_type j=0; j != words.size(); j++) { // i: 1, j: 2, size(words): 3
    if(words[i] == words[j]){ // words[i] matches words[j]
        count++;
        if(i != j){ // i doesn't match j
            words.erase(words.begin() + j); // i: 1, j: 2, size(words): 2
        }
    }
} // Upon rexecuting the iteration expression i: 1, j: 3, size(words): 2 thus `j` will be greater than `size(words)` and will be used to continue the loop even though it is an invalid index

There have been several solutions presented to solve this problem using your current code. But I would suggest that the simplest method for solving this would be the multiset:

const multiset<string> words{istream_iterator<string>(cin), istream_iterator<string>()};
auto it = cbegin(words);

while(it != cend(words)) {
    auto i = words.upper_bound(*it);

    cout << *it << " appeared: " << distance(it, i) << " times\n";
    it = i;
}

You can see a live example of this here: http://ideone.com/Nhicos Note that this code does away with the need for an input sequence termination word, "-end" in your case, and instead depends upon EOF. which is automatically appended to the http://ideone.com input: Read cin till EOF

Community
  • 1
  • 1
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • 1
    Thanks for the detailed explanation, but I am approaching these sets of problems as they appear based on the textbook that I have purchased, "Accelerated C++" which sadly doesn't offer "prototype" answers for the problems. I had a feeling that the size being modified was the issue, and thankfully changing `!=` to `<` on the loop condition was good enough. This seems a bit more advanced than my existing knowledge, but I'll make sure to look over it once I feel that I reach a subject like this on the text/later on. Thanks again! – Matias Lago May 19 '17 at 13:06
  • Update: the simple change of comparison was not enough for cases where the words would be repeated 2+ times, i.e. "test" 3 times would be reported as "test appeared 2 times" and then "test appeared 1 time", but I suspect that this is because of the "destructive" approach from the `erase` method once again. After testing your method, I must admit that although it is beyond my current knowledge, it is the more accurate solution to the problem hence why I changed my ticked answer to this one. Thanks! – Matias Lago May 19 '17 at 13:43
  • @M.Lago See the edit I have made to my answer, it now solves the problem for " cases where the words would be repeated 2+ times". – Ahmed Akhtar May 20 '17 at 01:09
  • @AhmedAkhtar good eye, that will do the trick as well. I came up with the solution of using a duplicate vector, but I assume this way is more efficient and therefore a cleaner/faster solution. Thanks for sharing! – Matias Lago May 20 '17 at 15:20
0

i think you shoud check if i!=j;if i==j it compare to itself.

//Find unique values
for(string::size_type i=0; i != words.size(); i++) {
int count = 0;
for(string::size_type j=0; j != words.size(); j++) {
    if(words[i] == words[j]&&i!=j){
        count++;
    }
}
cout << words[i] << " appeared: " << count << " times." << endl;
}
Li Kui
  • 640
  • 9
  • 21
  • Thanks for the help, but as noted above the simpler fix was to correct the loop condition to use a `<` or less-than comparison rather than an equality one. – Matias Lago May 19 '17 at 13:13
0

This problem can be easily solved using a data structure called a hash table. The hash table is an associative array which holds a key-value pair. Basically the "key", which can be a word, is used to calculate the index of the array where the "value" is held, which in your instance can be the number of times it's been counted. C++ has the

std::unordered_map

which is a hash table. Take a look here for the theory behind hash tables: https://en.wikipedia.org/wiki/Hash_table and look here for the C++ version: http://www.cplusplus.com/reference/unordered_map/unordered_map/ This should make your program much easier to write. You can just add words to your hash table when they are entered with a value of 1. When you see the word again, increment it's associated value.

Jayson Boubin
  • 1,504
  • 2
  • 11
  • 19
  • As I've noted to another user above, I'm following my text "Accelerated C++". I will keep this bookmarked as well for when I feel like I've had enough practice with the other elements, but thank you for both the references and a detailed explanation. – Matias Lago May 19 '17 at 13:08
0

Update:

The simple change of operator != to < in the loop condition was not enough. Yes, two cases worked fine, but if there were 3+ instances of a specific word, then the output would be split into several lines. The explanation I can offer with my so far limited knowledge is that the inner loop was checking for the condition "is the index from the outer loop equal to the index from the inner loop", which should in theory work right. However, since in 2+ instance cases at least 1 element from the array is removed, the condition would be evaluated separately, instead of all together.

After some reasoning, I was able to come up with this for a final solution:

//Sort vector into alphabetical order
sort(words.begin(), words.end()); //this only sorts them alphabetically, but equal strings are "next" to each other

//Find unique values
for(string::size_type i=0; i < words.size(); i++) {
    int count = 0;
    //duplicate vector, and use it for the inner loop
    vector<string> duplicate = words;
    for(string::size_type j=0; j < duplicate.size(); j++) {
        if(words[i] == words[j]){
            count++;
            if(i != j){ //replacement: delete duplicate values from the vector (aka if the indexes don't match)
                words.erase(words.begin() + j); //delete element at index "j"
            }
        }
    }
    cout << words[i] << " appeared: " << count << " times." << endl;
}

This does in fact work with any sort of instance cases, be it 2, 3, 5, etc.

I wanted to solve the problem this way (with vectors themselves) because the textbook "Accelerated C++" had only covered vectors and strings up to that point.

Please keep the following points in mind:

  • Being a novice programmer, further optimization will most likely be a choice
  • If you are interested in what looks like the most accurate/simple/efficient answer so far, please take a look at @Jonathan Mee's answer, it should still be voted as the correct answer.

Thanks for the help to all who posted here!