Given an array of strings, how do I remove duplicates?

Question

I would like to know how to remove duplicate strings from a container while ignoring differences caused by trailing punctuation, so that "here" and "here?" count as the same word.

For example, given these strings:

Why do do we we here here?

I would like to get this output:

Why do we here?

[Tokenize string.](https://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c) — Mahesh, Aug 11 '17 at 14:05
Possible duplicate of [Most elegant way to split a string?](https://stackoverflow.com/questions/236129/most-elegant-way-to-split-a-string) — Leonardo Alves Machado, Aug 11 '17 at 14:06
Do you know stream input (`cin >> x;`)? Do you know how to enlarge an array? What do you mean by *compare*? Do you mean test for equality? — Beta, Aug 11 '17 at 14:13
@beta The question is how to remove duplicates from the string. Example: "why you here here?" should become "why you here?". I want to remove the repeated "here" from the sentence, but when I compare the two words they are different because of the "?". — Shubham, Aug 11 '17 at 14:17
Documentation has a great article on tokenization: https://stackoverflow.com/documentation/c%2b%2b/488/stdstring/2148/tokenize By a great author ;) Perhaps looking over that would be helpful. You may be able to solve your problem on your own after reading that. If not you really need to edit the question to clarify. Are you: 1) Asking how to tokenize a string? 2) Asking how to compare strings? 3) Asking how to chop punctuation from words? 4) Asking how to remove duplicate strings from a container? Note that you should have said yes to only 1 of these or your question is too broad. — Jonathan Mee, Aug 11 '17 at 14:23
@Shubham have you clicked on the link? There you can find several ways to split a string in C++. Use the one you like the most. — Leonardo Alves Machado, Aug 11 '17 at 14:26
@JonathanMee 4: how to remove duplicates from the sentence, and the twist is in the last word. — Shubham, Aug 11 '17 at 14:27
@Shubham So you're really asking 3 *and* 4. Still probably too much for a question but, at least edit it so it's clear that you're not asking how to tokenize a string. — Jonathan Mee, Aug 11 '17 at 14:32
Use `std::set` to contain your words. The `std::set` doesn't allow duplicates. — Thomas Matthews, Aug 11 '17 at 15:21
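Thomas Matthews's `std::set` suggestion can be sketched as follows. A `std::set` on its own sorts its contents, which would scramble the sentence, so this minimal sketch (an assumption about how the set would be used, with the hypothetical function name `unique_words`) uses the set only to remember which words have been seen and keeps the output in input order. It compares words exactly, so on its own it does not solve the trailing-punctuation twist.

```cpp
#include <set>
#include <string>
#include <vector>

// Keeps the first occurrence of each word, in input order.
// std::set::insert returns a std::pair whose .second is true only when
// the word was not already present, so it doubles as a "seen before?" test.
std::vector<std::string> unique_words(const std::vector<std::string>& words) {
    std::set<std::string> seen;
    std::vector<std::string> result;
    for (const auto& w : words) {
        if (seen.insert(w).second) {
            result.push_back(w);
        }
    }
    return result;
}
```

For the tokens of "Why do do we we here here?" this keeps "Why", "do", "we", "here" and "here?": the last two survive because exact comparison sees them as different strings.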

score 0 · Accepted Answer · answered Aug 11 '17 at 15:31

The algorithm:

1. While reading a word is successful, do steps 2 through 4.
2. If end of file, quit.
3. If the word list is empty, push back the word.
4. Otherwise:
   - Search the word list for the word.
   - If the word doesn't exist, push back the word.
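The steps above can be sketched in C++ roughly like this. It is a minimal sketch, assuming the words come from any `std::istream`; the function name `read_unique_words` and the string overload are hypothetical additions for illustration. Like the steps it follows, it compares words exactly with `operator==`, so "here" and "here?" are still treated as different words.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical function following the steps above: reads whitespace-separated
// words from `in` and keeps the first occurrence of each, in order.
// std::find compares with operator==, so "here" and "here?" stay distinct.
std::vector<std::string> read_unique_words(std::istream& in) {
    std::vector<std::string> word_list;
    std::string word;
    // Loop while reading a word succeeds; extraction fails at end of file,
    // which takes care of the "if end of file, quit" check.
    while (in >> word) {
        // Search the list; on an empty list std::find simply returns end(),
        // so no separate "list is empty" branch is needed.
        if (std::find(word_list.begin(), word_list.end(), word) == word_list.end()) {
            word_list.push_back(word);
        }
    }
    return word_list;
}

// Hypothetical convenience overload for testing with an in-memory string.
std::vector<std::string> read_unique_words(const std::string& text) {
    std::istringstream in(text);
    return read_unique_words(in);
}
```

Folding the empty-list case into the search is a design choice of this sketch; keeping the separate branch from step 3 would behave the same.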

Use std::string for your word. This allows you to do the following:

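std::string word;
// data_file is an input stream, e.g. a std::ifstream opened on your file.
// operator>> reads one whitespace-separated word; the loop ends at EOF or on a read error.
while (data_file >> word)
{
    // Search the word list for `word` and push it back if it isn't there yet.
}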

Use std::vector to contain your words (although you could use std::list as well). The std::vector grows dynamically, so you don't have to reallocate it yourself if you picked the wrong initial size.
To append to std::vector, use the push_back method.

To compare std::string, use operator==:

std::string new_word;
std::vector<std::string> word_list;
//...
if (word_list[index] == new_word)
{
  continue;
}
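Plain `operator==` treats "here" and "here?" as different, which is the twist in the question. One option, which is an assumption and not part of this answer, is to strip trailing punctuation from both words before comparing them. The helper names `strip_trailing_punct` and `equal_ignoring_trailing_punct` are hypothetical; the comparison in the snippet above would then become `equal_ignoring_trailing_punct(word_list[index], new_word)`.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `word` with any trailing
// punctuation (as classified by std::ispunct) removed, e.g. "here?" -> "here".
std::string strip_trailing_punct(const std::string& word) {
    std::string::size_type end = word.size();
    // Cast to unsigned char: passing a negative char to std::ispunct is undefined.
    while (end > 0 && std::ispunct(static_cast<unsigned char>(word[end - 1]))) {
        --end;
    }
    return word.substr(0, end);
}

// Hypothetical comparison: operator== applied to the normalized words.
bool equal_ignoring_trailing_punct(const std::string& a, const std::string& b) {
    return strip_trailing_punct(a) == strip_trailing_punct(b);
}
```

Only trailing characters are removed, so an inner hyphen as in "down-vote" is left alone.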

Jonathan Mee · Answer 2 · 2017-08-11T16:19:22.577

So you have said you know how to tokenize a string. (If you don't, spend some time here: https://stackoverflow.com/a/38595708/2642059) So I'm going to assume that we're given a vector<string> foo which contains words with possible trailing punctuation.

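for(auto it = cbegin(foo); it != cend(foo); ++it) {
    // Print *it only if no later word matches it, ignoring trailing punctuation
    if(none_of(next(it), cend(foo), [&](const auto& i) {
        // Find the first position where the two words differ
        const auto finish = mismatch(cbegin(*it), cend(*it), cbegin(i), cend(i));
        // They match if each word has ended or hit a non-alphanumeric character there
        // (cast to unsigned char, since passing a negative char to isalnum is undefined)
        return (finish.first == cend(*it) || !isalnum(static_cast<unsigned char>(*finish.first))) &&
               (finish.second == cend(i) || !isalnum(static_cast<unsigned char>(*finish.second)));
    })) {
        cout << *it << ' ';
    }
}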
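For trying this out locally, here is a self-contained sketch that wraps the loop above in a function. The name `dedupe_keep_last` is hypothetical, and instead of printing with `cout` it collects the surviving words into a vector so the result can be checked. The matching rule is the same `mismatch`/`isalnum` test; the four-iterator `std::mismatch` and `std::cbegin` need C++14.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Keeps a word only if no LATER word matches it, so the last repetition survives.
// Two words match if, at the first position where they differ, each one has
// either ended or reached a non-alphanumeric character ("here" vs "here?").
std::vector<std::string> dedupe_keep_last(const std::vector<std::string>& foo) {
    std::vector<std::string> result;
    for (auto it = std::cbegin(foo); it != std::cend(foo); ++it) {
        const bool no_later_match = std::none_of(std::next(it), std::cend(foo),
            [&](const std::string& i) {
                const auto finish = std::mismatch(std::cbegin(*it), std::cend(*it),
                                                  std::cbegin(i), std::cend(i));
                return (finish.first == std::cend(*it) ||
                        !std::isalnum(static_cast<unsigned char>(*finish.first))) &&
                       (finish.second == std::cend(i) ||
                        !std::isalnum(static_cast<unsigned char>(*finish.second)));
            });
        if (no_later_match) {
            result.push_back(*it);  // replaces the original `cout << *it << ' '`
        }
    }
    return result;
}
```

Fed the tokens of "Why do do we we here here?", it keeps "Why", "do", "we" and "here?", which matches the output the question asks for.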

Live Example

It's worth noting that you haven't given us rules for how to handle words like "down", "down-vote", and "downvote". This algorithm presumes that the first 2 are equal. You also haven't given us rules for how to handle "Why do, do we we here, here?". This algorithm always keeps the final repetition, so the output would be "Why do we here?"
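To see why "down" and "down-vote" are treated as equal while "downvote" is not, the matching rule from the lambda can be pulled out into a standalone predicate. The name `words_match` is hypothetical; the logic is the same `mismatch`/`isalnum` test used in the loop.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical standalone version of the lambda's test: two words match if,
// where they first differ, each has ended or reached a non-alphanumeric char.
bool words_match(const std::string& a, const std::string& b) {
    const auto finish = std::mismatch(a.cbegin(), a.cend(), b.cbegin(), b.cend());
    const bool a_done = finish.first == a.cend() ||
                        !std::isalnum(static_cast<unsigned char>(*finish.first));
    const bool b_done = finish.second == b.cend() ||
                        !std::isalnum(static_cast<unsigned char>(*finish.second));
    return a_done && b_done;
}
```

`words_match("down", "down-vote")` is true because "down" ends where "down-vote" reaches the non-alphanumeric '-', while `words_match("down", "downvote")` is false because the next character, 'v', is alphanumeric. "down-vote" and "downvote" also do not match, since they first differ at '-' versus 'v'.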

If the presumptions made by this algorithm are not totally to your liking, leave me a comment and we'll work on getting you comfortable enough with this code that you can make the adjustments you need.

I am just a beginner, so I will try to understand the code. Thanks for the reply. — Shubham, Aug 12 '17 at 09:06
@Shubham I'd encourage you to spend some time with this, as I believe it's the best solution for your question. I've provided the Live Example which you can fork and try out different things with. Let me know if there is anything specific that I can explain to you. — Jonathan Mee, Aug 12 '17 at 22:39
