Finding and printing common letters in different strings

Question

I need to find and print common characters in different strings. My code does not work as it should, it checks for same letters at the same index but thats not what I want. I couldn't find better solution for now. Thank you for help :)

#include <iostream>
#include <string>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length() - 1; i++) {
        for (int j = 0; j < niz2.length() - 1; j++) {
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
        }
    }

    cout << "Same letters are: " << niz3 << endl;
    return 0;
}

Here's a simple way to figure out how to do this, and it never fails to work. Just take out a blank sheet of paper. Write down using short, simple sentences in plain English, a step-by-step process of doing this. When done, [call your rubber duck for an appointment](https://en.wikipedia.org/wiki/Rubber_duck_debugging). We don't write code for other people, on Stackoverflow. We always refer such questions to your rubber duck. After your rubber duck approves your proposed plan of action, simply take what you've written down and translate it directly into C++. Mission accomplished! — Sam Varshavchik, Nov 04 '20 at 02:23
Thank you for answering, I tried your solution but it doesnt work for me because I am beginner and cant come up with solution — Boris1, Nov 04 '20 at 02:33
@BorisLoncaric seems like you need a [good c++ textbook](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). — Gary Strivin', Nov 04 '20 at 02:38

Arty · Accepted Answer · 2020-11-04T03:41:37.993

Below is corrected working code. Basically the only correction that was needed to do is to make both loops have upper bound of niz.length() instead of niz.length() - 1.

Variant 1:

Try it online!

#include <string>
#include <iostream>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length(); i++) {
        for (int j = 0; j < niz2.length(); j++) {
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
        }
    }

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbc
string 2: cde

Output:

Same letters are: dc

Also you may want to sort letters and make them unique, then you need to use std::set too like in code below:

Variant 2:

Try it online!

#include <string>
#include <iostream>
#include <set>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length(); i++) {
        for (int j = 0; j < niz2.length(); j++) {
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
        }
    }

    set<char> unique(niz3.begin(), niz3.end());
    niz3.assign(unique.begin(), unique.end());

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: cd

Also you may use just sets plus set_intersection standard function. This will solve your task in less time, in O(N*log(N)) time instead of your O(N^2) time.

Variant 3:

Try it online!

#include <string>
#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    string niz1, niz2, niz3 = "";
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    set<char> s1(niz1.begin(), niz1.end()), s2(niz2.begin(), niz2.end());
    set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(), back_inserter(niz3));

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: cd

Instead of set it is also possible to use unordered_set, it will give even more faster algorithm especially for long strings, algorithm will have running time O(N) compared to O(N * log(N)) for set solution. The only drawback is that unlike for set solution output of unordered_set solution is unsorted (but unique) (unordered sets don't sort their data).

Variant 4:

Try it online!

#include <string>
#include <iostream>
#include <unordered_set>
#include <algorithm>
#include <iterator>
using namespace std;

int main() {
    string niz1, niz2, niz3;
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    unordered_set<char> s1(niz1.begin(), niz1.end()), s2;
    for (size_t i = 0; i < niz2.length(); ++i)
        if (s1.count(niz2[i]))
            s2.insert(niz2[i]);
    niz3.assign(s2.begin(), s2.end());

    cout << "Same letters are: " << niz3 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: dc

Also one more way is to use just plain for loops like you did, without sets, but do extra block of loops in order to remove non-unique letters, like in code below. The only drawbacks of this loops method compared to sets method is that loops method runs slower and produces non-sorted output string.

Variant 5:

Try it online!

#include <string>
#include <iostream>
using namespace std;

int main() {
    string niz1, niz2, niz3, niz4;
    cout << "string 1: ";
    getline(cin, niz1);
    cout << "string 2: ";
    getline(cin, niz2);

    for (int i = 0; i < niz1.length(); ++i)
        for (int j = 0; j < niz2.length(); ++j)
            if (niz1[i] == niz2[j])
                niz3 += niz1[i];
    
    for (int i = 0; i < niz3.length(); ++i) {
        bool exists = false;
        for (int j = 0; j < niz4.length(); ++j)
            if (niz4[j] == niz3[i]) {
                exists = true;
                break;
            }
        if (!exists)
            niz4 += niz3[i];
    }

    cout << "Same letters are: " << niz4 << endl;

    return 0;
}

Input:

string 1: adbcda
string 2: cdecd

Output:

Same letters are: dc

You shouldn't use `using namespace std;`. Better replace it with `using std::cin;`, `using std::cout`, `using std::string` and `using std::endl`. More info [here](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice). — Gary Strivin', Nov 04 '20 at 02:33
@GaryNLOL In `.cpp` files it is quite alright to use such global `using`, because `.cpp` file don't interfere with other files. It is only in `.h` files that you need to use `std::cout` everywhere because header file may create problems for those `.cpp`s that include them. — Arty, Nov 04 '20 at 02:36
@BorisLoncaric Try second or third versions of code that I posted, they print unique chars, scroll down my post, there are several blocks of code. — Arty, Nov 04 '20 at 02:53
@Arty I tried your code and it works, much thanks for help. I need to learn functions that you used in your code, I only tried to solve it with classic for loop and if statement. Thanks again for help! — Boris1, Nov 04 '20 at 03:00
@BorisLoncaric I just added right now one more solution, called `Variant 5`, at the very end of my answer, it uses not sets or set_intersections, it uses plain loops like you did, but I just added to your solution one more block of loops that remove non-unique letters. You may use that Variant 5 solution if you need just plain loops! — Arty, Nov 04 '20 at 03:42
@BorisLoncaric Also, I just updated my answer and included `Variant 4`, it uses `unordered_set` standard C++ structure, which is the fastest possible solution (takes linear `O(N)` time), just for you to know what is possible, if you need to know this now. — Arty, Nov 04 '20 at 03:44
@Arty: While you may not be able to improve on O(N) computational complexity, faster that your variant 4 code is certainly possible (see my answer for one possibility). — Jerry Coffin, Nov 04 '20 at 09:30

Jerry Coffin · Answer 2 · 2020-11-04T09:29:22.283

This is sort of an answer in itself, and sort of an extended comment on @Arty's answer.

Hash tables (which are what underlies an unordered_map) are really useful under many circumstances. But in this case, they're kind of overkill. In particular, a hash table is basically a way of creating a sparse array for cases where it's unrealistic or unreasonable to use the underlying "key" type directly as an index into an array.

In this case, however, what we're using as the key in the hash table is a character--a single byte. This is small enough, it's utterly trivial to just use an array, and use the byte directly as an index into the array.

So, with arrays instead of hash tables, we get code something on this order:

#include <array>
#include <string>
#include <iostream>
#include <chrono>

std::string getstring(std::string const &s) {
    std::cout << s << ": ";
    std::string input;
    std::getline(std::cin, input);
    return input;
}

using namespace std::chrono;

int main() {

    std::array<char, 256> a = {0};
    std::array<char, 256> b = {0};
    std::array<char, 256> result = { 0 };
    std::size_t pos=0;

    std::string s1 = getstring("s1");
    std::string s2 = getstring("s2");

    std::cout << "s1: " << s1 << "\n";
    std::cout << "s2: " << s2 << "\n";

    auto start = high_resolution_clock::now();

    for (auto c : s1)
        a[c] = 1;
    for (auto c : s2)
        b[c] = 1;

    for (int i = 'a'; i < 'z'; i++)
        if (a[i] != 0 && b[i] != 0)
            result[pos++] = i;
    for (int i = 'A'; i < 'Z'; i++)
        if (a[i] != 0 && b[i] != 0)
            result[pos++] = i;

    auto stop = high_resolution_clock::now();

    std::cout << "Common characters: " << std::string(result.data(), pos) <<"\n";
    std::cout << "Time: " << duration_cast<nanoseconds>(stop - start).count() << " nS\n ";
}

To get some repeatable test conditions, I built an input file with a couple of long fairly strings:

asdffghjllkjpoiuwqertqwerxvzxvcn
qqweroiglkgfpoilkagfskeqwriougfkljzxbvckxzv

After adding instrumentation (timing code) to his Variant 4 code, I found that his code ran in about 12,000 to 12,500 nanoseconds.

The code above, on the other hand, runs (on the same hardware) in about 850 nanoseconds, or around 15 times as fast.

I'm going to go on record as saying the opposite: although this is clearly faster, I'm pretty sure it's not the fastest possible. I can see at least one fairly obvious improvement (store only one bit per character instead of one byte) that would probably improve speed by at least 2x, and could theoretically yield an improvement around 8x or so. Unfortunately, we've already sped other things up enough that I doubt we'd see 8x--we'd probably see a bottleneck on reading in the data from memory first (but it's hard to be sure, and likely to vary between processors). So, I'm going to leave that alone, at least for now. For now, I'll settle for only about fifteen times faster than "the fastest possible solution"... :-)

I suppose, in fairness, he probably really meant asymptotically the fastest. His has (expected, but not guaranteed) O(N) complexity, and mine also has O(N) complexity (but in this case, basically guaranteed, not not just expected. In other words, the 15x is roughly a constant factor, not one we expect to change significantly with the size of the input string. Nonetheless, even if it's "only" a constant factor, 15x is still a pretty noticeable difference in speed.

Thanks for posting your solution! Yes in fact although `O(N)` complexity can't be reduced (because you have to "look" at all letters at least once), but the constant under `O()` notation can be quite large for hash tables, this I always new. Also usually some partial cases solution usually always faster, like you solved task for `char` only strings. Although it wasn't asked by OP to solve anything except `char`, still strings may have unicode inside, i.e. can have 32-bit chars in general. Then for generic solution probably hash table is almost fastest,especially if to choose hash like `xxhash` — Arty, Nov 04 '20 at 09:40
Also just for fun I was thinking about one possible optimization for your algorithm, instead of first two loops you may use just one array and do `array a; for (auto c: s1) a[c] = 1; for (auto c: s2) a[c] |= 2;` then checking loops will be `for (int i = 'a'; i < 'z'; i++) if (a[i] == 3) result[pos++] = i;`. I.e. you'll have one less comparison and memory access and also have just one table instead of two. But this is just a thought, no need to do that as your solution is probably more understandable. — Arty, Nov 04 '20 at 09:59
@Arty: Actually, I started with a single array, but traversing it was adding extra time. A lot here would depend on the lengths of the strings though. If the strings are long (e.g., megabytes), you need to optimize the first phase. If they're shorter (like the few dozen bytes or so I used here) you need to be more careful to balance the first and second phases. — Jerry Coffin, Nov 04 '20 at 16:19

Finding and printing common letters in different strings

2 Answers2