Count number of different characters between two strings of equal length

Question

I have got two strings of equal length. Each string contains digits '1' to '9'.

I want to calculate the number of indexes where the character at that index is same between the two strings.

Example:

A = "1322113" and
B = "2312213"

then the output should be 4 as characters at 1st, 3rd, 5th and 6th index are same (considering 0 based indexing).

I know about the naive solution of iterating and checking {O(n)}. Is there any library or technique which can give me better time complexity?

Quantum computers have better complexity for array lookup, perhaps they can be of help here. Otherwise, in a non-indexed container, which a plain string is, no luck. — bipll, Apr 07 '20 at 09:35

score 2 · Answer 1 · answered Apr 07 '20 at 09:17

You cannot obtain a better complexity than the size of the input, in this case N characters. Since you need to check all N characters (as each of them can be a potential solution) the linear O(N) solution of checking all of them is already optimal.

Hope this helps.

score 1 · Answer 2 · edited Oct 08 '22 at 17:27

The optimal solution is O( n ) because in any case you have to traverse the both strings to apply an operation to corresponding characters or to check the result of an applied operation to the strings.

As for the concrete solution of the task I can suggest to use the standard algorithm std::inner_product.

For example

#include <iostream>
#include <string>
#include <functional>
#include <iterator>
#include <numeric>

int main() 
{
    std::string s1 = "1322113";
    std::string s2 = "2312213";
    
    auto n = std::inner_product( std::begin( s1 ), std::end( s1 ),
                                 std::begin( s2 ),
                                 size_t( 0 ),
                                 std::plus<>(),
                                 std::equal_to<>() );
                                 
    std::cout << "n = " << n << '\n';
    
    return 0;
}

The program output is

n = 4

If you want to count the number of non-equal characters in the same positions then instead of the function object std::equal_to use std::not_equal_to

score 0 · Answer 3 · answered Apr 07 '20 at 10:36

0

If you want to get better results, then you should consider different approaches for string's storage. The first optimization I see is to keep string as pieces of numbers. Then you can iterate over such numbers, and compare them (if numbers are small, you can use precomputed table of results). The benefits of such approach is too vague (O(N) complexity will become (O(N / constant) ~ O(N)). But for educational purposes you can try to improve algorithm

answered Apr 07 '20 at 10:36

Alex Aparin

4,393
5
25
51

1

Your suggestion is basically equivalent to using SIMD over character arrays, checking 16 or 32 positions at once. But with SIMD you don't need lookup tables, just a SIMD compare producing a vector of 0 / -1 elements, so you can do `counts = _mm_sub_epi8(counts, _mm_cmpeq_epi8(v1, v2));` (for x86 SSE2 for example). See [How to count character occurrences using SIMD](https://stackoverflow.com/q/54541129) (replace the fixed vector with a load from another string) and [Can counting byte matches between two strings be optimized using SIMD?](https://stackoverflow.com/a/15599426) (already using 2) – Peter Cordes Apr 04 '22 at 02:23

Count number of different characters between two strings of equal length

3 Answers3