string manipulation - character counting

Question

I have a string S = "&|&&|&&&|&" where we should get the number of '&' between 2 indexes of the string. So the output with the 2 indexes 1 and 8 here should be 5. And here's my brute force style code:

std::size_t cnt = 0;
for(i = start; i < end; i++) {
    if (S[i] == '&')
        cnt++;
}
cout << cnt << endl;

The problem I faced was my code was getting timed out for larger inputs in a coding platform. Can anyone suggest a better way to reduce the time complexity here?

This reads like a typical puzzle from some online quiz/hacker site. If your goal is to learn C++, you will not learn anything here. In nearly all cases, like this one, the correct solution requires knowing some kind of a mathematical or a programming trick. If you don't know what the trick is, and attempt to code a brute-force approach, your program runs forever, and fails for that reason. If you're trying to learn C++, you won't learn anything from useless online quiz sites [but only from a good C++ book](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). — Sam Varshavchik, Feb 13 '20 at 03:25
you can use [segment trees](https://en.wikipedia.org/wiki/Segment_tree) for this problem — Jefferson Rondan, Feb 13 '20 at 03:29
If you have a single query, then this seems to be the best you're gonna get since you can't really assume the character value of any one position. If you have multiple queries, then what you can do is pregenerate an array with the counts up to each index and have the query be constant time. — Misha Melnyk, Feb 13 '20 at 03:30
I think there may be more conditions, e.g., a string consists of two characters `&` and `|`. Or, majority consists of `&`? — ghchoi, Feb 13 '20 at 03:31
@MishaMelnyk , multiple queries is exactly what I have. Could you please explain a bit more to a cpp newbie. Thanks. — newbie, Feb 13 '20 at 03:34
I would try to remove `if` statement because it is much more expensive simple arithmatic operations? I think simple streaming instead of `for` loop would be better as well. — ghchoi, Feb 13 '20 at 03:42
This idea isn't c++ specific, it applies to these kinds of problems in general. By generate the array, I mean go through the string and find how many of a certain character, let's say `'&'`, are in the string before it. The first element, index 0, starts off as either 0 if the first character is '|' or 1 if '&'. After, go through each index 1 <= i < S.length() and check that one. In the array, the value at that index would be equal to the value at i-1 if the character at i is '|' or (value at i-1) + 1 if '&'. Then you can just query by subtracting, and adjust if endpoints are '&' — Misha Melnyk, Feb 13 '20 at 03:42

score 2 · Answer 1 · edited Feb 14 '20 at 07:22

I decided to try several approaches, including the ones proposed by the other two answers to this question. I made several assumptions about the input, with the goal to find a fast implementation for a single large string that would only be searched once for a single character. For a string that will have multiple queries made against it for more than one character, I suggest building a segment tree as suggested in a comment by user Jefferson Rondan.

I used std::chrono::steady_clock::now() to measure implementation times.

Assumptions

The program prompts the user for a string size, search character, start index, and end index.
The inputs are well formed (start <= end <= size).
The string is randomly generated from a uniform distribution of ascii characters between ' ' and '~'.
The underlying data in the string object is stored contiguously in memory.

Approaches

Naive for loop: an index variable is incremented, and the string is indexed, character by character, using the index.
Iterator loop: a string iterator is used, dereferenced at each iteration, and compared to the search character.
Underlying data pointer: a pointer to the underlying character array of the string is found, and this is incremented in a loop. The dereferenced pointer is compared to the search character.
Index mapping (as suggested by GyuHyeon Choi): An int-type array of max printable ascii character elements is initialized to 0, and for each character encountered while iterating through the array, that corresponding index is incremented by one. At the end, the index of the search character is dereferenced to find how many of that character were found.
Just use std::count (as suggested by Atul Sharma): Just use the builting counting functionality.
Recast the underlying data as a pointer to a larger data type and iterate: the underlying const char* const pointer that holds the string data is reinterpreted as a pointer to a wider data type (in this case a pointer to type uint64_t). Each dereferenced uint64_t is then XOR'ed with a mask made up of the search character, and each byte of the uint64_t masked with 0xff. This reduces the number of pointer increments needed to step through the entire array.

Results

For a 1,000,000,000 size string searching from index 5 to 999999995, the results of each method follow:

Naive for loop: 843 ms
Iterator loop: 818 ms
Underlying data pointer: 750 ms
Index mapping (as suggested by GyuHyeon Choi): 929 ms
Just use std::count (as suggested by Atul Sharma): 819 ms
Recast the underlying data as a pointer to a larger data type and iterate: 664 ms

Discussion

The best performing implementation was my own data pointer recast, which completed in a little over 75% of the time it took for the naive solution. The fastest "simple" solution is pointer iteration over the underlying data structure. This method has the benefit of being easy to implement, understand, and maintain. The index mapping method, despite being marketed as 2x faster than the naive solution, didn't see such speedups on my benchmarks. The std::count method is about as fast as the by-hand pointer iteration, and even simpler to implement. If speed really matters, consider recasting the underlying pointer. Otherwise, stick with std::count.

The Code

#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <functional>
#include <typeinfo>
#include <chrono>

int main(int argc, char** argv)
{
    std::random_device device;
    std::mt19937 generator(device());
    std::uniform_int_distribution<short> short_distribution(' ', '~');
    auto next_short = std::bind(short_distribution, generator);

    std::string random_string = "";
    size_t string_size;
    size_t start_search_index;
    size_t end_search_index;
    char search_char;
    std::cout << "String size: ";
    std::cin >> string_size;
    std::cout << "Search char: ";
    std::cin >> search_char;
    std::cout << "Start search index: ";
    std::cin >> start_search_index;
    std::cout << "End search index: ";
    std::cin >> end_search_index;

    if (!(start_search_index <= end_search_index && end_search_index <= string_size))
    {
        std::cout << "Requires start_search <= end_search <= string_size\n";
        return 0;
    }

    for (size_t i = 0; i < string_size; i++)
    {
        random_string += static_cast<char>(next_short());
    }

    // naive implementation
    size_t count = 0;
    auto start_time = std::chrono::steady_clock::now();
    for (size_t i = start_search_index; i < end_search_index; i++)
    {
        if (random_string[i] == search_char)
            count++;
    }
    auto end_time = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << "Naive implementation. Found: " << count << "\n";
    std::cout << "Elapsed time: " << duration.count() << "us.\n\n";

    // Iterator solution
    count = 0;
    start_time = std::chrono::steady_clock::now();
    for (auto it = random_string.begin() + start_search_index, end = random_string.begin() + end_search_index;
        it != end;
        it++)
    {
        if (*it == search_char)
            count++;
    }
    end_time = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << "Iterator solution. Found: " << count << "\n";
    std::cout << "Elapsed time: " << duration.count() << "us.\n\n";

    // Iterate on data
    count = 0;
    start_time = std::chrono::steady_clock::now();
    for (auto it = random_string.data() + start_search_index,
        end = random_string.data() + end_search_index;
        it != end; it++)
    {
        if (*it == search_char)
            count++;
    }
    end_time = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << "Iterate on underlying data solution. Found: " << count << "\n";
    std::cout << "Elapsed time: " << duration.count() << "us.\n\n";

    // use index mapping
    count = 0;
    size_t count_array['~']{ 0 };
    start_time = std::chrono::steady_clock::now();
    for (size_t i = start_search_index; i < end_search_index; i++)
    {
        count_array[random_string.at(i)]++;
    }
    end_time = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    count = count_array[search_char];
    std::cout << "Using index mapping. Found: " << count << "\n";
    std::cout << "Elapsed time: " << duration.count() << "us.\n\n";

    // using std::count
    count = 0;
    start_time = std::chrono::steady_clock::now();
    count = std::count(random_string.begin() + start_search_index
        , random_string.begin() + end_search_index
        , search_char);
    end_time = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << "Using std::count. Found: " << count << "\n";
    std::cout << "Elapsed time: " << duration.count() << "us.\n\n";

    // Iterate on larger type than underlying char
    count = end_search_index - start_search_index;
    start_time = std::chrono::steady_clock::now();
    // Iterate through underlying data until the address is modulo 4
    {
        auto it = random_string.data() + start_search_index;
        auto end = random_string.data() + end_search_index;

        // iterate until we reach a pointer that is divisible by 8
        for (; (reinterpret_cast<std::uintptr_t>(it) & 0x07) && it != end; it++)
        {
            if (*it != search_char)
                count--;
        }

        // iterate on 8-byte sized chunks until we reach the last full chunk that is 8-byte aligned
        auto chunk_it = reinterpret_cast<const uint64_t* const>(it);
        auto chunk_end = reinterpret_cast<const uint64_t* const>((reinterpret_cast<std::uintptr_t>(end)) & ~0x07);
        uint64_t search_xor_mask = 0;
        for (size_t i = 0; i < 64; i+=8)
        {
            search_xor_mask |= (static_cast<uint64_t>(search_char) << i);
        }
        constexpr uint64_t all_ones = 0xff;
        for (; chunk_it != chunk_end; chunk_it++)
        {
            auto chunk = (*chunk_it ^ search_xor_mask);
            if (chunk & (all_ones << 56))
                count--;
            if (chunk & (all_ones << 48))
                count--;
            if (chunk & (all_ones << 40))
                count--;
            if (chunk & (all_ones << 32))
                count--;
            if (chunk & (all_ones << 24))
                count--;
            if (chunk & (all_ones << 16))
                count--;
            if (chunk & (all_ones <<  8))
                count--;
            if (chunk & (all_ones <<  0))
                count--;
        }

        // iterate on the remainder of the bytes, should be no more than 7, tops
        it = reinterpret_cast<decltype(it)>(chunk_it);
        for (; it != end; it++)
        {
            if (*it != search_char)
                count--;
        }
    }
    end_time = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
    std::cout << "Iterate on underlying data with larger step sizes. Found: " << count << "\n";
    std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
}

Example Output

String size: 1000000000
Search char: &
Start search index: 5
End search index: 999999995
Naive implementation. Found: 10527454
Elapsed time: 843179us.

Iterator solution. Found: 10527454
Elapsed time: 817762us.

Iterate on underlying data solution. Found: 10527454
Elapsed time: 749513us.

Using index mapping. Found: 10527454
Elapsed time: 928560us.

Using std::count. Found: 10527454
Elapsed time: 819412us.

Iterate on underlying data with larger step sizes. Found: 10527454
Elapsed time: 664338us.

Native `for` with `if` statement in every loop is faster than array access? Seriously? — ghchoi, Feb 14 '20 at 02:21
@GyuHyeonChoi according to my benchmarking, yes. If you can find a reason why my code is not as fast as yours, I'd be happy to hear it since I'm a novice, myself. — JohnFilleau, Feb 14 '20 at 02:23
I edited your code. Please re-benchmark it. BTW, up-vote for your last solution...! — ghchoi, Feb 14 '20 at 03:07

ghchoi · Answer 2 · 2020-02-14T02:45:17.903

1

int cnt[125];  // ASCII '&' = 46, '|' = 124
cnt['&'] = 0;
for(int i = start; i < end; i++) {
    cnt[S.at(i)]++;
}
cout << cnt['&'] << endl;

if is expensive as it compares and branches. So it would be better.

edited Feb 14 '20 at 02:45

answered Feb 13 '20 at 04:09

ghchoi

4,812
4
30
53

is this really faster compared to the original code? – lenik Feb 13 '20 at 04:42
Yes, I measured elapsed time a few times and it was more than 2 times faster. – ghchoi Feb 13 '20 at 04:44

score 0 · Answer 3 · answered Feb 13 '20 at 04:27

0

You can use the std::count from algorithm standard C++ library. Just include the header <algorithm>

std::string s{"&|&&|&&&|&"};
// https://en.cppreference.com/w/cpp/algorithm/count
auto const count = std::count(s.begin() + 1  // starting index
                             ,s.begin() + 8  // one pass end index 
                             ,'&');

answered Feb 13 '20 at 04:27

Atul Sharma

9
3

The author asked for a solution with better performance. As long as `std::count` takes the target as an argument, it requires comparisons... The function internally does what the author already did. Your solution is about two times slower than what the author already did. – ghchoi Feb 13 '20 at 04:41