The best way to find number of occurrences of word in string (C++, no spaces in string)

Question

Let's say you are given a string "banana".

You want to find out how many times "ana" can be found in "banana". So, that number is 2.

string s = "banana";
int num = 0,pos = 0;
pos = s.find("ana");
while(pos!=string::npos) {
    num++;
    pos = s.find("ana",pos+1);
}
cout<<num<<endl;

The thing is, I want to write shorter code for this. What functions can be used ? I tried using search() but it's not what I wanted. Count() is only for characters.

Are there any other functions that can help me with doing this ? (Boost is not allowed on competitions, so not this one).

This is about as good as you could ask for. I suppose a you could use a regex to remove a line of code or two but regexes can be pretty heavy. — NathanOliver, May 14 '19 at 15:12
Competitions :) . what competitions? where? if you tell me I might not give you the answer but instead post it to the competitions for my self :) Also shorter code does not necessarily mean faster program! — AKL, May 14 '19 at 15:14
Shouldn't it be like```pos = s.find("ana", pos + 2);``` since "ana" is what you are looking for not "nan" ? — AKL, May 14 '19 at 15:28
I would be surprised if a competition permitted contestants to obtain outside assistance. — Raymond Chen, May 14 '19 at 16:13
Your question is already asked. Check this one out https://stackoverflow.com/questions/22489073/counting-the-number-of-occurrences-of-a-string-within-a-string — yara raffoul, May 14 '19 at 17:00
I think the best way in terms of speed would be to use [Boyer–Moore string-search algorithm](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm). You search for the best way in terms of what? — KamilCuk, May 15 '19 at 06:46
@SomeName I wrote a new answer. If it solved the problem please consider voting/accepting it! (https://stackoverflow.com/help/someone-answers) — AKL, May 16 '19 at 09:33

score 1 · Accepted Answer · answered May 16 '19 at 04:01

Assuming that finally you intend to also count any possible replication of a phrase, here are 3 generic function templates which work for almost any container or pointer/array , providing that they are given the right first/last iterator instead of container!

#include <iterator>
#include <algorithm>
#include <map>
#include <vector>

//finding the occurrence for an specific case within a range
template<typename rng_t__, typename dif_t__ = typename std::iterator_traits<rng_t__>::difference_type>
dif_t__ occurrence(const rng_t__ rng_fst_, const rng_t__ rng_lst_, const rng_t__ schd_fst_, const rng_t__ schd_lst){
    dif_t__ counter = 0;
    for (rng_t__ it = rng_fst_; (it = std::search(it, rng_lst_, schd_fst_, schd_lst))++ != rng_lst_; ++counter);
    return counter;
}
//finding the replications for all subsets with certain length within a range
template<typename rng_t__, typename dif_t__ = typename std::iterator_traits<rng_t__>::difference_type, typename val_t__ = typename std::iterator_traits<rng_t__>::value_type>
dif_t__ replications(const rng_t__ rng_fst_, const rng_t__ rng_lst_, dif_t__ lnt_){
    if(!lnt_ or lnt_ >= std::distance(rng_fst_, rng_lst_)) return 0;
    rng_t__ it_lst = rng_fst_;
    for (--lnt_; lnt_--; ++it_lst);
    std::map<std::vector<val_t__>, dif_t__> cases;
    for (rng_t__ it_fst = rng_fst_; it_lst++ != rng_lst_; ++it_fst){
        auto it_rslt_pair = cases.insert({{it_fst, it_lst}, 0});
        if(! it_rslt_pair.second) ++(it_rslt_pair.first->second);
    }
    dif_t__ counter = 0;
    for (const auto& a_case : cases) counter += a_case.second;
    return counter;
}
//finding the replications for all subsets with all possible lengths within a range
template<typename rng_t__, typename dif_t__ = typename std::iterator_traits<rng_t__>::difference_type, typename val_t__ = typename std::iterator_traits<rng_t__>::value_type>
dif_t__ replications(const rng_t__ rng_fst_, const rng_t__ rng_lst_){
    const dif_t__ rng_lnt = std::distance(rng_fst_, rng_lst_);
    dif_t__ counter = 0;
    for (dif_t__ a_lnt = 0; ++a_lnt < rng_lnt; counter += replications(rng_fst_, rng_lst_, a_lnt));
    return counter;
}

#include <string>

int main(int argc, char** argv) {

    std::string range = "banana", searched = "ana";

    std::cout<< "total occurrence for the ana" << std::endl;
    std::cout<< occurrence("banana", "banana" + 6, "ana", "ana" +3) << std::endl;
    std::cout<< occurrence(range.begin(), range.end(), searched.begin(), searched.end()) << std::endl;

    std::cout<< "total replications for every phrase from banana with length of 3" << std::endl;
    std::cout<< replications("banana", "banana" + 6, 3) << std::endl;
    std::cout<< replications(range.begin(), range.end(), 3) << std::endl;

    std::cout<< "total replications for every phrase from banana with every possible length" << std::endl;
    std::cout<< replications("banana", "banana" + 6) << std::endl;
    std::cout<< replications(range.begin(), range.end()) << std::endl;

    return 0;
}

possible output:

total occurrence for the ana
2
2
total replications for every phrase from banana with length of 3
1
1
total replications for every phrase from banana with every possible length
6
6

Good luck with the competition!

Paul Evans · Answer 2 · 2019-05-14T16:24:07.177

0

I want to write shorter code for this

You want to use while (true) and break:

std::string s{"banana"};
std::string::size_type pos{0};
int num{0};
while (true) {
    pos = s.find("ana", pos);
    if (pos == std::string::npos) break;
    pos += 2; // next possible place for an "ana";
    ++num;
} 
std::cout << num << "\n";

edited May 14 '19 at 16:24

answered May 14 '19 at 15:26

Paul Evans

27,315
3
37
54

Is using ```while(true)``` and ```break``` really faster? – AKL May 14 '19 at 15:33
@AKL - only way to tell is inspect the assembly or measure the duration. – 2785528 May 14 '19 at 15:33
@2785528 Thank you for your wise, broad and typical answer! what I really meant was, why should it be faster? – AKL May 14 '19 at 16:03
@AKL OP didn't ask for faster code, OP asked for shorter code. This way uses only one line for `find`. – Paul Evans May 14 '19 at 16:26
why should it be faster? I don't know that it should. My experience is that your desktop (and mine) is a complex system. The size or nature of the "code change" has no relation to "performance change". The system is non-linear. – 2785528 May 14 '19 at 16:49
@2785528 what you are experiencing my friend, is CPU being warmed up during raising up their clock speed from different conditions. Also not to forget Operating systems change the hard-working core/s to avoid them being overheated. And what I meant was a significant change in speed at almost equal conditions! If one claims something is faster one should be able to explain it in term of logic or at least repeated experience! – AKL May 14 '19 at 20:27

score 0 · Answer 3 · answered May 14 '19 at 16:29

0

You can use a for loop for this:

string s = "banana";
int num = 0;

for (auto pos = s.find("ana"); pos != string::npos; pos = s.find("ana",pos+1))
    num++;
cout << num << endl;

As you can see, I used auto as type for pos, so it will have the correct type std::string::size_type.

answered May 14 '19 at 16:29

mch

9,424
2
28
42

Thank you for repeating my answer! Would you also be willing to take a down-vote from @AlanBirtles too? :) First _ mine is even shorter! (i am talking about the code). Second _ you spend half of the amount of letters that you saved by using auto with repeating pos in finding and equality check! – AKL May 14 '19 at 16:37
Also why is every body using post-increment ```num++``` instead of pre-increment ```++num``` which is faster and has less overhead? – AKL May 14 '19 at 16:41
@AKL In my opinion it is the best solution regarding the readablitily and the shortness of the code. Also using `num++;` or `++num;` here will end up in the same assembler code: https://godbolt.org/z/GLwgEj – mch May 14 '19 at 16:50
1- since it is "readability" not "readablitily" I am not sure how good you can judge readability! 2- readability was not what the questioner asked for. 3-you guys are so obsessed with assembler and optimizer that became lazy. Although it might not matter in this case, nothing beats developing a good habit. 4- please go ask AlanBirtles to remove his down-vote:) – AKL May 14 '19 at 17:04
@AKL this isn't the same as your answer and doesn't have the same bug – Alan Birtles May 14 '19 at 17:41

Alan Birtles · Answer 4 · 2019-05-15T12:03:04.460

0

Probably the simplest code without sacrificing readability is something like this:

std::string s = "banana";
int num = 0;
size_t pos = 0;

while ((pos = s.find("ana", pos)) != std::string::npos)
{
    num++;
    pos++;
}
std::cout << num << "\n";

or as a for loop:

std::string s = "banana";
int num = 0;
for (size_t pos = 0; (pos = s.find("ana", pos)) != std::string::npos; num++, pos++)
{
}
std::cout << num << "\n";

edited May 15 '19 at 12:03

answered May 15 '19 at 07:16

Alan Birtles

32,622
4
31
60

Dude you were so obsessed with testing the "anana" case that you wrote it for the loop case. Since it was under 6 characters I couldn't edit it for you! – AKL May 15 '19 at 08:58
And speaking of different cases, there is a possibility(no matter how much small) that ```std::string::npos``` is of an integer type of smaller/greater than the size_t and since it is usually filled with -1, your code can also contain an infinite loop if one of string::size_type or size_t is unsigned where one of them is smaller than int and the other is not. There is a reason why size_type is defined in string. Also should I start a Post on when to use post increment operators? – AKL May 15 '19 at 10:03
@AKL `std::string::size_type` is always `std:size_t`. Post increment operators are not generally more expensive for integer types, they only become (potentially) more expensive for structures like iterators, the compiler optimises away the difference – Alan Birtles May 15 '19 at 12:06
As far as my readings ```size_type``` in ```string``` is defined by the ```size_type``` of its ```Allocator``` and one can choose to use one other that default ```std::allocator```. But maybe I am wrong and not that I doubt you but only to be able to prove this to someone else if it ever happens, I would appreciate it if you could kindly provide me a reference for "std::string::size_type is always std:size_t". – AKL May 15 '19 at 12:36
I agree with you about post increment operations, it is just about developing a good habit, not having double standards and readability of the code. Because of reading not authentic materials, I was not even aware of pre increment operator for a long time! – AKL May 15 '19 at 12:36
`std::string`'s allocator is always `std::allocator` who's `size_type` is `std::size_t` – Alan Birtles May 15 '19 at 13:25

The best way to find number of occurrences of word in string (C++, no spaces in string)

4 Answers4