How to count recurring characters at the beginning of a QString?

Question

Im dealing with a list of lines, and I need to count the hashes that occur at the beginning.

#  item 1
## item 1, 1
## item 1, 2
#  item 2

and so on.

If each line is a QString, how can I return the number of hashes occurring at the beginning of the string?

QString s("### foo # bar ");
int numberOfHashes = s.count("#"); // Answer should be 3, not 4

Paul Belanger · Answer 1 · 2018-08-29T00:43:50.397

Here I use the standard algorithm find_if_not to get an iterator to the first character that is not a hash. I then return the distance from the start of the string to that iterator.

int number_of_hashes(QString const& s)
{
    auto it = std::find_if_not(std::begin(s), std::end(s), [](QChar c){return c == '#';});
    return std::distance(std::begin(s), it);
}

EDIT: the find_if_not function only takes a unary predicate, not a value, so you have to pass a lambda predicate.

Matteo Italia · Accepted Answer · 2018-08-28T23:48:50.323

Trivially:

int number_of_hashes(const QString &s) {
    int i, l = s.size();
    for(i = 0; i < l && s[i] == '#'; ++i);
    return i;
}

In other languages (mostly interpreted ones) you have to fear iteration over characters as it's slow, and delegate everything to library functions (generally written in C). In C++ iteration is perfectly fine performance-wise, so a down-to-earth for loop will do.

Just for fun, I made a small benchmark comparing this trivial method with the QRegularExpression one from OP, possibly with the RE object cached.

#include <QCoreApplication>
#include <QString>
#include <vector>
#include <QElapsedTimer>
#include <stdlib.h>
#include <iostream>
#include <QRegularExpression>

int number_of_hashes(const QString &s) {
    int i, l = s.size();
    for(i = 0; i < l && s[i] == '#'; ++i);
    return i;
}

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    const int count = 100000;
    std::vector<QString> ss;
    for(int i = 0; i < 100; ++i) ss.push_back(QString(rand() % 10, '#') + " foo ## bar ###");
    QElapsedTimer t;
    t.start();
    unsigned tot = 0;
    for(int i = 0; i < count; ++i) {
        for(const QString &s: ss) tot += number_of_hashes(s);
    }
    std::cerr<<"plain loop: "<<t.elapsed()*1000./count<<" ns\n";
    t.restart();
    for(int i = 0; i < count; ++i) {
        for(const QString &s: ss) tot += QRegularExpression("^[#]*").match(s).capturedLength();
    }
    std::cerr<<"QRegularExpression, rebuilt every time: "<<t.elapsed()*1000./count<<" ns\n";

    QRegularExpression re("^[#]*");
    t.restart();
    for(int i = 0; i < count; ++i) {
        for(const QString &s: ss) tot += re.match(s).capturedLength();
    }
    std::cerr<<"QRegularExpression, cached: "<<t.elapsed()*1000./count<<" ns\n";
    return tot;    
}

As expected, the QRegularExpression-based one is two orders of magnitude slower:

plain loop: 0.7 ns
QRegularExpression, rebuilt every time: 75.66 ns
QRegularExpression, cached: 24.5 ns

Could you please add my idea to the benchmark as well. You know, out of pure curiosity and just for fun. — scopchanov, Aug 29 '18 at 09:34

Remy Lebeau · Answer 3 · 2018-08-28T19:29:24.297

3

int numberOfHashes = 0;
int size = s.size();
QChar ch('#');
for(int i = 0; (i < size) && (s[i] == ch); ++i) {
    ++numberOfHashes;
}

edited Aug 28 '18 at 19:29

answered Aug 28 '18 at 19:20

Remy Lebeau

555,201
31
458
770

score 3 · Answer 4 · edited Aug 29 '18 at 04:37

3

Solution without a for-loop:

QString s("### foo # bar ");
int numberOfHashes = QRegularExpression("^[#]*").match(s).capturedLength();

edited Aug 29 '18 at 04:37

scopchanov

7,966
10
40
68

answered Aug 28 '18 at 20:41

Anon

2,267
3
34
51

2

Invoking the wrath of regular expressions for such a simple problem is probably overkill. – Paul Belanger Aug 28 '18 at 20:57
2

Compared to a plain ad-hoc loop this is probably several orders of magnitude slower. Whether this is important in your use case, only you know, although, given that the "manual" solution is of similar length/effort to write and is orders of magnitude faster I wouldn't even consider an alternative; sloppy abuse of the processing power we have today is probably why most software is always slow, even though hardware gets faster. Still, if you are doing it on many lines at least cache the `QRegularExpression` object! – Matteo Italia Aug 28 '18 at 23:23
1

@Akiva: I added a small benchmark to my answer; as expected, your solution is ~100 times slower. – Matteo Italia Aug 28 '18 at 23:49

score 2 · Answer 5 · answered Aug 28 '18 at 20:11

2

Yet another way:

int beginsWithCount(const QString &s, const QChar c) {
  int n = 0;
  for (auto ch : s)
    if (c == ch) n++; else break;
  return n;
}

answered Aug 28 '18 at 20:11

Kuba hasn't forgotten Monica

95,931
16
151
313

dfrib · Answer 6 · 2018-08-28T19:36:28.670

A Qt approach, making use of QString::indexOf(..):

QString s("### foo # bar ");
int numHashes = 0;

while ((numHashes = s.indexOf("#", numHashes)) == numHashes) {
    ++numHashes;
} // numHashes == 3

int QString::indexOf(const QString &str, int from = 0, 
                     Qt::CaseSensitivity cs = Qt::CaseSensitive) const
Returns the index position of the first occurrence of the string str in this string, searching forward from index position from. Returns -1 if str is not found.

Starting at index 0, the string s is searched for the first occurrence of #, and thereafter use a predicate to test whether this occurrence is at index 0. If not terminated, proceeds with index 1, and so on.

This will not short-circuit a final possibly full string search, however. In case a hash is not found at its expected position, prior to the final failing predicate check, the string will be searched fully (or until first hash at wrong position) a single time.

The search is unnecessary: we only care about the beginning characters. The search raises the cost from O(1) to O(N) and is a pessimization. — Kuba hasn't forgotten Monica, Aug 28 '18 at 20:13
@KubaOber as I have already explicitly pointed out in the last paragraph of my answer. I still think a `QString` approach can have a merit in this Q&A, naturally including the last paragraph. — dfrib, Aug 28 '18 at 20:43

How to count recurring characters at the beginning of a QString?

6 Answers6

Linked