How to get the next prefix in C++?

Question

Given a sequence (for example a string "Xa"), I want to get the next prefix in order lexicographic (i.e "Xb"). The next of "aZ" should be "b"

A motivating use case where this function is useful is described here.

As I don't want to reinvent the wheel, I'm wondering if there is any function in C++ STL or boost that can help to define this generic function easily? If not, do you think that this function can be useful?

Notes

Even if the examples are strings, the function should work for any Sequence.
The lexicographic order should be a template parameter of the function.

From the answers I conclude that there is nothing on C++/Boost that can help to define this generic function easily and also that this function is too specific to be proposed for free. I will implement a generic next_prefix and after that I will request if you find it useful.

I have accepted the single answer that gives some hints on how to do that even if the proposed implementation is not generic.

@GMan: "c", I should think, assuming the default char order. The lexicographically first string after "b", of which "b" is not a prefix. Hence the "next prefix". — Steve Jessop, Apr 23 '10 at 17:25
@dash-tom-bang: You don't. I've a sneaking suspicion that Vicente is asking in the context of http://stackoverflow.com/questions/2689963/container-for-database-like-searches/2690246, where he mentions a `next_prefix` function. — Steve Jessop, Apr 23 '10 at 21:07
I don't think this wheel has been invented yet, so reinventing it won't be an issue. — MSN, Apr 23 '10 at 21:26
@MSN: well, the idea of modifying a prefix like this to turn an initial-substring-search into a matter of order comparison isn't new. It's advised in GAE, for instance. See the "tip" here: http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Indexes. Although there is a slight difference that they use "biggest unicode character" and a <= comparison, whereas Vicente wants something to use with a < comparison. What I find alarming is trying to do it based on an arbitrary collation... — Steve Jessop, Apr 23 '10 at 22:05
@Steve, ahh... well then. That's pretty ugly. Although I do have to admit I've done something similar to implement an adjacency graph with a `std::set >`. — MSN, Apr 23 '10 at 22:40
The problem here is that the OP expects the algorithm to change the container's (string's) size. This is uncommon for algorithms. Of course, you _could_ have the algorithm work on a couple of input iterators (probably bidi) and one output iterator, and finally return the end of the result range. I can't tell if this algorithm would be useful because I don't even understand what it's supposed to do. Still, I think a "counter" algorithm like the one I posted below in my answer would be useful (even if I completely misunderstood what the OP wants). — wilhelmtell, Apr 24 '10 at 01:43
@Wilhelm I don't expect the next_prefix function to change its input. it should return a new sequence. — Vicente Botet Escriba, Apr 24 '10 at 09:52

wilhelmtell · Accepted Answer · 2010-04-23T21:33:28.583

3

I'm not sure I understand the semantics by which you wish the string to transform, but maybe something like the following can be a starting point for you. The code will increment the sequence, as if it was a sequence of digits representing a number.

template<typename Bi, typename I>
bool increment(Bi first, Bi last, I minval, I maxval)
{
    if( last == first ) return false;
    while( --last != first && *last == maxval ) *last = minval;
    if( last == first && *last == maxval ) {
        *last = minval;
        return false;
    }
    ++*last;
    return true;
}

Maybe you wish to add an overload with a function object, or an overload or specialization for primitives. A couple of examples:

string s1("aaz");
increment(s1.begin(), s1.end(), 'a', 'z');
cout << s1 << endl;     // aba

string s2("95");
do {
    cout << s2 << ' ';  // 95 96 97 98 99
} while( increment(s2.begin(), s2.end(), '0', '9') );
cout << endl;

edited Apr 23 '10 at 21:33

answered Apr 23 '10 at 21:19

wilhelmtell

57,473
20
96
131

Thanks for providing a possible implementation that modifies the input. My question was if there is any function in C++ STL or boost that can help to define this generic function easily. – Vicente Botet Escriba Apr 24 '10 at 09:59
I'm not sure I know how to simplify the code with STL help. For an `increment_copy()` version you need to add an output iterator parameter. Look in the input for the last element that isn't `maxval`, then copy all the elements from the beginning up until (but excluding) the position you found. Then output the element you found incremented, and finally replace each of the remaining elements to `minval` in the output container. All this implies you read the input twice: which is because you have to output begin-to-end, and you can't write back-to-front like you do in the in-place version. – wilhelmtell Apr 25 '10 at 04:09

score 2 · Answer 2 · answered Apr 23 '10 at 16:54

2

That seem so specific that I can't see how it would get in STL or boost.

answered Apr 23 '10 at 16:54

Klaim

67,274
36
133
188

1

There's std::next_permutation (http://www.cppreference.com/wiki/stl/algorithm/next_permutation) which cycles through existing sequences, so I don't agree that it would be too specific or specialist. – AshleysBrain Apr 23 '10 at 17:05
1

@AshleysBrain: The way in which @Vincente’s sequence has to be cycled is highly specific, though: first character is always lower-case, second always upper-case, second may be missing. – Konrad Rudolph Apr 23 '10 at 17:07
@Konrad I never said that, there were simple examples. The prefix can have any length and as I said is not limited to sequences of chars. – Vicente Botet Escriba Apr 24 '10 at 09:44
@Klain My use case is to find a range of elements on a set starting by "X" – Vicente Botet Escriba Apr 24 '10 at 10:23
+1 @Ashley Yes this is the kind of generic function I'm looking for. – Vicente Botet Escriba Apr 24 '10 at 10:32

Steve Jessop · Answer 3 · 2010-04-23T17:27:04.043

When you say the order is a template parameter, what are you envisaging will be passed? A comparator that takes two characters and returns bool?

If so, then that's a bit of a nightmare, because the only way to find "the least char greater than my current char" is to sort all the chars, find your current char in the result, and step forward one (or actually, if some chars might compare equal, use upper_bound with your current char to find the first greater char).

In practice, for any sane string collation you can define a "get the next char, or warn me if I gave you the last char" function more efficiently, and build your "get the next prefix" function on top of that. Hopefully, permitting an arbitrary order is more flexibility than you need.

I have no set any constraints on how the lexicographic order can be specified. And you are right, it seems that we need a next and is_last function on the elements or something like that. See my comment @Potatoswatter — Vicente Botet Escriba, Apr 24 '10 at 10:24

Potatoswatter · Answer 4 · 2010-04-23T17:33:29.463

0

Orderings are typically specified as a comparator, not as a sequence generator.

Lexicographical orderings in particular tend be only partial, for example, in case or diacritic insensitivity. Therefore your final product will be nondeterministic, or at best arbitrary. ("Always choose lowest numerical encoding"?)

In any case, if you accept a comparator as input, the only way to translate that to an increment operation would be to compare the current value against every other in the character space. Which could work, 127 values being so few (a comparator-sorted table would make short work of the problem), or could be impossibly slow, if you use any other kind of character.

edited Apr 23 '10 at 17:33

answered Apr 23 '10 at 17:21

Potatoswatter

134,909
25
265
421

In the Notes I said "The lexicographic order should be a template parameter of the function" and is not limited to chars "Even if the examples are strings, the function should work for any Sequence." – Vicente Botet Escriba Apr 24 '10 at 10:07
In the case of the next_prefix I think that the lexicographic order parameter could be a function that gives the next element, e.g 'a'->'b', 'Z'->'a' and whether an element is the last one is_last('Z')==true – Vicente Botet Escriba Apr 24 '10 at 10:13

score 0 · Answer 5 · answered Apr 23 '10 at 18:53

The best way is likely to define the character ordering somehow, then define the rules from going from one character to two characters to three characters.

Use whatever sort function you wish to use over the complete list of characters that you want to include, then just use that as the ordering. Find the index of the current character, and you can easily find the previous and next characters. Only advance the right-most character, unless it's going to roll over, then advance the next character to the left.

In other words, reinventing the wheel is like 10 lines of Python. Probably less than 500 lines of C++. :)

How to get the next prefix in C++?

5 Answers5

Linked