0

Given a sequence (for example a string "Xa"), I want to get the next prefix in order lexicographic (i.e "Xb"). The next of "aZ" should be "b"

A motivating use case where this function is useful is described here.

As I don't want to reinvent the wheel, I'm wondering if there is any function in C++ STL or boost that can help to define this generic function easily? If not, do you think that this function can be useful?

Notes

  • Even if the examples are strings, the function should work for any Sequence.
  • The lexicographic order should be a template parameter of the function.

From the answers I conclude that there is nothing on C++/Boost that can help to define this generic function easily and also that this function is too specific to be proposed for free. I will implement a generic next_prefix and after that I will request if you find it useful.

I have accepted the single answer that gives some hints on how to do that even if the proposed implementation is not generic.

Community
  • 1
  • 1
Vicente Botet Escriba
  • 4,305
  • 1
  • 25
  • 39
  • 1
    @GMan: "c", I should think, assuming the default char order. The lexicographically first string after "b", of which "b" is not a prefix. Hence the "next prefix". – Steve Jessop Apr 23 '10 at 17:25
  • How do you get from aZ to Za (or vice versa)? – wilhelmtell Apr 23 '10 at 18:13
  • @SteveJessop then how/when do you get to ba, bA, etc.? – dash-tom-bang Apr 23 '10 at 18:50
  • 1
    @dash-tom-bang: You don't. I've a sneaking suspicion that Vicente is asking in the context of http://stackoverflow.com/questions/2689963/container-for-database-like-searches/2690246, where he mentions a `next_prefix` function. – Steve Jessop Apr 23 '10 at 21:07
  • I don't think this wheel has been invented yet, so reinventing it won't be an issue. – MSN Apr 23 '10 at 21:26
  • @MSN: well, the idea of modifying a prefix like this to turn an initial-substring-search into a matter of order comparison isn't new. It's advised in GAE, for instance. See the "tip" here: http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Indexes. Although there is a slight difference that they use "biggest unicode character" and a <= comparison, whereas Vicente wants something to use with a < comparison. What I find alarming is trying to do it based on an arbitrary collation... – Steve Jessop Apr 23 '10 at 22:05
  • @Steve, ahh... well then. That's pretty ugly. Although I do have to admit I've done something similar to implement an adjacency graph with a `std::set >`. – MSN Apr 23 '10 at 22:40
  • The problem here is that the OP expects the algorithm to change the container's (string's) size. This is uncommon for algorithms. Of course, you _could_ have the algorithm work on a couple of input iterators (probably bidi) and one output iterator, and finally return the end of the result range. I can't tell if this algorithm would be useful because I don't even understand what it's supposed to do. Still, I think a "counter" algorithm like the one I posted below in my answer would be useful (even if I completely misunderstood what the OP wants). – wilhelmtell Apr 24 '10 at 01:43
  • @Wilhelm I don't expect the next_prefix function to change its input. it should return a new sequence. – Vicente Botet Escriba Apr 24 '10 at 09:52

5 Answers5

3

I'm not sure I understand the semantics by which you wish the string to transform, but maybe something like the following can be a starting point for you. The code will increment the sequence, as if it was a sequence of digits representing a number.

template<typename Bi, typename I>
bool increment(Bi first, Bi last, I minval, I maxval)
{
    if( last == first ) return false;
    while( --last != first && *last == maxval ) *last = minval;
    if( last == first && *last == maxval ) {
        *last = minval;
        return false;
    }
    ++*last;
    return true;
}

Maybe you wish to add an overload with a function object, or an overload or specialization for primitives. A couple of examples:

string s1("aaz");
increment(s1.begin(), s1.end(), 'a', 'z');
cout << s1 << endl;     // aba

string s2("95");
do {
    cout << s2 << ' ';  // 95 96 97 98 99
} while( increment(s2.begin(), s2.end(), '0', '9') );
cout << endl;
wilhelmtell
  • 57,473
  • 20
  • 96
  • 131
  • Thanks for providing a possible implementation that modifies the input. My question was if there is any function in C++ STL or boost that can help to define this generic function easily. – Vicente Botet Escriba Apr 24 '10 at 09:59
  • I'm not sure I know how to simplify the code with STL help. For an `increment_copy()` version you need to add an output iterator parameter. Look in the input for the last element that isn't `maxval`, then copy all the elements from the beginning up until (but excluding) the position you found. Then output the element you found incremented, and finally replace each of the remaining elements to `minval` in the output container. All this implies you read the input twice: which is because you have to output begin-to-end, and you can't write back-to-front like you do in the in-place version. – wilhelmtell Apr 25 '10 at 04:09
2

That seem so specific that I can't see how it would get in STL or boost.

Klaim
  • 67,274
  • 36
  • 133
  • 188
0

When you say the order is a template parameter, what are you envisaging will be passed? A comparator that takes two characters and returns bool?

If so, then that's a bit of a nightmare, because the only way to find "the least char greater than my current char" is to sort all the chars, find your current char in the result, and step forward one (or actually, if some chars might compare equal, use upper_bound with your current char to find the first greater char).

In practice, for any sane string collation you can define a "get the next char, or warn me if I gave you the last char" function more efficiently, and build your "get the next prefix" function on top of that. Hopefully, permitting an arbitrary order is more flexibility than you need.

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • I have no set any constraints on how the lexicographic order can be specified. And you are right, it seems that we need a next and is_last function on the elements or something like that. See my comment @Potatoswatter – Vicente Botet Escriba Apr 24 '10 at 10:24
0

Orderings are typically specified as a comparator, not as a sequence generator.

Lexicographical orderings in particular tend be only partial, for example, in case or diacritic insensitivity. Therefore your final product will be nondeterministic, or at best arbitrary. ("Always choose lowest numerical encoding"?)

In any case, if you accept a comparator as input, the only way to translate that to an increment operation would be to compare the current value against every other in the character space. Which could work, 127 values being so few (a comparator-sorted table would make short work of the problem), or could be impossibly slow, if you use any other kind of character.

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
  • In the Notes I said "The lexicographic order should be a template parameter of the function" and is not limited to chars "Even if the examples are strings, the function should work for any Sequence." – Vicente Botet Escriba Apr 24 '10 at 10:07
  • In the case of the next_prefix I think that the lexicographic order parameter could be a function that gives the next element, e.g 'a'->'b', 'Z'->'a' and whether an element is the last one is_last('Z')==true – Vicente Botet Escriba Apr 24 '10 at 10:13
0

The best way is likely to define the character ordering somehow, then define the rules from going from one character to two characters to three characters.

Use whatever sort function you wish to use over the complete list of characters that you want to include, then just use that as the ordering. Find the index of the current character, and you can easily find the previous and next characters. Only advance the right-most character, unless it's going to roll over, then advance the next character to the left.

In other words, reinventing the wheel is like 10 lines of Python. Probably less than 500 lines of C++. :)

dash-tom-bang
  • 17,383
  • 5
  • 46
  • 62