5

I'm trying to change user input in wildcard form ("*word*") to a regular expression format. To that end, I'm using the code below to strip off the '*' at the beginning and end of the input so that I can add the regular expression characters on either end:

string::iterator    iter_begin = expressionBuilder.begin();
string::iterator    iter_end = expressionBuilder.end();
iter_end--;
if ((char)*iter_begin == '*' && (char)*iter_end == '*')
{
    expressionBuilder.erase(iter_begin);
    expressionBuilder.erase(iter_end);
    expressionBuilder = "\\b\\w*" + expressionBuilder + "\\w*\\b";
}

However, the call to "expressionBuilder.erase(iter_end)" does not erase the trailing '*' from the input string so I wind up with an incorrect regular expression. What am I doing wrong here? "(char)*iter_end == '*'" must be true for the code inside the if statment to run (which it does), so why doesn't the same iterator work when passed to erase()?

jeffm
  • 3,120
  • 1
  • 34
  • 57

4 Answers4

7

Your original code and the proposed solutions so far have a couple of problems in addition to the obvious problem you posted about:

  • use of invalidated iterators after the string is modified
  • dereferencing possibly invalid iterators even before the string is modified (if the string is empty, for example)
  • a bug if the expressionBuilder string contains only a single '*' character

Now, the last two items might not really be a problem if the code that uses the snippet/routine is already validating that the string has at least 2 characters, but in case that's not the situation, I believe the following to be more robust in the face of arbitrary values for expressionBuilder:

// using the reverse iterator rbegin() is a nice easy way 
//     to get the last character of a string

if ( (expressionBuilder.size() >= 2) &&
    (*expressionBuilder.begin()  == '*') &&
    (*expressionBuilder.rbegin() == '*') ) {

    expressionBuilder.erase(expressionBuilder.begin());

    // can't nicely use rbegin() here because erase() wont take a reverse
    //  iterator, and converting reverse iterators to regular iterators
    //  results in rather ugly, non-intuitive code
    expressionBuilder.erase(expressionBuilder.end() - 1); // note - not invalid since we're getting it anew

    expressionBuilder = "\\b\\w*" + expressionBuilder + "\\w*\\b";
}

Note that this code will work when expressionBuilder is "", "*", or "**" in that it does not perform any undefined actions. However, it might not produce the results you want in those cases (that's because I don't know exactly what you do want in those cases). Modify to suit your needs.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • Thanks. I pretty much know at this point that the string is not empty or "*", but I agree that it would be better to code it that way just in case something changes later. – jeffm Oct 23 '08 at 21:55
3

Try erasing them in the opposite order:

expressionBuilder.erase(iter_end);
expressionBuilder.erase(iter_begin);

After erasing the first *, iter_end refers to one character past the end of the string in your example. The STL documentation indicates that iterators are invalidated by erase(), so technically my example is wrong too but I believe it will work in practice.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
  • Luckily with strings you don't need to use iterators, most functions have a form that takes an index instead. Still, like you say, even with indexed erasure it should still be done "back to front". – C. K. Young Oct 23 '08 at 19:50
  • P4tXrx5jrMlbhyludk9pxHBT30kGHo9n: you're right about end(), but there is an iter_end-- in there that looks at the actual last character of the string. – Greg Hewgill Oct 23 '08 at 19:54
  • This makes perfect sense, and reversing the order did solve the problem. Thanks! – jeffm Oct 23 '08 at 19:56
  • Sorry, I missed the line with "--". – Max Lybbert Oct 23 '08 at 20:04
  • @Greg: some other post concluded that iterators _after_ the erased one are invalidated. (http://stackoverflow.com/questions/62340/is-a-popback-in-a-stdvector-really-invalidates-all-iterators-on-the-vector) – xtofl Oct 23 '08 at 20:47
  • @xtoff - this doesn't apply to string. In order to allow support for reference counted strings, calling any non-const member function (except a few that do nothing but return references or iterators) may invalidate any existing iterators. – Michael Burr Oct 23 '08 at 21:02
1

(Revised, as I missed the iter_end-- line).

You probably want an if statement that only checks if *iter_begin == '*', and then calls find() to get the other '*'. Or you could use rbegin() to get the "beginning iterator of of the sequence in reverse," advance it one and then call base() to turn it to a regular iterator. That will get you the last character in the sequence.


Even better, std::string has rfind() and find_last_of() methods. They will get you the last '*'. You can also simply call replace() instead of stripping out the '*'s and then adding the new stuff back in.

Max Lybbert
  • 19,717
  • 4
  • 46
  • 69
  • Notice there is an iter_end-- in there that backs up one character. – Greg Hewgill Oct 23 '08 at 19:52
  • Did you miss the "iter_end--;" line, which moves the iterator back to the last item? I'm sure Greg's answer is right, because string iterators are basically just indexes, so the end index is invalidated by the first erase. – Roddy Oct 23 '08 at 19:53
  • I was trying to avoid "find_last_of" because I already know where the character is, but maybe I was overthinking it. – jeffm Oct 23 '08 at 19:57
0

Minus the error handling, you could probably just do it like this:

#include <iostream>
#include <string>
using namespace std;

string stripStar(const string& s) {
    return string(s.begin() + 1, s.end() - 1);
}

int main() {
   cout << stripStar("*word*") << "\n";
}
Shadow2531
  • 11,980
  • 5
  • 35
  • 48
  • What if you call `stripStar("word")` or even `stripStar("word*")`? I think OP wants this versatility. – Cosine May 23 '14 at 00:19