Replacing single word in a string with another word

Question

I'm struggling to find a way to replace the word "he" with "he or she", and "his" with "his or hers", without also replacing "the" with "the or she" like my code does below:

#include <iostream>
#include <string>

using namespace std;

void myReplace(string& str, const string& oldStr, const string& newStr)
{
    if (oldStr.empty())
    {
        return;
    }

    for (size_t pos = 0; (pos = str.find(oldStr, pos)) != string::npos;)
    {
        str.replace(pos, oldStr.length(), newStr);
        pos += newStr.length();
    }
}

int main()
{
    string searchStr;

Beginning:

    cout << "Please enter a sentence (Maximum of 100 characters)\n"
         << "Or type 'exit' to close the program\n";
    getline(cin, searchStr);

    cout << "\nYour input:\n\t" << searchStr;

    myReplace(searchStr, "he", "he or she");
    cout << "\nReplaced Text\n\t" << searchStr << "\n\n";

    goto Beginning;
}

What my program does...

Input: He is the man
Output: He or she is the or she man

What it should do...

Input: He is the man
Output: He or she is the man

Is there any way anyone could help me with this? And if you're asking... YES, I have searched Google EVERYWHERE and found nothing that matches my needs. Thanks in advance.

You can't use a simple find-replace like you do, you have to check the context to make sure you match the *whole word*. What separates a word from another? Oh and don't forget that punctuation should not be counted in a "word". — Some programmer dude, Jun 19 '18 at 20:09
You want to replace the following: "He " (with the trailing space) --> "He or she ", and " he " (with both spaces) --> " he or she ". — AlexG, Jun 19 '18 at 20:09
Your program will never exit because of that nasty `goto` statement, and it doesn't compile, missing some includes. — Carl, Jun 19 '18 at 20:09
@Someprogrammerdude my lack of english certainly kills me. Tokenizing strings and comparing whole words is indeed better as you pointed out. — AlexG, Jun 19 '18 at 20:11
"he" or "his" matching can be correct: just check that the match's starting position is either the first character of the string or preceded by a non-alphabetic character (or by a space, depending on your acceptable syntax). Then replace as you intended, case-insensitively. One thing: what about a lone "she"? Shouldn't it get replaced by "he or she" as well, if it exists? — Attersson, Jun 19 '18 at 20:16

AlexT · Accepted Answer · 2018-06-19T22:09:04.470

There are multiple ways of achieving what you are trying to do. Here are a few that build on what you already have. (Quick note: these are concepts or pseudocode; I haven't used C++ in quite a few years.)

Quick and dirty method:

When you search for he, every word that contains it is matched too, so the becomes the or she.

To solve this, you need to take into account what usually (more on this later) comes before and after a word. Usually it is white space. That means a quick fix would be to replace " he " instead of "he". So a sentence like The something he something will indeed give us The something he or she something.

But as others have pointed out, this causes issues when the sentence starts or ends with the word you are trying to replace. That's why you will want to add a space before and after your initial sentence.

Taking "He is something he" as our sentence, this will become " He is something he ", so the first and the last word now have a space on each side as well. Keep in mind that find is case-sensitive, so " he " will not match " He "; you need a second call for the capitalised form or a case-insensitive search. Stripping the two extra spaces at the end restores the original string boundaries. So you will have:

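searchStr = " " + searchStr + " ";                        // pad so the first and last words have a space on both sides
myReplace(searchStr, " he ", " he or she ");
myReplace(searchStr, " He ", " He or she ");              // find is case-sensitive
searchStr = searchStr.substr(1, searchStr.length() - 2);  // strip the two padding spaces again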
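
A minimal, self-contained sketch of the padding idea above. The helper name replaceSpaceDelimited is hypothetical and not from the question. It differs from myReplace in one detail, which is my own assumption about what is wanted: the search resumes on the replacement's trailing space, so that the same space can serve as the leading space of the next match (otherwise the second "he" in "he he" is skipped).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces `word` with `replacement` only where `word`
// has a space on both sides, or sits at the very start or end of `text`.
// The comparison is case-sensitive and punctuation is not treated as a boundary.
std::string replaceSpaceDelimited(std::string text,
                                  const std::string& word,
                                  const std::string& replacement)
{
    if (word.empty())
    {
        return text;
    }

    const std::string oldPadded = " " + word + " ";
    const std::string newPadded = " " + replacement + " ";

    text = " " + text + " ";  // pad so the first and last words can match too

    for (std::size_t pos = 0;
         (pos = text.find(oldPadded, pos)) != std::string::npos;)
    {
        text.replace(pos, oldPadded.length(), newPadded);
        // Resume on the trailing space of the replacement so it can double
        // as the leading space of the next match.
        pos += newPadded.length() - 1;
    }

    // Remove the two padding spaces added above.
    return text.substr(1, text.length() - 2);
}
```

Calling replaceSpaceDelimited("he is the man", "he", "he or she") leaves "the" alone and returns "he or she is the man". An input such as "He is the man" or "he, too" comes back unchanged, which is exactly the weakness of this approach.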

Making a list (vector) of words and then replacing those

We started by assuming that a word is something between two white spaces, which is inherently wrong for multiple reasons:

The first/last word of a sentence will not start/end with a space.
The final word might end in a punctuation mark, like . or !, which the previous example does not handle.
Punctuation marks inside the string: he, him and her will not work.
Special signs, as in he/her, will again not work.

What we would want to do in a situation like this is split the string into words using a regular expression (std::regex in C++) that contains the special characters that might separate words. Here there are quite a few possibilities for what you might want to do:

You might want to delimit words by splitting on all special characters (depending on how you do it, you might end up losing Chinese characters and the like).
You might want to create a list of things to split on: ,: ;_.!?/~'" and so on.

So after doing something like this (pseudo):

ourString = "He, is mean to the teacher!"
delimiter = "[ ,.!?]+".toRegex // one or more whitespace or punctuation characters, so ", " gives no empty word
list = split(ourString, delimiter)

The list will be: [He, is, mean, to, the, teacher] (notice, that we will lose the punctuation marks, more on this later)

Now we can simply traverse the list, replacing each element with what we need, and then concatenate it back:

string = ""
for(word in list)
   string += if(word.toLowerCase == "he") " he or she " else " " + word + " "

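Now we will have " he or she is mean to the teacher " once the doubled spaces are collapsed (again, the punctuation marks are lost, and so is the capital H, since the replacement text is lowercase).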
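
In C++ the split-and-rebuild idea could be sketched roughly as follows. The function names (splitWords, toLower, replaceWordLossy) and the delimiter set are assumptions made for illustration; std::sregex_token_iterator with submatch index -1 is the standard way to iterate over the pieces between regex matches.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Split on runs of spaces and a few punctuation marks (an assumed set).
std::vector<std::string> splitWords(const std::string& text)
{
    static const std::regex delimiter(R"([ ,.!?]+)");
    std::vector<std::string> words;
    // Index -1 selects the text between delimiter matches.
    std::sregex_token_iterator it(text.begin(), text.end(), delimiter, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it)
    {
        if (it->length() > 0)  // skip the empty piece before a leading delimiter
        {
            words.push_back(it->str());
        }
    }
    return words;
}

std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Rebuild the sentence with single spaces, swapping in `replacement` for
// every word equal to `lowerWord` (which must already be lowercase).
// Punctuation and the original capitalisation of replaced words are lost.
std::string replaceWordLossy(const std::string& text,
                             const std::string& lowerWord,
                             const std::string& replacement)
{
    std::string result;
    for (const std::string& word : splitWords(text))
    {
        if (!result.empty())
        {
            result += ' ';
        }
        result += (toLower(word) == lowerWord) ? replacement : word;
    }
    return result;
}
```

With this sketch, replaceWordLossy("He, is mean to the teacher!", "he", "he or she") returns "he or she is mean to the teacher". Joining with a single space while rebuilding avoids the separate whitespace cleanup step.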

What if we want to keep the punctuation marks?

If we want to use the same approach, instead of simply splitting on the punctuation marks themselves, we can use a more complex regular expression (an example in python). An alternative to a complex regex would be to:

First traverse the string and add spaces before and after punctuation marks
Split it into a list by splitting on just white space
Run the replacement process
Put the string back together

string = "He, is !mean."
regex = "[,!.:;]"
// put a space on both sides of every punctuation mark
string = replace(string, regex, " " + match + " ")
// the string is now: "He ,  is  ! mean . "
// collapse runs of spaces into a single one
normaliseWhiteSpaces(string)
delimiter = " "
list = split(string, delimiter) // the list is now [He, ,, is, !, mean, .]
string = ""
for(word in list)
    string += if(word.toLowerCase == "he") " he or she " else " " + word + " "
// the string is now " he or she  ,  is  !  mean  .  " so we need to:
normaliseWhiteSpaces(string)
trim(string)
// result: "he or she , is ! mean ." (the spaces around punctuation remain)
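
For the single-regex route, a C++ sketch could rely on the \b word-boundary assertion of the ECMAScript grammar that std::regex uses by default. The function name expandHe is made up for this example, and capturing the word so that $1 keeps its capitalisation is a design choice of mine, not something from the steps above.

```cpp
#include <regex>
#include <string>

// Replaces every whole-word "he" (any capitalisation) with "<he> or she",
// keeping the matched word's own capitalisation through the $1 back-reference.
std::string expandHe(const std::string& text)
{
    // \b only matches between a word character and a non-word character
    // (or the string edge), so the "he" inside "the" or "she" is not matched.
    static const std::regex wholeWord(
        R"(\b(he)\b)", std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(text, wholeWord, "$1 or she");
}
```

Here expandHe("He is the man") returns "He or she is the man", and expandHe("He, is !mean.") returns "He or she, is !mean." with all punctuation and spacing left as it was.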

Something else entirely, depending on what your actual goals are, what you expect as your source data, and so on.
But I don't want regex ... (well then, read the duplicate comment)
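
Without regex, the boundary check suggested in the comments on the question (a match counts only if it is not preceded or followed by a letter) might be sketched like this. All three function names are hypothetical. Treating every non-letter as a word separator is an assumption that suits plain English text only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if c is a letter; anything else counts as a word separator here.
bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Case-insensitive check that `word` occurs in `text` starting at `pos`.
bool matchesAt(const std::string& text, std::size_t pos, const std::string& word)
{
    if (pos + word.size() > text.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < word.size(); ++i)
    {
        const auto a = static_cast<unsigned char>(text[pos + i]);
        const auto b = static_cast<unsigned char>(word[i]);
        if (std::tolower(a) != std::tolower(b))
        {
            return false;
        }
    }
    return true;
}

// Inserts `addition` after every whole-word occurrence of `word`,
// leaving the matched word (and its capitalisation) untouched.
std::string appendAfterWholeWord(const std::string& text,
                                 const std::string& word,
                                 const std::string& addition)
{
    if (word.empty())
    {
        return text;
    }
    std::string result;
    std::size_t i = 0;
    while (i < text.size())
    {
        const bool startsWord = (i == 0) || !isLetter(text[i - 1]);
        const std::size_t end = i + word.size();
        // matchesAt guarantees end <= text.size() before text[end] is read.
        if (startsWord && matchesAt(text, i, word) &&
            (end == text.size() || !isLetter(text[end])))
        {
            result.append(text, i, word.size());  // keep the original spelling
            result += addition;
            i = end;
        }
        else
        {
            result += text[i++];
        }
    }
    return result;
}
```

For example, appendAfterWholeWord("He is the man", "he", " or she") returns "He or she is the man", and the same function handles the second case from the question with appendAfterWholeWord(text, "his", " or hers").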

I stated: "Add a space on both sides of your string". That will fix the problem of a starting or ending "he", and after that, the trimming will get rid of them. I'm on my phone, on a bus, I'll explain it when I get home, and add some alternatives. — AlexT, Jun 19 '18 at 20:18
Thanx for the answers. I will try the trimming method now and report back — DogFoxX, Jun 19 '18 at 20:44
When does the trimming take place? Before the find and replace, or after? — DogFoxX, Jun 19 '18 at 20:53
It does work, but the problem about the sentence starting with "he" is there — DogFoxX, Jun 19 '18 at 20:54
I have updated the answer and that concept should work. I am editing it now to explain some alternatives. — AlexT, Jun 19 '18 at 21:13
