There are several ways to achieve what you are trying to do. Continuing from what you already have, here is what you will need to make it work. (Quick note: these will be concepts or pseudocode, as I haven't used C++ in quite a few years.)
- Quick and dirty method:
When you try to match a word, as you stated, any word that merely contains he gets replaced too, so the becomes the or she.
To solve this, you need to take into account what usually (more on this later) comes before and after a word. Usually it is white space. That means a quick fix would be to replace " he " instead of "he".
So a sentence like The something he something will indeed give us The something he or she something.
But as others have stated, this causes issues when the sentence starts or ends with the thing you are trying to replace. That's why you will want to add a space before and after your initial sentence.
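Assuming "He is something he" as our sentence, this will become " He is something he ", so the final he now has a space on both sides and gets replaced. (The leading He only matches if the comparison ignores case, or if you also replace " He ".) Trimming the string at the end gets rid of the extra spaces.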
So you will have:
searchStr = " " + searchStr + " ";
myReplace(searchStr, " he ", " he or she ");
trim(searchStr);
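A minimal C++ sketch of the pad, replace, and trim idea above. The name replacePaddedWord is hypothetical (it is not the myReplace from the question), and the match is case-sensitive, which is an assumption of this sketch. It deliberately keeps the trailing space of each match available, so two hits in a row such as "he he" are both replaced.

```cpp
#include <string>

// Hypothetical helper: replaces `word` with `replacement` wherever `word`
// has a single space on both sides, or sits at the start/end of `text`.
// Case-sensitive; punctuation next to the word will prevent a match.
std::string replacePaddedWord(const std::string& text,
                              const std::string& word,
                              const std::string& replacement) {
    if (word.empty()) return text;

    // Pad the sentence so the first and last words also have a space on both sides.
    const std::string padded = " " + text + " ";
    const std::string needle = " " + word + " ";

    std::string result;
    std::size_t pos = 0;
    while (true) {
        const std::size_t hit = padded.find(needle, pos);
        if (hit == std::string::npos) break;
        result.append(padded, pos, hit - pos);  // text before the match
        result += " " + replacement;            // leading space plus the replacement
        pos = hit + needle.size() - 1;          // reuse the trailing space for the next match
    }
    result.append(padded, pos, std::string::npos);  // remainder, ending in the padding space

    // Trim the one padding space added at each end.
    return result.substr(1, result.size() - 2);
}
```

Calling replacePaddedWord("The something he something", "he", "he or she") leaves The alone and only expands the standalone he. A sentence-initial He is left untouched because of the case-sensitive comparison.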
- Making a list (vector) of words and then replacing those
So far we have assumed that a word is whatever sits between two white spaces, which is inherently wrong for multiple reasons:
- The first/last word of a sentence will not start/end with a space.
- The final word might end in a punctuation mark, like . or !, which will not work in the previous example.
- Punctuation marks inside the string: he, him and her will not work.
- Special signs like he/her will again not work.
What we would want to do in a situation like this is split the string into words with a regular expression (Regex in C++) containing the special characters that might divide words. Here, there are quite a few possibilities for what you might want to do.
- You might want to delimit a word by splitting on all special characters (depending on how you do it, you might end up losing Chinese characters and the like).
- You might want to create a list of things to split on: ,: ;_.!?/~'" and so on.
So after doing something like this (pseudo):
ourString = "He, is mean to the teacher!"
delimiter = "[ ,.!?]".toRegex //whitespace and some punctuation marks
list = split(ourString, delimiter)
The list will be: [He, is, mean, to, the, teacher] (note that we lose the punctuation marks; more on this later)
Now we can simply traverse the list, replacing each element with what we need, and then concatenate it back:
string = ""
for(word in list)
string += if(word.toLowerCase == "he") " he or she " else " " + word + " "
Now we will have " he or she is mean to the teacher " once repeated spaces are collapsed
(again, punctuation marks are lost, and so is the capital H unless you restore it separately)
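The split-and-rejoin pseudocode above could look roughly like this in C++, using std::sregex_token_iterator with submatch index -1 to get the pieces between delimiters. The function names are made up for illustration, and the delimiter set is just the small example one from above with a + added so that ", " counts as a single separator.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Lower-case copy of a string (ASCII only, which is an assumption of this sketch).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Split `text` on every match of `delimiter`, dropping empty pieces.
std::vector<std::string> splitWords(const std::string& text, const std::regex& delimiter) {
    std::vector<std::string> words;
    // Submatch index -1 selects the text between matches instead of the matches.
    std::sregex_token_iterator it(text.begin(), text.end(), delimiter, -1), end;
    for (; it != end; ++it) {
        if (it->length() > 0) words.push_back(it->str());
    }
    return words;
}

// Replace whole words (case-insensitively) and join with single spaces.
// Punctuation is lost, exactly as in the pseudocode.
std::string replaceWordLossy(const std::string& text,
                             const std::string& word,
                             const std::string& replacement) {
    static const std::regex delimiter("[ ,.!?]+");  // whitespace and some punctuation marks
    std::string result;
    for (const std::string& w : splitWords(text, delimiter)) {
        if (!result.empty()) result += ' ';
        result += (toLower(w) == word) ? replacement : w;
    }
    return result;
}
```

Joining with a single space between words avoids the extra normalise and trim pass that the pseudocode would need.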
What if we want to keep the punctuation marks?
If we want to use the same approach, instead of simply splitting on the punctuation marks themselves, we can use a more complex regular expression (an example in python). An alternative to a complex regex would be to:
- First traverse the string and add spaces before and after punctuation marks
- Split it into a list by splitting on just white spaces
- Replacement process
- Put the string back together
string = "He, is !mean."
regex = "[,!.:;]"
string = replace(string, regex, " <match> ") //surround every matched mark with spaces
//the string is now: "He ,  is  ! mean . "
// something to get rid of multiple spaces and make them into a single one
normaliseWhiteSpaces(string)
delimiter = " "
list = split(string, delimiter) //the list is now [He, ",", is, !, mean, .]
string = ""
for(word in list)
string += if(word.toLowerCase == "he") " he or she " else " " + word + " "
//the string is now roughly " he or she  ,  is  !  mean  . " so we need to:
normaliseWhiteSpaces(string)
trim(string)
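Here is a rough C++ version of the four steps above. It assumes the same small punctuation set as the pseudocode and an ASCII-only, case-insensitive comparison; the function name is invented for this sketch. Reading tokens with operator>> skips runs of whitespace, so no separate normalise step is needed.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: replaces whole words while keeping punctuation marks
// as separate, space-delimited tokens.
std::string replaceWordKeepingPunctuation(const std::string& text,
                                          const std::string& word,
                                          const std::string& replacement) {
    // 1. Surround each punctuation mark with spaces ("$&" stands for the matched text).
    static const std::regex punctuation("[,!.:;]");
    const std::string spaced = std::regex_replace(text, punctuation, " $& ");

    // 2. Split on white space; operator>> ignores repeated spaces.
    std::istringstream tokens(spaced);
    std::string token;
    std::string result;
    while (tokens >> token) {
        // 3. Compare a lower-cased copy against the word to replace.
        std::string lowered = token;
        std::transform(lowered.begin(), lowered.end(), lowered.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

        // 4. Put the string back together with single spaces.
        if (!result.empty()) result += ' ';
        result += (lowered == word) ? replacement : token;
    }
    return result;
}
```

As in the pseudocode, the punctuation marks come back with a space in front of them ("he or she , is"), so a final pass that glues marks back onto the preceding word would still be needed for clean output.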
- Something else entirely, depending on what your actual goals are, what you expect as source data, and so on.
- But I don't want regex ... (well then, read the duplicate comment)