Your approach is headed in the right direction; it just becomes jumbled along the way. Since you cannot use `strstr()`, you have to use loops to do the search manually. There are a couple of approaches you can take. You can loop until the first character of `curse` is found, then scan forward in `phrase` with another loop to see if you have a match, handle the beginning-of-word and end-of-word checks (using `isalnum()` from `ctype.h`, or a manual check if you can't use that), and, if you have found `curse`, loop again replacing characters. (You have done your end-of-word check, but not the beginning-of-word check.)
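The scan-forward approach described above could be sketched roughly as follows. This is a minimal sketch, not your original code: the names `censor_scan()` and `is_wordchar()` are hypothetical, and it assumes that a "word" is a run of alphanumeric characters as reported by `isalnum()`.

```c
#include <ctype.h>

/* hypothetical helper: nonzero if c counts as part of a word */
static int is_wordchar (char c)
{
    return isalnum ((unsigned char)c);
}

/* hypothetical scan-forward variant: find the 1st char of curse at the
 * start of a word, compare ahead, then mask if the match ends a word */
char *censor_scan (char *phrase, const char *curse, char mask)
{
    if (!*curse)                        /* empty curse, nothing to match */
        return phrase;

    for (int i = 0; phrase[i]; i++) {
        /* begin-check: 1st char of curse, at start of phrase or after a non-word char */
        if (phrase[i] != *curse || (i && is_wordchar (phrase[i-1])))
            continue;

        int j = 0;
        while (curse[j] && phrase[i+j] == curse[j])     /* scan forward */
            j++;

        /* end-check: all of curse matched and the next char is not a word char */
        if (!curse[j] && !is_wordchar (phrase[i+j])) {
            for (int k = 0; k < j; k++)                 /* replace loop */
                phrase[i+k] = mask;
            i += j - 1;                 /* resume after the masked word */
        }
    }
    return phrase;
}
```

The scan-forward loop cannot run past the end of `phrase`, because the nul-terminator never equals a non-nul character of `curse`, so the comparison stops there.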
A similar, but slightly more efficient approach is simply to loop over each character in `phrase` and keep a state variable, such as a simple `int inword;` flag. The flag is set to `0` (false) if you are before, between or after a `curse`, reading spaces or non-`curse` characters, or reading characters where `curse` is a substring of another word, and set to `1` (true) when you are within a `curse`. (There is no additional loop to scan forward.) Then when you reach the end of `curse`, if you are still `inword` and the next character is not alphanumeric (`!isalnum(next)`), you have located a `curse` to mask. You simply loop once for each character in `curse`, overwriting the previous indexes in `phrase` with the mask character.
For example, your `censor()` function can be rewritten as:

    #include <stdio.h>
    #include <ctype.h>

    /* censor all whole-word occurrences of curse in phrase, replacing with mask */
    char *censor (char *phrase, const char *curse, char mask)
    {
        for (int i = 0, j = 0, inword = 0; phrase[i]; i++) {    /* loop over chars in phrase */
            if (phrase[i] == curse[j]) {                        /* char equal to curse? */
                /* inword or 1st/start char (cast avoids UB for negative char values) */
                if (inword || !i || !isalnum ((unsigned char)phrase[i-1])) {
                    inword = 1;                                 /* set inword flag - true */
                    j++;                                        /* increment word index */
                }
                else                                            /* word is not whole-word by itself */
                    inword = 0;                                 /* set inword flag - false */
            }
            else                                                /* mismatch, abandon partial match */
                inword = j = 0;                                 /* reset inword, j zero */
            if (!curse[j]) {                                    /* nul-terminator for word reached */
                /* inword and end of phrase or whole word end */
                if (inword && (!phrase[i+1] || !isalnum ((unsigned char)phrase[i+1])))
                    while (j--)                                 /* loop j times */
                        phrase[i-j] = mask;                     /* overwrite chars with mask */
                inword = j = 0;                                 /* reset inword, j zero */
            }
        }
        return phrase;                  /* return censored string as a convenience */
    }

(Note: there is no need to pass "size" information when you are working with strings. The nul-terminating character marks the end of the string.)
Here, `mask` is the character that overwrites the bad words in the string. The loop variables (which you can declare above the loop if you like) are `i`, `j`, and `inword`, where `i` is the index for `phrase`, `j` is the index for `curse`, and `inword` is your state variable tracking whether you are in a word (`curse`).
The `for` loop just loops over the characters in `phrase`. There are two `if` statements that provide the primary conditions you check:
1. whether the current character in `phrase` matches the current character in `curse`; and
2. whether the nul-terminating character in `curse` was reached.
Within the first `if`, you check whether you are currently `inword` matching characters of `curse`, or, as your begin-check, whether the current character is the 1st character in `phrase` or is preceded by something other than `[A-Za-z0-9]`, qualifying it as the beginning of a whole-word match of `curse`. (You can adjust the `[A-Za-z0-9]` criteria to meet whatever condition you have.) If either part is true, you set `inword = 1;` (it doesn't matter that you are setting it each time) and you increment your `curse` index `j`. The `j` index ensures you match characters between `phrase` and `curse` in order.
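If you cannot use `ctype.h`, or you want a different definition of what counts as part of a word, the `isalnum()` calls could be replaced with a small predicate of your own. A minimal sketch, assuming an ASCII-compatible character set (where letters and digits are contiguous); the name `is_wordchar()` and the choice to treat `'_'` as a word character are assumptions made purely for illustration:

```c
/* hypothetical replacement for isalnum(): nonzero if c is part of a word.
 * Assumes an ASCII-compatible character set; adjust the ranges to taste. */
int is_wordchar (char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
            c == '_';               /* assumed extra: treat '_' as a word char */
}
```

You would then call `is_wordchar (phrase[i-1])` and `is_wordchar (phrase[i+1])` wherever the begin-check and end-check call `isalnum()`.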
The second main `if` (2. above) simply performs the end-check on `curse`, making sure it ends as a whole word in `phrase`. If you have found the whole-word `curse` that you need to mask, you simply loop once for each character in `curse`, overwriting the previous `j` characters in `phrase` with `mask`. Whether or not anything was masked, both `inword` and `j` are reset to zero when done. That's it. You then return a pointer to the modified string as a convenience, so you can make immediate use of the modified string.
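The index arithmetic in the masking loop is easy to get wrong, so it may help to look at it in isolation. The hypothetical helper `mask_back()` below is only an illustration, not part of the function above. It assumes `i` is the index of the last matched character and `j` is the number of characters matched:

```c
/* hypothetical helper: overwrite the j chars of s that end at index i */
void mask_back (char *s, int i, int j, char mask)
{
    while (j--)             /* inside the body j runs j-1, j-2, ..., 0 */
        s[i-j] = mask;      /* so indexes i-(j-1) through i are overwritten */
}
```

With `s` holding "My dam dog", `i = 5` (the 'm') and `j = 3`, the loop writes indexes 3, 4 and 5, leaving "My *** dog".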
A short example using the function, which provides a simple phrase and curse to mask by default but also allows you to pass the curse and mask character as the 1st and 2nd arguments to the program, could be written as:

    int main (int argc, char **argv) {

        /* string, word to censor and mask character */
        char str[] = "My dam dog got 100 dam fleas laying on the damp ground",
             *word = argc > 1 ? argv[1] : "dam",
             mask  = argc > 2 ? *argv[2] : '*';

        printf ("maskchar : '%c'\noriginal : %s\n", mask, str);  /* output str and mask */
        printf ("censored : %s\n", censor (str, word, mask));    /* output censored str */
    }

Example Use/Output

    $ ./bin/censorword
    maskchar : '*'
    original : My dam dog got 100 dam fleas laying on the damp ground
    censored : My *** dog got 100 *** fleas laying on the damp ground

(Note above that only whole-word occurrences of "dam" are masked, leaving "damp" unchanged.)
If you wanted to change the curse to "damp" and use '-' as the mask character, you could do:

    $ ./bin/censorword damp '-'
    maskchar : '-'
    original : My dam dog got 100 dam fleas laying on the damp ground
    censored : My dam dog got 100 dam fleas laying on the ---- ground

Experiment by writing your program both ways: first using multiple loops to scan forward and to replace, and then again using a state variable as above. Compare the two and see how the state variable simplifies the logic. State variables can simplify a wide range of iterative problems.
Look things over and let me know if you have further questions.