Replacing words in a string to asterisks

Question

I need to censor the word darn and change the word to ****. I think I have the program written correctly to see if the string contains the word darn, but I am unsure of the best place to replace the word and how to replace it with asterisks. Here is the code that I have so far. I appreciate any suggestions and advice!

#include <stdio.h>

int censor(char phrase[], int psize, char curses[], int csize)
{
    int n;
    int i;
    int foul;
    i = 0;
    while(phrase[i] != '\0')
    {
        /** If the first letter matches **/
        if(phrase[i] == curses[0])
        {
            int j;
            j = 0;
            int match;
            match = 1; // match is true
            while(curses[j] != '\0' && match == 1)
            {
                if(curses[j] != phrase[i+j])
                {
                    match = 0; // match is false
                }
            }
            if(curses[j] == '\0')
            {
                if(phrase[i+j] == ' ' || phrase[i+j] == '\0')
                {
                    foul = 1;
                    int k;
                    k = 0;
                    while(k <= j);
                    {
                        phrase[i+k] = '*';
                        k = k + 1;
                    }
                }
            }
        }

        /** Skip to the next word **/

        while(phrase[i] != ' ' && phrase[i] != '\0')
        {
            i = i + 1;
        }
    }
    return foul;
}
int main()
{
    /** Sets curse word **/
    int csize = 4;
    char curse[4] = "darn"; // the curse words
    char str[1000];
    int i = 0;
    int totalwords = 0;

    /** Variables and Function call to read in a phrase should be here**/
    printf("Enter your message here: ");
    scanf("%[^'\n']s",str); //getting the string for analysis
    int strsize = 0;
    for(i = 0; str[i] != '\0'; i++)
    {
        if(str[i] == ' ' || str[i] == '\n' || str[i] == '\t')
        {
            totalwords++;
        }
    }
    totalwords++;
    int foul = censor(str, strsize, curse, csize); // calling the function
    if(foul = 1)
    {
        printf("\nThere was potty language in your phrase. It was censored. See below:\n");
    }
    else
    {
        printf("\nYour sentence was clean. Here is what you entered:\n");
    }
    printf("%s\n", str);
    return 0;
}

What's wrong with your code? Please include some inputs, your expected and actual outputs — drum, Nov 04 '20 at 04:34
You are not using the loop iterators correctly. Dry run the first iteration: when `i` and `j` are `0`, `while(curses[j] != '\0' && match == 1)` will never come out of the `while` loop, because `curses[j] != phrase[i+j]` will be false. What is wrong with using `if(!strncmp(curses,"darn",4)) strncpy(curses,"****",4);` — IrAM, Nov 04 '20 at 05:07
Turn on all compiler warnings. It should probably warn you about the variable `foul` being uninitialized. — paddy, Nov 04 '20 at 05:19
Did you really mean to match everything not a *Single-Quote* and *Newline* in `[^'\n']` (you don't quote `'\n'` within `[..]` and there is no `s` after `%[..]` unless you want to match a literal `'s'`). To be proper you need to include the *field-width* modifier in `scanf("%999[^\n]",str);` Otherwise, you are simply using `gets(str);` and see [Why gets() is so dangerous it should never be used!](https://stackoverflow.com/q/1694036/3422102) — David C. Rankin, Nov 04 '20 at 09:07
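The bounded read suggested in the last comment can be sketched as follows. This is only an illustration: read_phrase and MAXC are hypothetical names, and sscanf() is used in place of scanf() so the conversion can be tried on a fixed string. The conversion specification itself, `%999[^\n]`, is the one from the comment.

```c
#include <stdio.h>

#define MAXC 1000   /* buffer size used in the question (assumed for dst) */

/* read_phrase is a hypothetical helper: copy at most MAXC-1 chars of the
 * first line of src into dst, which must hold at least MAXC bytes.
 * Returns 1 if at least one character was stored, 0 otherwise. */
int read_phrase (const char *src, char *dst)
{
    /* %999[^\n] : up to 999 chars that are not a newline, no trailing 's' */
    return sscanf (src, "%999[^\n]", dst) == 1;
}
```

The field width has to be one less than the array size so there is always room for the nul-terminating character. An empty line produces a matching failure, so the return value should be checked before the buffer is used.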

David C. Rankin · Answer 1 · 2020-11-04T20:00:00.967

Your approach is going in the right direction, it just becomes jumbled along the way. Since you cannot use strstr(), you simply have to use loops to do it manually. There are a couple of approaches you can take. You can loop until the first character of curse is found, then scan forward in phrase with another loop to see if you have a match, handle the end-checks (using isalnum() from ctype.h, or if you can't use that, checking manually), and if you have found curse, loop again replacing characters. (You have done your end-of-word check, but not the beginning-of-word check.)
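The scan-forward approach described above could look something like the sketch below. It is not the only way to write it: censor_scan is a hypothetical name, and "whole word" is assumed to mean "not bordered by an alphanumeric character", which is what isalnum() tests.

```c
#include <ctype.h>

/* Sketch of the scan-forward approach (censor_scan is a hypothetical name).
 * Masks every whole-word occurrence of curse in phrase with mask. */
char *censor_scan (char *phrase, const char *curse, char mask)
{
    if (!*curse)                          /* empty curse, nothing to match */
        return phrase;

    for (int i = 0; phrase[i]; i++) {
        /* begin-check: skip positions that are in the middle of a word */
        if (i && isalnum ((unsigned char)phrase[i-1]))
            continue;

        int j = 0;                        /* scan forward comparing to curse */
        while (curse[j] && phrase[i+j] == curse[j])
            j++;

        /* end-check: all of curse matched and the word ends here */
        if (!curse[j] && !isalnum ((unsigned char)phrase[i+j])) {
            for (int k = 0; k < j; k++)   /* replace the matched characters */
                phrase[i+k] = mask;
            i += j - 1;                   /* continue after the masked word */
        }
    }
    return phrase;
}
```

The inner while loop stops at the first mismatch, including the nul-terminator of phrase, so it never reads past the end of the string.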

A similar, but slightly more efficient, approach is simply to loop over each character in phrase and keep a state-variable, such as a simple int inword; flag. It is set to 0 (false) when you are before, between or after a curse (reading spaces, or non-curse characters where curse is only a substring of another word), and set to 1 (true) when you are within a curse. (There is no additional loop to scan forward.) Then, when you reach the end of curse, if you are still inword and the next character is !isalnum(next), you have located a curse to mask, and you simply loop once for each character of curse, overwriting the previous indexes in phrase with the mask character.

For example, your censor() function can be rewritten as:

#include <stdio.h>
#include <ctype.h>

/* censor all whole-word occurrences of curse in phrase replacing with mask character */
char *censor (char *phrase, const char *curse, char mask)
{
    for (int i = 0, j = 0, inword = 0; phrase[i]; i++) {  /* loop over chars in phrase */
        if (phrase[i] != curse[j])                        /* mismatch, drop partial match */
            inword = j = 0;                               /* (char is retested below) */
        if (phrase[i] == curse[j]) {                      /* char equal to curse? */
            /* inword or 1st/start char (cast keeps the isalnum() argument valid) */
            if (inword || !i || !isalnum ((unsigned char)phrase[i-1])) {
                inword = 1;                               /* set inword flag - true */
                j++;                                      /* increment word index */
            }
            else    /* word is not whole-word by itself */
                inword = 0;                               /* set inword flag - false */
        }
        if (!curse[j]) {    /* nul-terminator for word reached */
            /* inword and end of phrase or whole word end */
            if (inword && (!phrase[i+1] || !isalnum ((unsigned char)phrase[i+1])))
                while (j--)                               /* loop j times */
                    phrase[i-j] = mask;                   /* overwrite chars with mask */
            inword = j = 0;                               /* reset inword, j zero */
        }
    }

    return phrase;        /* return censored string as a convenience */
}

(note: There is no need to pass "size" information when you are working with strings. The nul-terminating character marks the end of the string)

Where mask is the mask character to overwrite the bad words in the string with. The loop variables (which you can declare above the loop if you like), are i, j, and inword. Where i is the index for phrase, j is used as the index for curse, and inword is your state-variable tracking whether you are in a word (curse).

The for loop just loops over the characters in phrase. There are two if statements that provide the primary conditions you check:

1. if the current character matches the next expected character in curse; and
2. if the nul-terminating character in curse was reached.

Within the first if, you check if you are currently inword matching characters of curse, or, with your begin-check, whether the current character is the 1st char in phrase or follows something other than [A-Za-z0-9], qualifying as the beginning of a whole-word match of curse. (You can adjust the [A-Za-z0-9] criteria to meet whatever condition you have.) If either part is true, you set inword = 1; (it doesn't matter that you are setting it each time) and you increment your curse index j. The j index ensures you match characters between phrase and curse in order.

The second main if (2. above), simply performs the end-check on curse making sure it ends as a whole-word in phrase. If you have found the whole-word curse that you need to mask, you simply loop once for each character in curse overwriting the previous j characters in phrase with mask, resetting both inword and j to zero when done. That's it. You then return a pointer to the modified string as a convenience, so you can make immediate use of the modified string.

A short example using the function that provides a simple phrase and curse to mask by default, but that also allows you to pass the curse and mask character as the 1st and 2nd arguments to the program could be done as:

int main (int argc, char **argv) {

    /* string, word to censor and mask character */
    char str[] = "My dam dog got 100 dam fleas laying on the damp ground",
         *word = argc > 1 ? argv[1]  : "dam",
          mask = argc > 2 ? *argv[2] : '*';

    printf ("maskchar : '%c'\noriginal : %s\n", mask, str); /* output str and mask */
    printf ("censored : %s\n", censor (str, word, mask));   /* output censored str */

}

Example Use/Output

$ ./bin/censorword
maskchar : '*'
original : My dam dog got 100 dam fleas laying on the damp ground
censored : My *** dog got 100 *** fleas laying on the damp ground

(Note above that only whole word occurrences of "dam" are masked leaving "damp" unchanged.)

If you wanted to change the curse to "damp" and to use '-' as the mask character, you could do:

$ ./bin/censorword damp '-'
maskchar : '-'
original : My dam dog got 100 dam fleas laying on the damp ground
censored : My dam dog got 100 dam fleas laying on the ---- ground

Experiment writing your program both ways, using multiple loops to scan-forward and to replace, and then write it again using a state-variable as above. Compare the logic and how using the state variable(s) will simplify the logic. State-variables can be used to simplify an endless number of iterative problems.

Look things over and let me know if you have further questions.

This is very helpful! Thank you for taking the time to do this. It is a way better explanation than what my professor gave me! — Sbroberg, Nov 04 '20 at 21:26
Sure, glad it helped. The key in any string manipulation problem (or any problem for that matter) is to *Slow-Down* and think through what has to happen *character-by-character*. This is a good exercise. It forces you to think character-by-character. When I write something like this, I usually write it in 4 or 5 revisions. I get the function working (logic may be longer/different/etc..) and then I look to see if it can be made better by rearranging parts, or approaching it differently. Don't expect to write the perfect function the first time. Get it working, then improve things. Good luck! — David C. Rankin, Nov 04 '20 at 21:33
So I am going with your route about testing if it is a curse or not but when I initialize the curse in main as an array curse[4] = "darn" I also initialized mask[4] = "****", and I then call the function with censor(phrase, curse, mask); when the word darn is typed when the program is running it does not censor it with ****. — Sbroberg, Nov 04 '20 at 22:18
Oh, remember I use strings -- that must be *nul-terminated*, use `curse[] = "darn";` Remember, all strings end with a `'\0'` (*nul-terminating*) character. So to initialize a string containing `"darn"` you would need `char curse[5] = "darn";` minimum. If you omit the size, then the array is initialized to hold the entire string -- including the `'\0'`. Do this to prove it to yourself. `char curse[] = "darn";` then `printf ("curse size : %zu\n", sizeof curse);` — David C. Rankin, Nov 04 '20 at 23:01
@Sbroberg Always check the C-Standard if you are unsure. Relevant here is [C11 Standard - 6.7.9 Initialization(p14)](http://port70.net/~nsz/c/c11/n1570.html#6.7.9p14). `"darn"` is a *string literal* being used to initialize your array `curse`. `"An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array."` — David C. Rankin, Nov 04 '20 at 23:09
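The sizeof check suggested in the comments above can be sketched as a small helper. print_curse_size is a hypothetical name used only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* print_curse_size is a hypothetical helper name used only for this sketch */
size_t print_curse_size (void)
{
    char curse[] = "darn";    /* size omitted: 4 letters plus the '\0' */

    printf ("curse size   : %zu\n", sizeof curse);    /* bytes in the array */
    printf ("curse length : %zu\n", strlen (curse));  /* chars before '\0' */

    return sizeof curse;
}
```

Calling it prints a size of 5 and a length of 4. The extra byte is the nul-terminating character, which `char curse[4] = "darn";` has no room for, so that array is not a string and cannot be passed to a function that relies on finding the terminator.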
