One useful helper is a function that lowercases a string in place. We'll define it now and come back to it later.
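#include <stdio.h>   /* printf */
#include <string.h>  /* strlen, strncpy, strcmp */

/* Lowercase an ASCII string in place; str must point to writable memory. */
void downcase(char *str) {
    for (char *ch = str; *ch; ch++) {
        if (*ch >= 'A' && *ch <= 'Z') {
            *ch += 'a' - 'A';
        }
    }
}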
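Because downcase writes into the string it is given, it needs a writable buffer. As a rough sketch of a non-destructive alternative, the hypothetical helper downcase_copy below (not part of the code in this answer) lowercases into a separate buffer using tolower from <ctype.h>. It assumes the caller supplies the destination and its size.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: writes a lowercased copy of src into dst (at most
 * dst_size - 1 characters plus the terminator) and leaves src untouched. */
void downcase_copy(const char *src, char *dst, size_t dst_size) {
    size_t i = 0;
    if (dst_size == 0) {
        return;  /* no room even for the terminator */
    }
    for (; src[i] != '\0' && i + 1 < dst_size; i++) {
        /* tolower expects a value representable as unsigned char (or EOF). */
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[i] = '\0';
}
```

For example, calling downcase_copy("The", buf, sizeof buf) with char buf[8] leaves "the" in buf. A destination that is too small receives a truncated but still terminated copy.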
Another useful function determines whether a substring is a whole word, which means testing the characters immediately before and after it. This could be more sophisticated about which characters count as separators, but for educational purposes this will do.
/* Return nonzero if the len characters starting at src[s] are bounded on
   both sides by a separator or by an end of the string. */
int is_word(const char *src, size_t s, size_t len) {
    size_t end = s + len;
    return (s == 0 || src[s - 1] == ' ' || src[s - 1] == '.' || src[s - 1] == ',') &&
           (end >= strlen(src) || src[end] == ' ' || src[end] == '.' || src[end] == ',');
}
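As a sketch of the "more sophisticated" direction mentioned above, the variant below keeps the separators in one string so the set is easy to extend. The names SEPARATORS, is_separator, and is_word_at are hypothetical, and the particular separator set is only an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical separator set; extend it as needed. */
static const char SEPARATORS[] = " .,;:!?\t\n";

/* Returns 1 if c counts as a word boundary. strchr would match '\0' as
 * well, but the explicit check makes the intent clear. */
static int is_separator(char c) {
    return c == '\0' || strchr(SEPARATORS, c) != NULL;
}

/* Variant of is_word: the substring of length len starting at s is a
 * word if it is bounded by separators or by the ends of the string. */
int is_word_at(const char *src, size_t s, size_t len) {
    size_t src_len = strlen(src);
    if (len == 0 || s > src_len || len > src_len - s) {
        return 0;  /* empty or out-of-range substring */
    }
    return (s == 0 || is_separator(src[s - 1])) &&
           is_separator(src[s + len]);
}
```

With the test string used below, is_word_at(text, 0, 3) is 1 for the leading "The", while is_word_at(text, 12, 3) is 0 because the "The" at index 12 is the start of "Theatre".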
Now, let's start a remove_word function. It'll iterate over the input string, looking at substrings the length of the word we want to remove. It'll use the is_word function to print a note if the substring in question is a word.
This is, of course, just one solution to this problem.
void remove_word(char *input, char *to_remove, char *dest) {
    size_t index;
    size_t input_len = strlen(input);
    size_t to_remove_len = strlen(to_remove);
    char temp[to_remove_len + 1];     /* current substring, original case */
    char temp_ci[to_remove_len + 1];  /* current substring, lowercased */

    /* dest is not used yet. Note that to_remove is lowercased in place. */
    downcase(to_remove);

    /* Written as index + to_remove_len <= input_len so the unsigned
       arithmetic cannot wrap when the word is longer than the input. */
    for (index = 0; index + to_remove_len <= input_len; index++) {
        strncpy(temp, input + index, to_remove_len);
        strncpy(temp_ci, input + index, to_remove_len);
        temp[to_remove_len] = '\0';
        temp_ci[to_remove_len] = '\0';
        downcase(temp_ci);

        printf("%s", temp);
        if (is_word(input, index, to_remove_len)) {
            printf(" -> Word\n");
        }
        else {
            printf("\n");
        }
    }
}
Now, if we test this on your test string:
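int main(void) {
    /* An array, not a pointer to a string literal: remove_word lowercases
       this buffer in place, and modifying a literal is undefined behavior. */
    char replace[] = "the";
    char dest[100];
    remove_word("The Dhillon Theatre is now Fun Republic", replace, dest);
}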
We get this output:
The -> Word
he
e D
 Dh
Dhi
hil
ill
llo
lon
on
n T
 Th
The
hea
eat
atr
tre
re
e i
 is
is
s n
 no
now -> Word
ow
w F
 Fu
Fun -> Word
un
n R
 Re
Rep
epu
pub
ubl
bli
lic
This is the key step toward your goal: we can now identify words that are the right length to be the word we're looking for.
The word to remove has been downcased, and we have a downcased version of the current substring, so a strcmp between the two tells us whether the current word should be removed.
void remove_word(char *input, char *to_remove, char *dest) {
    size_t index;
    size_t input_len = strlen(input);
    size_t to_remove_len = strlen(to_remove);
    char temp[to_remove_len + 1];     /* current substring, original case */
    char temp_ci[to_remove_len + 1];  /* current substring, lowercased */

    downcase(to_remove);

    for (index = 0; index + to_remove_len <= input_len; index++) {
        strncpy(temp, input + index, to_remove_len);
        strncpy(temp_ci, input + index, to_remove_len);
        temp[to_remove_len] = '\0';
        temp_ci[to_remove_len] = '\0';
        downcase(temp_ci);

        if (is_word(input, index, to_remove_len)) {
            printf("%3zu: %s", index, temp);  /* %zu is the size_t format */
            if (strcmp(to_remove, temp_ci) == 0) {
                printf(" -> Bingo!\n");
            }
            else {
                printf(" -> Word\n");
            }
        }
    }
}
Now when we run it:
  0: The -> Bingo!
 23: now -> Word
 27: Fun -> Word
We now know how to find instances of the word to remove, and the index where they start.
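The versions above copy each substring into temp_ci and lowercase it before calling strcmp. As a minimal sketch of an alternative, the hypothetical helper equals_ignore_case_n below compares the two strings in place with tolower, which would also avoid modifying to_remove. It is an illustration only, not what the code in this answer does.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first len characters of a and b
 * are equal ignoring case (as defined by tolower), 0 otherwise. It stops
 * at a terminator, so it never reads past the end of either string. */
int equals_ignore_case_n(const char *a, const char *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (tolower(ca) != tolower(cb)) {
            return 0;
        }
        if (ca == '\0') {
            return 1;  /* both strings ended together */
        }
    }
    return 1;
}
```

Note that this only answers "do these characters match?". The word-boundary test from is_word is still needed, because equals_ignore_case_n("Theatre", "the", 3) also returns 1.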
Now we simply have to implement the copying (or not copying) into dest based on this information. For this I've added a write_index that will keep track of where to insert characters into dest.
void remove_word(char *input, char *to_remove, char *dest) {
    size_t index = 0, write_index = 0;
    size_t input_len = strlen(input);
    size_t to_remove_len = strlen(to_remove);
    char temp_ci[to_remove_len + 1];  /* current substring, lowercased */

    downcase(to_remove);

    while (index < input_len) {
        /* Only compare while a full-length substring still fits. */
        if (to_remove_len > 0 && index + to_remove_len <= input_len) {
            strncpy(temp_ci, input + index, to_remove_len);
            temp_ci[to_remove_len] = '\0';
            downcase(temp_ci);

            if (is_word(input, index, to_remove_len) &&
                strcmp(to_remove, temp_ci) == 0) {
                index += to_remove_len;  /* skip the word, copy nothing */
                continue;
            }
        }
        /* Not a match: copy one character and advance both indices. */
        dest[write_index++] = input[index++];
    }
    dest[write_index] = '\0';
}
Now running your test:
int main(void) {
    /* Writable array: remove_word lowercases it in place. */
    char replace[] = "the";
    char dest[100] = {0};
    remove_word("The Dhillon Theatre is now Fun Republic", replace, dest);
    printf("%s\n", dest);
}
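We get the following (note the leading space, since only the word itself is removed and not the space after it):
$ ./a.out
 Dhillon Theatre is now Fun Republic
$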
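Since remove_word removes only the word itself, the space that followed it stays in the output. If that is not wanted, one possible variation is sketched below. The names is_boundary, matches_at, and remove_word_and_space are hypothetical, and the sketch assumes dest has room for strlen(input) + 1 bytes. It compares in place, so neither argument is modified.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* 1 if c ends a word: a separator or the string terminator. */
static int is_boundary(char c) {
    return c == '\0' || c == ' ' || c == '.' || c == ',';
}

/* 1 if the len characters at input[pos] equal word, ignoring case. */
static int matches_at(const char *input, size_t pos,
                      const char *word, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)input[pos + i]) !=
            tolower((unsigned char)word[i])) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical variant of remove_word: copies input into dest without
 * whole-word, case-insensitive occurrences of word, and also drops one
 * space directly after each removed word. */
void remove_word_and_space(const char *input, const char *word, char *dest) {
    size_t input_len = strlen(input);
    size_t word_len = strlen(word);
    size_t read = 0, write = 0;

    while (read < input_len) {
        int at_start = (read == 0 || is_boundary(input[read - 1]));
        if (word_len > 0 && at_start &&
            word_len <= input_len - read &&
            is_boundary(input[read + word_len]) &&
            matches_at(input, read, word, word_len)) {
            read += word_len;        /* skip the word itself */
            if (input[read] == ' ') {
                read++;              /* and one trailing space */
            }
            continue;
        }
        dest[write++] = input[read++];
    }
    dest[write] = '\0';
}
```

Called with the test string and "the", this leaves "Dhillon Theatre is now Fun Republic" in dest with no leading space. Whether to drop the space before or after the word (or neither) is a design choice that depends on how the result will be used.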