Count word in string

Question

I'm trying to write a function that counts occurrences of a particular word in a string. For example, given the string:

"Stop, time to go home. Todo fix me."

The letters "to" appear three times (twice inside other words); however, the word "to" appears only once. How can I count only the word "to" (and, if it appears in the string more than once, count every occurrence)? Any advice? This is the code I have been experimenting with:

int word(char inputLine[]) {
    int word = 0, i = 0, j = 0;

    for (i = 0; inputLine[i] != '\0'; i++) {
            if (inputLine[i] == 't' || inputLine[i] == 'o' || inputLine[i] != ' ') {
                word++;
            }
    }
    return word;
}

Use the **strtok** function first to split on whitespace and get the words, then compare each token to the search word. — OctoCode, Sep 10 '16 at 21:19
Note that `To` in `Todo` is not the same as `to` in `Stop` or `to`. Are you after case-insensitive matching of words? Should the string "To be or not to be" count 1 or 2 words 'to'? — Jonathan Leffler, Sep 10 '16 at 21:29
Also, are you sure you want to hard-code the word 'to' into the code? Wouldn't it be more useful to have an interface `int count_word(const char *input, const char *word)` which will look for a given word (character sequence) in the input. — Jonathan Leffler, Sep 10 '16 at 21:38
@JonathanLeffler "To be or not to be" should count as 2 words. I was trying to do a simple example and then expand it into a function that counts only words typed from the home row of the keyboard, i.e. using the letters "a,s,d,f,g,h,j,k,l". For example, "alaska" is typed from the home row, but "also" is not, because "o" is used. — Art, Sep 10 '16 at 21:41
There are a lot of corner cases. I would use a regex and a case-insensitive search for `\bto\b`. Though `strtok` splitting on spaces, dots, commas, whatever could work too. — Jean-François Fabre, Sep 10 '16 at 21:46
@JonathanLeffler So I'm trying to count only words typed from home row. — Art, Sep 10 '16 at 21:49
@Jean-FrançoisFabre: What about `tolower()`? Yes, it's locale-dependent, but for ASCII(1), the "C" locale should work just fine. ((1) or EBCDIC, or anything non-extended [e.g. not ISO-8859-* or UTF-8 or similar] that string literals use) — Tim Čas, Sep 10 '16 at 22:07
You can iterate through the text with the library function `strstr()` after first passing each character of the text through `tolower()`. That way there is no need to reinvent the wheel, and every instance of `to`, `To`, `tO` and `TO` will be counted. — user3629249, Sep 11 '16 at 15:22
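
The home-row goal Art describes in the comments above could be approached roughly as follows. This is only a sketch: the function name `count_home_row_words` is made up for illustration, a "word" is assumed to be any run of letters, and matching is assumed to be case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count words made up only of home-row letters. */
int count_home_row_words(const char *input)
{
    static const char home_row[] = "asdfghjkl";
    int count = 0;
    size_t i = 0;

    while (input[i] != '\0') {
        /* Skip anything that is not a letter (spaces, punctuation, digits). */
        while (input[i] != '\0' && !isalpha((unsigned char)input[i]))
            i++;

        /* Scan one word, tracking whether every letter is on the home row. */
        int in_word = 0, all_home = 1;
        while (isalpha((unsigned char)input[i])) {
            in_word = 1;
            if (strchr(home_row, tolower((unsigned char)input[i])) == NULL)
                all_home = 0;
            i++;
        }
        if (in_word && all_home)
            count++;
    }
    return count;
}
```

With this sketch, `count_home_row_words("alaska also")` counts "alaska" but not "also".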

Imbar M. · Answer 1 · 2016-09-10T21:59:32.003

1

Try this:

int word(char inputLine[]) {
    int word = 0, i = 0;

    // stop before the last char
    for (i = 0; inputLine[i] != '\0' && inputLine[i+1] != '\0'; i++) {

        // is (T or t) and (O or o)
        if ((inputLine[i] == 't' || inputLine[i] == 'T') && (inputLine[i+1] == 'o' || inputLine[i+1] == 'O')) {

            // after the 'to' is not a letter
            if ((inputLine[i+2] < 'a' || inputLine[i+2] > 'z') &&
                (inputLine[i+2] < 'A' || inputLine[i+2] > 'Z')) {

                // before is not a letter (or this is the start of the string)
                if (i == 0 ||
                    ((inputLine[i-1] < 'a' || inputLine[i-1] > 'z') &&
                     (inputLine[i-1] < 'A' || inputLine[i-1] > 'Z'))) {
                        word++;
                }
            }
        }
    }

    return word;
}

edited Sep 10 '16 at 21:59

answered Sep 10 '16 at 21:21

Imbar M.


Your code would be clearer if you use `isalpha()` from `<ctype.h>`. – Jonathan Leffler Sep 10 '16 at 21:41
`inputLine[i+2]` is undefined at the end of the string. Stop condition is `inputLine[i+1] != '\0'` – Jean-François Fabre Sep 10 '16 at 21:45
The loop condition will stop the loop if you don't have 2 chars left (the current one and the next one), so at +2, worst case, you'll hit the `\0`. – Imbar M. Sep 10 '16 at 21:50
@Jean-FrançoisFabre: what is the issue? The stop condition is correct with both `inputLine[i] != '\0' && inputLine[i+1] != '\0'` to allow for empty input strings. Within the body of the loop, `inputLine[i+2]` may or may not be a null byte, but it is part of the string (because neither `inputLine[i]` nor `inputLine[i+1]` is the end of the string). – Jonathan Leffler Sep 10 '16 at 21:50
you're right, sorry. I would create 2 variables with `tolower` at the start of the loop, which would be clearer. – Jean-François Fabre Sep 10 '16 at 21:54
@ImbarM. When I compiled your code, I got warned about unused variable `j` (can be deleted), and also about missing parentheses around the `A && B` expression after `i == 0 || A && B` in the innermost `if` condition. The code is not wrong — `&&` does bind tighter than `||` — but the extra level of parentheses `i == 0 || (A && B)` avoids any risk of confusion. Of course, `i == 0 || !isalpha((unsigned char)inputLine[i-1])` would also work. The cast is necessary in case your plain `char` type is signed and you have accented characters in your string — they'd be converted to negative numbers. – Jonathan Leffler Sep 10 '16 at 21:55
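
Putting the suggestions from these comments together (`tolower()` for the two letters, `isalpha()` with the `unsigned char` cast for the boundaries), the answer's loop might be condensed as below. This is a sketch rather than the answerer's code; the name `count_to` is an assumption, and the default "C" locale is assumed.

```c
#include <ctype.h>

/* Count whole-word, case-insensitive occurrences of "to". */
int count_to(const char *s)
{
    int count = 0;

    /* Stop before the last char so s[i + 1] is always a real character. */
    for (int i = 0; s[i] != '\0' && s[i + 1] != '\0'; i++) {
        int c0 = tolower((unsigned char)s[i]);
        int c1 = tolower((unsigned char)s[i + 1]);

        /* s[i + 2] is at worst the terminating '\0', which is not a letter. */
        if (c0 == 't' && c1 == 'o' &&
            (i == 0 || !isalpha((unsigned char)s[i - 1])) &&
            !isalpha((unsigned char)s[i + 2]))
            count++;
    }
    return count;
}
```

The casts matter when plain `char` is signed: passing a negative value other than `EOF` to the `<ctype.h>` functions is undefined behaviour.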

Taqdeer · Answer 2 · 2016-09-19T19:22:35.530

0

Let's posit these rules:

"to" can be a word only when there is no char before and after it except the space char

If you accept those rules as valid and correct you need to check 4 conditions:

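if (str[i] == 't' && str[i+1] == 'o' &&
    /* 'a-z' is not a range test in C; islower() from <ctype.h> is */
    (i == 0 || !islower((unsigned char)str[i-1])) &&
    !islower((unsigned char)str[i+2])) {
        word++;
    }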

Two more conditions can be included to check for the upper case letters.

edited Sep 19 '16 at 19:22

answered Sep 10 '16 at 21:27

Taqdeer


Consider: "to be or not to be - what is the world coming to?". There's no space before the first 'to', nor after the last 'to', but both should be counted. – Jonathan Leffler Sep 10 '16 at 21:31

score 0 · Answer 3 · answered Sep 10 '16 at 23:20

The simplest way would be to use `strtok`. But if you'd like to do it all by hand, the following will work. Although you only asked about "to", it works for any search string, provided words are separated by whitespace:

#include <stdio.h>

// word -- get number of string matches
int
word(char *input,char *str)
// input -- input buffer
// str -- string to search for within input
{
    int chr;
    int prev;
    int off;
    int stopflg;
    int wordcnt;

    off = -1;
    stopflg = 0;
    wordcnt = 0;
    prev = 0;

    for (chr = *input++;  ! stopflg;  prev = chr, chr = *input++) {
        // we've hit the end of the buffer
        stopflg = (chr == 0);

        // convert whitespace characters to EOS [similar to what strtok might
        // do]
        switch (chr) {
        case ' ':
        case '\t':
        case '\n':
        case '\r':
            chr = 0;
            break;
        }

        ++off;

        // reset on mismatch
        // NOTE: we _do_ compare EOS chars here
        if (str[off] != chr) {
            off = -1;
            continue;
        }

        // we just matched
        // if we're starting the word we must ensure we're not in the middle
        // of one
        if ((off == 0) && (prev != 0)) {
            off = -1;
            continue;
        }

        // at the end of a word -- got a match
        if (chr == 0) {
            ++wordcnt;
            off = -1;
            continue;
        }
    }

    return wordcnt;
}

void
tryout(int expcnt,char *buf)
{
    int actcnt;

    actcnt = word(buf,"to");
    printf("%d/%d -- '%s'\n",expcnt,actcnt,buf);
}

// main -- main program
int
main(int argc,char **argv)
{
    char *cp;

    --argc;
    ++argv;

    for (;  argc > 0;  --argc, ++argv) {
        cp = *argv;
        if (*cp != '-')
            break;

        switch (cp[1]) {
        default:
            break;
        }
    }

    tryout(1,"to");
    tryout(2,"to to");
    tryout(1," to ");
    tryout(1,"todo to");
    tryout(2,"todo to to");
    tryout(2,"doto to to");
    tryout(1,"doto to doto");
    tryout(0,"doto");

    return 0;
}
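
For comparison, the `strtok` route mentioned at the start of this answer might look like the sketch below. The names `equal_ignore_case` and `count_word_strtok` and the delimiter set are assumptions chosen for illustration; unlike the hand-written loop above, this version also treats common punctuation as a separator and ignores ASCII case.

```c
#include <ctype.h>
#include <string.h>

/* Compare two strings ignoring ASCII case; returns 1 when they are equal. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Count tokens equal to `word`. NOTE: strtok() modifies `input` in place. */
int count_word_strtok(char *input, const char *word)
{
    static const char delims[] = " \t\r\n.,;:!?-";
    int count = 0;

    for (char *tok = strtok(input, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        if (equal_ignore_case(tok, word))
            count++;
    }
    return count;
}
```

Because `strtok` writes `'\0'` bytes into its argument, the input must be a writable array (for example `char buf[] = "to be or not to be";`), not a string literal.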

score 0 · Answer 4 · edited May 23 '17 at 11:48

If you must use only "basic" C functions, the solutions above seem fine, but if you want to build a more scalable application (and solve the problem in a smarter way), you can use a library that handles regular expressions. You can check this answer: Regular expressions in C: examples?

Regexes have the advantage that you can make the match case-insensitive (which is one of your issues). I usually use pcre because it has the regex style of Perl and Java. Here is a very useful example that uses pcre: http://www.mitchr.me/SS/exampleCode/AUPG/pcre_example.c.html
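
As a rough illustration of the regex idea, here is a sketch that uses POSIX `<regex.h>` rather than pcre, purely to avoid an extra dependency. The function name `count_to_regex` is made up, and because `\b` is not part of POSIX extended syntax, the pattern spells out the word boundaries with character classes instead.

```c
#include <regex.h>

/* Count case-insensitive whole-word occurrences of "to"; -1 on regex error. */
int count_to_regex(const char *input)
{
    /* "to" preceded by start-of-text or a non-letter, and followed by a
       non-letter or end-of-text. */
    static const char pattern[] = "(^|[^[:alpha:]])to([^[:alpha:]]|$)";
    regex_t re;
    regmatch_t m[1];
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    /* Each match swallows the delimiter after the word, so the rest of the
       text begins on a word boundary and '^' may safely match there. */
    while (*input != '\0' && regexec(&re, input, 1, m, 0) == 0) {
        count++;
        input += m[0].rm_eo;
    }

    regfree(&re);
    return count;
}
```

The `REG_ICASE` flag is what provides the case-insensitive matching mentioned above.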

score 0 · Answer 5 · edited Jun 24 '19 at 02:09

public class FindCountOfWordInString {

    public static void main(String[] args) {
        String str = "yhing ghingu jhhtring inghfg ajklingingd me";
        String find = "ing";

        int count = findCountOfWordInString(str, find);
        System.out.println(count);
    }

    // Counts every non-overlapping occurrence of `find` inside the
    // space-separated words of `str`. Note that this counts substring
    // matches ("ing" inside "yhing"), not whole words.
    private static int findCountOfWordInString(String str, String find) {
        if (find.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (String word : str.split(" ")) {
            // Jump from match to match; indexOf returns -1 when none remain.
            int from = word.indexOf(find);
            while (from != -1) {
                count++;
                from = word.indexOf(find, from + find.length());
            }
        }
        return count;
    }
}
