
I'm trying to read my text file line by line:

FILE *infile;
char line[1000];
infile = fopen("file.txt","r");
while(fgets(line,1000,infile) != NULL) 
{
    //....
}
fclose(infile);

Then I need to find a specific word, for example "the", count how many times it occurs, and report which lines it occurs on.

I should be able to count the occurrences with this:

int wordTimes = 0;
if((strcmp("the", currentWord) == 0)) 
{
    printf("'%s' appears in line %d  which is: \n%s\n\n", "the", line_num, line);
    wordTimes++;
}

where line is the line of text that the string occurs on and line_num is the line number that it occurs on.

The number of times the word appears is then reported with this code:

if(wordTimes > 0)
{
    printf("'%s' appears %d times\n", "the", wordTimes);
}
else
{
    printf("'%s' does not appear\n", "the");
}

The problem is that I'm not sure how to compare each word in the line to "the" and still print out the line it occurs on.

I have to use very basic C for this, so that means I can't use strtok() or strstr(). I can only use strlen() and strcmp().

Jonathan Leffler
jLynx
  • Strip off the newline character from `line` and use `strtok` with `" "` as delimiter. This will give you each word. You know how to do the rest :-) – Spikatrix May 23 '15 at 04:00
  • @CoolGuy I forgot to mention that I'm not allowed to use strtok – jLynx May 23 '15 at 04:03
  • Can you write your own simulations of `strstr()` and/or `strtok()`? What functions are you allowed to use? `strcmp()` seems to be OK, but what else? – Jonathan Leffler May 23 '15 at 04:23
  • @JonathanLeffler basically strlen, gets, strcpy, strcmp, sscanf, scanf, printf, fprintf, getchar, fopen, fclose, fscanf, fgets – jLynx May 23 '15 at 04:30
  • Please remove `gets()` from that list; it should never be used! See [Why is the `gets()` function so dangerous that it should not be used?](http://stackoverflow.com/questions/1694036/) – Jonathan Leffler May 23 '15 at 04:33
  • @JonathanLeffler I'm not necessarily using it, I'm just listing the things we are allowed to use – jLynx May 23 '15 at 04:35
  • I'm telling you that you should assume that `gets()` crashes your program. Your code uses `fgets()` which is fine. Anyway, the only string functions are `strlen()` and `strcmp()`. You should really put that information into the question. Is there a proscription on simply implementing the functions? Do you need to count the number of times a word appears in a line? And it presumably needs to be an exact word match (so neither "there" nor "blithely" matches "the", even though the three letters are present in them). – Jonathan Leffler May 23 '15 at 04:38
  • @JonathanLeffler I've edited the question to say that I can only use strlen() and strcmp(). I can do this any way I want, but I need to count each exact match of a word ("the", for example) in the file, print the line it is on for each occurrence, and report how many times it occurred in total – jLynx May 23 '15 at 04:42
  • You need to write your own implementation of `strstr`. That's your assignment in a nutshell. – user93353 May 23 '15 at 04:49
  • @user93353 Just out of curiosity, I tried to use strstr, but if I have the word "the" twice on one line, it only counts it once – jLynx May 23 '15 at 04:51
  • You just have to call it multiple times, starting after the last match each time, until it doesn't find anything. You also have to ensure that there's a non-alpha before and after whatever is found. – Jonathan Leffler May 23 '15 at 04:52
  • @JonathanLeffler It searches line by line, and if strstr finds a match it does wordcount++; but how does it know if there are 2 matches on one line? It will still only do wordcount++; once – jLynx May 23 '15 at 04:54
  • You modify the code: `char *here = line; while ((word = strstr(here, "the")) != NULL) { wordcount++; here = word + 1; }` except that you also need to check that the word is surrounded by non-alpha characters. – Jonathan Leffler May 23 '15 at 04:56
  • @JonathanLeffler what is word? is word = "the" or is word = line? – jLynx May 23 '15 at 05:07
  • After a call to `strstr()`, `word` points to the start of three consecutive letters `the`, which may or may not be surrounded by white space, or it is null. – Jonathan Leffler May 23 '15 at 05:08
  • @JonathanLeffler I have got it working, now I just need to make my own strstr function. Thank you – jLynx May 23 '15 at 05:17
  • @DarkN3ss - you have to call it in a loop and pass `return value of strstr + strlen("the")` as input each time till `strstr` returns 0. – user93353 May 23 '15 at 10:17

1 Answer


Maybe you need to write a strword() function like this. I'm assuming you can use the classification functions (macros) from <ctype.h>, but there are workarounds if that isn't allowed either.

#include <assert.h>
#include <ctype.h>
#include <stdio.h>

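char *strword(char *haystack, char *needle);

/* Return a pointer to the first whole-word match of needle in haystack, or 0 */
char *strword(char *haystack, char *needle)
{
    char *pos = haystack;
    char old_ch = ' ';      // Treat the start of the haystack as a word boundary
    while (*pos != '\0')
    {
        // A match may only start where the previous character is not a letter.
        // The casts keep isalpha() well-defined when plain char is signed.
        if (!isalpha((unsigned char)old_ch) && *pos == *needle)
        {
            char *txt = pos + 1;
            char *str = needle + 1;
            while (*txt == *str)
            {
                if (*str == '\0')
                    return pos;     // Exact match at end of haystack
                txt++, str++;
            }
            // Needle exhausted: accept only if the next character is not a letter
            if (*str == '\0' && !isalpha((unsigned char)*txt))
                return pos;
        }
        old_ch = *pos++;
    }
    return 0;
}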

int main(void)
{
    /*
    ** Note that 'the' appears in the haystack as a prefix to a word,
    ** wholly contained in a word, and at the end of a word - and is not
    ** counted in any of those places. And punctuation is OK.
    */
    char haystack[] =
        "the way to blithely count the occurrences (tithe)"
        " of 'the' in their line is the";
    char needle[] = "the";

    char *curpos = haystack;
    char *word;
    int count = 0;
    while ((word = strword(curpos, needle)) != 0)
    {
        count++;
        printf("Found <%s> at [%.20s]\n", needle, word);
        curpos = word + 1;
    }

    printf("Found %d occurrences of <%s> in [%s]\n", count, needle, haystack);

    assert(strword("the", "the") != 0);
    assert(strword("th", "the") == 0);
    assert(strword("t", "t") != 0);
    assert(strword("", "t") == 0);
    assert(strword("if t fi", "t") != 0);
    assert(strword("if t fi", "") == 0);
    return 0;
}

When run, this produces:

Found <the> at [the way to blithely ]
Found <the> at [the occurrences (tit]
Found <the> at [the' in their line i]
Found <the> at [the]
Found 4 occurrences of <the> in [the way to blithely count the occurrences (tithe) of 'the' in their line is the]

Is there a way to do the strword function without <ctype.h>?

Yes. I said as much in the opening paragraph. Since the only function/macro used is isalpha(), you can make some assumptions (that you're not on a system using EBCDIC) so that the Latin alphabet is contiguous, and you can use this is_alpha() in place of isalpha() — and omit <ctype.h> from the list of included headers:

static inline int is_alpha(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
Jonathan Leffler
  • Is there a way to do the strword function without ctype.h? – jLynx May 23 '15 at 05:50
  • I don't understand the need for the is_alpha calls? – user93353 May 23 '15 at 10:23
  • @user93353: try the code without them. It will then blithely pick up 'the' at the start or in the middle or at the end of other words. You would not wish to see the program claim that 'other' was the word 'the', would you? (Separately, there'd be a lot of changes to make to find a word containing a word fragment. But that's a consequence of trying to answer a different question.) – Jonathan Leffler May 23 '15 at 14:02
  • @JonathanLeffler - I got that, but I didn't realise that's what you were aiming for - because strstr doesn't do that. I didn't realise OP wanted to do that. – user93353 May 23 '15 at 15:24
  • @user93353: It's a question of reading between the lines rather than truly being spelled out explicitly in the question. The question repeatedly mentions 'words', rather than something like 'string'. The question only mentions `strstr()` negatively; it can't be used. To some extent, the question is open to interpretation, but it appears that my interpretation is not too far off. – Jonathan Leffler May 23 '15 at 16:37