3

I have string say "walk talk, can't won't Won't woN'T talk." I want to count the reapeated words and display. Note: it is not case sensitive.

I have used delimeter

strtok(string, ",.;:\"!? -_\n\t*()@#=+");

and saved it in

char *temp[100];

Now how can I check for repeatation of words? And display as below

3 won't
2 talk
1 can't
1 walk

it should display from highest repeat to lowest. And if the repeatation is same then display alphabetic order.

Sorry for my bad english.

Marlon
  • 19,924
  • 12
  • 70
  • 101
user1579911
  • 99
  • 1
  • 5
  • 2
    This question has already been answered: http://stackoverflow.com/questions/3672234/c-function-to-count-all-the-words-in-a-string?rq=1 – Mohamed Nuur Aug 06 '12 at 18:34
  • 1
    use a `std::vector`, then we're talking C++ – Tony The Lion Aug 06 '12 at 18:34
  • 1
    I want count of repeat of words not total number of words. – user1579911 Aug 06 '12 at 18:45
  • You need to make sure what you mean by "word". It is incredibly tricky to define in a meaningful way and to have results that are actually worth interpreting. – pmr Aug 06 '12 at 18:57
  • 2
    @pmr: OP has made terribly clear what is meant by "word" by specifying all delimiting characters in the call to `strtok`. – bitmask Aug 06 '12 at 19:02
  • @bitmask And I tried to make it terribly clear that this is utterly useless in most contexts. – pmr Aug 06 '12 at 19:54
  • @pmr: You claim that regularly characterising what a word is would be useless? Have you ever used Vim? – bitmask Aug 06 '12 at 19:56
  • @bitmask I have never said "all". If that view of what a word is useful to you (and it is if you are editing text) that's fine. It is not useful if you want to reason about word probability or anything else that is contained in text. I'm not going to argue too much, this is not the place. There are vast resources about the definition of a `word`. Please help yourself, you won't miss it. – pmr Aug 06 '12 at 20:03
  • @TonyTheLion `std::vector` is not the correct container to use here. I suggest using a `std::map` instead. – Marlon Aug 06 '12 at 21:05

4 Answers4

3

Use a std::string to hold the result of the strtok(). Then create a std::map<string, int> to hold the count of the times the string (the key) has occurred.

You can populate the map with:

std::map<string, int> myMap;
myMap[tokenizedWord]++; //Increase count of word.

You can then cycle through the map content and print out wherever the integer value is greater than 2.

for (std::map<string, int>::iterator iter = myMap.begin(); iter != myMap.end(); ++iter)
{
    if (iter->second > 1)
        std::cout << "Duplicated word: " << iter->first << " count = " << iter->second;
}

I'll let you figure out how to traverse it in order. You can put the values in a vector or something and use std::sort before printing or whatever else you like. Maps, unfortunately, are associative containers and you can't sort them as it breaks their internal ordering.

Background Info on std::map

A map is an associative array meaning that every key maps to a specific value, and keys are unique. You can actually create a multimap where keys are not unique, so that's why this is important.

Basically, since keys are unique, you can access or create an element just by using the key as the array index.

For example:

//Create a map and insert a couple things into it - prices of meat?
std::map<string, float> myMap;
myMap["Chicken"] = 4.99;
myMap["Turkey"] = 6.99;

//Retrieve the price of something using the key.
std::cout << "Chicken costs " << myMap["Chicken"] << std::end;

You can do standard insertion and location operations on a map too, but the associative array syntax is just simpler, so why bother? :)

PS: To fully answer your comment, just in case, the ++ at the end of myMap[tokenizedWord]++ is just saying to increment the value of the integer value stored for that key by 1. You could just as well do myMap[tokenizedWord] = myMap[tokenizedWord] + 1 OR you could also do myMap[tokenizedWord] += 1.

John Humphreys
  • 37,047
  • 37
  • 155
  • 255
  • 2
    Make sure to `std::transform` your `string` with `tolower` before throwing them into the `map`. – bitmask Aug 06 '12 at 19:02
  • what is myMap[tokenizedWord]++; – user1579911 Aug 06 '12 at 19:11
  • @user1579911 Modified by answer to explain what that syntax meant. – John Humphreys Aug 06 '12 at 19:21
  • thanks a lot, I understood tokenizedWord and the iterator iter. But I cant get other words I am not sure what mistake i did. std::string temp = strtok(buffer, ",.;:\"!? -_\n\t*()@#=+"); if i cout tokenizedWord, i get only the first word how to compare it with other words in the string? – user1579911 Aug 06 '12 at 20:15
  • I'm not 100% sure what you're asking. But strtok only does one word at a time and it saves state. So, you need to save the returned string into the map and then call strtok again, repeating the process. You should loop until strtok's return value indicates that nothing was left. – John Humphreys Aug 06 '12 at 21:09
1

a complete implementation of your problem (Let me know if you want a sample code for sorting):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define ARRAY_ELEMS_COUNT(A)    sizeof(A)/sizeof(*A)

typedef struct _word_t
{
        char    *word;
        int     occurr_count;
        struct _word_t  *next;
} word_t;

typedef struct _word_list_t
{
        struct  _word_t *head;
        struct  _word_t *tail;
        int     elems_count;
} word_list_t;

/* Creation of the words list */
word_list_t *make_list(void)
{
        word_list_t *w_list = (word_list_t *)malloc(sizeof (struct _word_list_t));
        if (w_list == NULL)
        {
                fprintf(stderr, "malloc faild --> %s\n", strerror(errno));

                return NULL;
        }
        w_list->head = w_list->tail = NULL;
        w_list->elems_count = 0;

        return w_list;
}

int list_word_lookup(word_list_t *w_list, char *word)
{
        word_t *temp_word = w_list->head;
        while(temp_word)
        {
                if (strcmp(temp_word->word, word) == 0)
                {
                        /* We got it before, increment the count */
                        temp_word->occurr_count++;

                        return 1;
                }
                else
                {
                        temp_word = temp_word->next;
                }
        }

        return 0;
}

/* Adding new words to the list of words if they are not present, otherwise increment their occurrence count */
/* TODO : Sort the list using Merge sort for performance */
int adding_to_list(word_list_t *w_list, char *word)
{
        int     return_status = 0;
        char    *tmp_word = (char *)malloc(sizeof(char)*(strlen(word) + 1));
        word_t  *new_word = (word_t *)malloc(sizeof(struct _word_t));
        /* Empty list */
        if (w_list->head == NULL)
        {
                strcpy(tmp_word, word);
                new_word->word = tmp_word;
                new_word->occurr_count = 1;
                w_list->head = w_list->tail = new_word;
                w_list->head->next = NULL;
                w_list->elems_count++;
        }
        else
        {
                /* The list is not empty */
                /* Checking if the word exist in the list */
                return_status = list_word_lookup(w_list, word);
                if (return_status == 1)
                {
                        fprintf(stdout, "WE got this word before --> increment count\n");
                }
                else
                {
                        strcpy(tmp_word, word);
                        new_word->word = tmp_word;
                        new_word->occurr_count = 1;
                        w_list->tail->next = new_word;
                        w_list->tail = new_word;
                        w_list->tail->next = NULL;
                }
        }

        return 0;
}

void words_list_dump(word_list_t *w_list)
{
        word_t *temp;

        for (temp = w_list->head; temp; temp = temp->next) {
                fprintf(stdout, "Word : %s -- Count = %d\n", temp->word, temp->occurr_count);
        }
}

/* Destroying all words */
void free_words(word_list_t *w_list)
{
        word_t *temp;

        for (temp = w_list->head; temp; temp = temp->next) {
                /* Freeing the word string */
                free(temp->word);
                /* Freeing the word */
                free(temp);
        }
        w_list->head = NULL;
        w_list->tail = NULL;
}

/* Destroying the words list */
void free_words_list(word_list_t *w_list)
{
        if (!w_list)
        {
                return;
        }
        free_words(w_list);
        free(w_list);
}

/* TODO : create a function that converts your input text to a char ** array, so you can pass it to adding_to_list */
/* For testing */
int main(int argc, char **argv)
{
        const char *string[] = {"Hello", "World", "Stackoverflow", "C", "Hello", "C", "WORDS", "words", "List", "list", "Hello", "World", "Count"};
        word_list_t *my_list = make_list();
        int i;

        for (i = 0; i < ARRAY_ELEMS_COUNT(string); i++)
                adding_to_list(my_list, string[i]);
        words_list_dump(my_list);
        free_words_list(my_list);

        return 0;
}
TOC
  • 4,326
  • 18
  • 21
0

Here is an answer using strtok but without std::map. In one pass of string, every word in is checked against previous words and repeats are counted.

#include <iostream>
using std::cin;
using std::cout;
using std::endl;

#include <string>
using std::string;

#include <vector>
using std::vector;

#include <cstring>

using std::tolower;

int main()
{
    char *strin;
    string inputstr;
    vector<string> svec;
    vector<int> cvec;
    char *pch;
    int unique_word_count=0;
    while(getline(cin,inputstr))
    {
        //token-ize the string
        //First string
        strin = &inputstr[0];
        pch = std::strtok(strin," ,-");
        bool unique_word_found = true;
        //subsequent words
        while (pch != NULL)
        {
            string word(pch);
            for(string::size_type i=0; i < word.size(); i++)
                word[i]=tolower(word[i]);
            //first word
            //just add to svec and no comparisons
            if(unique_word_count==0)
            {
                svec.push_back(word);
                cvec.push_back(1);
                cvec[unique_word_count++]=1; //init count of first word
                //next word
                pch = std::strtok(NULL, " ,-");
                unique_word_found = true; //reset flag
                continue;
            }

            //start comparing with other words currently in string vector
            //do not do this if only 1 word present
            vector<string>::iterator iter=svec.begin();
            while(iter < svec.end())
            {
                if(word == *iter)
                {
                    //match found
                    cvec[iter-svec.begin()]++; //increment count of that word
                    unique_word_found = false;
                }
                iter++;
            }
            if(unique_word_found)
            {
                //add to unique word list and increment count
                svec.push_back(word);
                cvec.push_back(1);
                cvec[unique_word_count++]=1;
            }

            //next word
            pch = std::strtok(NULL, " ,-");
            unique_word_found = true; //reset flag
        }
    }

    cout << "Word" << " ---> " << "Occurences" << endl;
    for(vector<string>::size_type i=0; i < svec.size(); i++)
    {
        cout << svec[i] << "  --->  " << cvec[i] << endl;
    }
    return 0;
}
jithin
  • 101
  • 3
0

The general strategy can be as follows:

  • Sanitize the input (convert all characters to lower case, remove unwanted punctuation, etc.)
  • Walk through the input
  • Add each character to a string, finalizing when a space is encountered
  • Add the string to a key-value structure. The string is the key. If this is a new entry not already contained in the structure, set the value 1. Otherwise set it to the current value + 1 (so as to count the number of times encountered so far).
  • Repeat for each word
  • Walk through the key-value structure and print each entry.
Viet Trang
  • 51
  • 5