
I want to build a word-count map from a paragraph. For example, given the string "go bread John yesterday going is music musics...", the word map for that string would be: music=2, go=2, bread=1, John=1, yesterday=1. Words that differ only by a suffix should be counted under their common root (for example, musics counts as music). How can I approach this in C#?

emre_ceylan
  • What about `"Go, Google!"`? Should that also return `Go=2`? – Nolonar Apr 19 '13 at 14:35
  • You'll need a list of valid words for this task. Then you can use regex to loop through your word array and count the matches. – Yami Apr 19 '13 at 14:35
  • What effort have you made? – Daniel A. White Apr 19 '13 at 14:35
  • (1) "music" and "musics" have a common prefix, not a common suffix. (2) You should try something and see if you can do it yourself, then post some code to see if someone would help you fix a problem in your code. – Sergey Kalinichenko Apr 19 '13 at 14:36
  • What part of this are you having trouble with? Reading the paragraph? Parsing the words? Keeping track of the words? Stemming? Do you have any idea how you're going to approach the problem? – Jim Mischel Apr 19 '13 at 14:39
  • http://mattgemmell.com/2008/12/08/what-have-you-tried/ – EvilBob22 Apr 19 '13 at 16:01
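
Several of the comments above turn on how the paragraph is split into words before anything is counted. As a minimal sketch (the class and method names here are hypothetical, not from the thread), one option is to extract runs of letters with a regular expression and lower-case them, so that punctuation and capitalization do not produce separate entries:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordTokenizer
{
    // Returns the runs of letters in the text, lower-cased,
    // so that "Go," and "go" compare equal.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        // \p{L}+ matches one or more Unicode letters;
        // punctuation, digits, and whitespace act as separators.
        foreach (Match m in Regex.Matches(text, @"\p{L}+"))
            tokens.Add(m.Value.ToLowerInvariant());
        return tokens;
    }

    public static void Main()
    {
        foreach (string token in Tokenize("Go, Google!"))
            Console.WriteLine(token);
    }
}
```

With this tokenizer, "Go, Google!" yields go and google as two separate words. Whether google should count toward go is a stemming decision, not a tokenizing one.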

3 Answers


As for the suffix, this only strips a trailing "s"; you can modify it to handle other suffixes.

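// Requires: using System; using System.Collections.Generic; using System.Linq;
string words = "go bread John yesterday going is music musics";
// Split on spaces, dropping empty entries caused by repeated spaces.
List<string> wordroots = words.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries).ToList();
var rootcount = wordroots
    .Select(wr =>
    {
        // Strip a trailing "s" so that plurals group with their singular form.
        // Note that this also trims short words such as "is" to "i".
        if (wr.EndsWith("s"))
            wr = wr.Substring(0, wr.Length - 1);
        return wr;
    })
    .GroupBy(g => g);

// Each group key is a root; the group size is its count.
foreach (var group in rootcount)
    Console.WriteLine("Found word: {0} {1} times.", group.Key, group.Count());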
Joey Gennari
  • Thank you, Joey. Your code partially solves the problem. However, go should be counted 2 times, because going derives from go. How can that work? – emre_ceylan Apr 19 '13 at 17:43
  • Joey, I figured out the problem. Your code is right. Other suffixes can be reduced to the common root just by adding more code. For example: if (wr.EndsWith("ing")) wr = wr.Substring(0, wr.Length - 3); – emre_ceylan Apr 19 '13 at 19:13
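
The suffix-by-suffix approach described in these comments can be sketched roughly as follows. The suffix list, the minimum root length of two characters, and the names `SuffixCounter`, `StripSuffix`, and `CountRoots` are all assumptions made for illustration; this is naive suffix stripping, not a real stemmer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixCounter
{
    // Hypothetical suffix list for illustration; longer suffixes come first.
    private static readonly string[] Suffixes = { "ing", "s" };

    // Strips the first matching suffix, but only if at least two characters
    // remain, so that short words such as "is" are left alone.
    public static string StripSuffix(string word)
    {
        foreach (string suffix in Suffixes)
        {
            if (word.Length - suffix.Length >= 2 &&
                word.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                return word.Substring(0, word.Length - suffix.Length);
        }
        return word;
    }

    // Splits on spaces, reduces each word to its root, and counts the roots.
    public static Dictionary<string, int> CountRoots(string text)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => StripSuffix(w))
            .GroupBy(root => root, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var counts = CountRoots("go bread John yesterday going is music musics");
        foreach (var pair in counts)
            Console.WriteLine("{0}={1}", pair.Key, pair.Value);
    }
}
```

For the sample string this gives go=2 and music=2, with every other word counted once. Plain suffix stripping will still mishandle many English words (for instance, it would turn "bus" into "bu" and "running" into "runn"), which is where a proper stemmer comes in.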

First, normalize every word to either its singular or its plural form (your choice, but be consistent), so that music and musics are treated as the same word. This is not hard, as there is C# code for getting the plural form; see, for example, this post.

You can then create a dictionary:

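// Maps each normalized word to the number of times it occurs.
// YourInputs stands for the sequence of already-normalized words.
Dictionary<string, int> data = new Dictionary<string, int>();
foreach (string item in YourInputs)
{
    if (data.ContainsKey(item))
        data[item]++;        // seen before: increment its count
    else
        data.Add(item, 1);   // first occurrence
}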
David

You'll first need a word stemming library. Snowball, suggested in this SO topic, seems like a good place to start.

Even with a stemmer, you'll undoubtedly get a pretty massive list of words from even a small article, so your best bet for keeping track of them all will probably be an SQL database. However, if you only need to keep track of these values temporarily, a simple string table will probably do the trick.
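
For the temporary case, a minimal sketch of such an in-memory table might look like this. The stemmer is passed in as a delegate so that a real stemming library could be plugged in; no particular library API is assumed, and the `StemTable` class and the toy stemmer in `Main` are hypothetical placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StemTable
{
    // Builds an in-memory table of stem -> count.
    // The caller supplies the stemming function; none is assumed here.
    public static Dictionary<string, int> Build(
        IEnumerable<string> words, Func<string, string> stem)
    {
        var table = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in words)
        {
            string key = stem(word);
            int count;
            table.TryGetValue(key, out count); // count stays 0 when the key is new
            table[key] = count + 1;
        }
        return table;
    }

    public static void Main()
    {
        // Placeholder stemmer for illustration only:
        // trims a trailing "s" from words longer than three characters.
        Func<string, string> toyStem =
            w => w.Length > 3 && w.EndsWith("s") ? w.Substring(0, w.Length - 1) : w;

        string[] words = { "music", "musics", "bread", "is" };
        foreach (var pair in Build(words, toyStem).OrderByDescending(p => p.Value))
            Console.WriteLine("{0}={1}", pair.Key, pair.Value);
    }
}
```

Keeping the stemmer behind a `Func<string, string>` means the counting code does not change if the placeholder is later replaced by a library call.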

Community
Steven Mills