I am making something where the user inputs any URL and the text at that URL is retrieved.

The text will then be parsed and the words will be counted.

I am currently reading this article from Microsoft: https://msdn.microsoft.com/en-us/library/bb546166.aspx

I can now get the text, and I am trying to think of an efficient way to count every word.

The article's example requires a search term, but I need to count every word, not one specific word.

Here is what I am thinking:

  1. Get the text and convert it to a string.
  2. Split it on delimiters and store the words in an array.
  3. Loop through the array and count the occurrences of each word.

Would this be efficient?

Carlos Miguel Colanta

1 Answer

Using LINQ

If you have a small amount of data, you can just split on spaces and group the resulting words:

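 // Requires: using System.Linq;
 // MethodToGetStringFromUrl is a placeholder for however you download the text
 var theString = MethodToGetStringFromUrl(urlString);

 // Group identical words together, then count the size of each group
 var wordCount = theString
                    .Split(' ')
                    .GroupBy(a => a)
                    .Select(a => new { word = a.Key, Count = a.Count() });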

See the fiddle for a working copy.
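
If you would rather not use LINQ, the split-then-loop plan from the question can be done in a single pass with a dictionary. This is only a rough sketch: the `WordCounter` class, the `CountWords` helper, the delimiter set, and the case-insensitive comparison are all assumptions made for illustration, not something taken from the article or the fiddle.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Hypothetical helper: counts each distinct word in one pass over the text.
    public static Dictionary<string, int> CountWords(string text)
    {
        // Assumed delimiter set; adjust it for the text you actually receive.
        char[] delimiters = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

        // Assumption: "The" and "the" should count as the same word.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string word in text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
        {
            // TryGetValue leaves current at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }

        return counts;
    }

    public static void Main()
    {
        var counts = CountWords("the cat and the hat. The end");
        foreach (var pair in counts)
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

Each word costs one dictionary lookup and one write, so the whole count is roughly linear in the number of words, which avoids rescanning the array once per word.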

Some Experiments and Results

I experimented a little in .NET Fiddle, and using Regex actually decreased performance and increased memory usage; see here for what I am talking about.

Other alternative

Because you are getting the response from a URL, it might be more performant to search inside the stream directly, rather than converting it to a string first and then performing the search.
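
One way this could look, as a sketch under assumptions: read the response line by line through a `TextReader` and count as you go, so the whole page never has to sit in memory as one string. The `StreamWordCounter` class and its `CountWords` method are hypothetical names, and the `StringReader` in `Main` only stands in for a reader over the real response stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamWordCounter
{
    // Hypothetical helper: counts words while reading, one line at a time.
    public static Dictionary<string, int> CountWords(TextReader reader)
    {
        var counts = new Dictionary<string, int>();
        char[] delimiters = { ' ', '\t' }; // assumed delimiters

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string word in line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
            {
                // current stays 0 for a word that has not been seen before.
                counts.TryGetValue(word, out int current);
                counts[word] = current + 1;
            }
        }

        return counts;
    }

    public static void Main()
    {
        // StringReader is a stand-in for a reader over the HTTP response stream.
        using (var reader = new StringReader("one two\ntwo three two"))
        {
            var counts = CountWords(reader);
            Console.WriteLine(counts["two"]); // prints 3
        }
    }
}
```

With a real response you would pass something like `new StreamReader(responseStream)` instead of the `StringReader`. Whether this is actually faster than building the string first is worth measuring rather than assuming.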

Don't optimize unless you need to

Why do you need to find a performant way to do this count? Have you run into any issues, or do you just think you will? A good rule of thumb is not to optimize prematurely. For more information, check out this good question on the topic: When is optimisation premature?

konkked