Efficient way of counting every occurrence of every word from a URL

Question

I am building something where the user enters any URL and the text of that page is retrieved.

The text will then be parsed and the words will be counted.

I am currently reading this article from Microsoft: https://msdn.microsoft.com/en-us/library/bb546166.aspx

I can now get the text, and I am trying to think of an efficient way to count every word.

The article's example needs a search term, but I need to count every word rather than one specific word.

Here is what I am thinking:

Get the text and convert it to a string.
Split it on delimiters and store the words in an array.
Loop through the array and count the occurrences of each word.

Would this be efficient?
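A minimal C# sketch of those three steps, assuming words are separated by whitespace and that case and punctuation are left untouched. The key point is to count each word into a `Dictionary<string, int>` while walking the array once, instead of rescanning the whole array for every word (which grows quadratically with the number of words). `CountWords` is a hypothetical helper name, not something from the article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts every word in a single pass.
// Each lookup in the Dictionary is an average O(1) hash lookup,
// so the whole text costs O(n) rather than O(n^2).
Dictionary<string, int> CountWords(string text)
{
    var counts = new Dictionary<string, int>(StringComparer.Ordinal);

    // An empty separator array means "split on any whitespace";
    // RemoveEmptyEntries drops the blanks produced by repeated separators.
    // Assumption: punctuation stays attached to the word.
    var words = text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

    foreach (var word in words)
    {
        counts.TryGetValue(word, out var n); // n is 0 if the word is new
        counts[word] = n + 1;
    }
    return counts;
}

var result = CountWords("the cat and the hat");
foreach (var pair in result)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```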

Take a look at RegEx. This can search the entire page in one call. — , Mar 25 '16 at 00:51
Why is *Efficiency* even important? Are you doing this billions of times a second? — Erik Philips, Mar 25 '16 at 04:04

score 1 · Answer 1 · edited May 23 '17 at 11:50

Using LINQ

If you have a small amount of data, you can just split on whitespace and group the words:

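 // MethodToGetStringFromUrl is a placeholder for your own download code
 // (requires: using System; using System.Linq;)
 var theString = MethodToGetStringFromUrl(urlString);

 var wordCount = theString
                    // split on any whitespace and drop the empty entries that repeated separators produce
                    .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
                    .GroupBy(a => a)
                    .Select(a => new { Word = a.Key, Count = a.Count() });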

See the fiddle for a working copy.
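Since the fiddle itself is not reproduced here, this is a self-contained sketch of the same LINQ approach. The literal string stands in for the hypothetical `MethodToGetStringFromUrl`, and the `OrderByDescending` step is an addition (not in the answer) so the most frequent words come first.

```csharp
using System;
using System.Linq;

// Stand-in for the downloaded page text.
var theString = "to be or not to be";

var wordCount = theString
    // Split on any whitespace and drop empty entries.
    .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(w => w)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    // Added for readability: most frequent words first (the sort is stable).
    .OrderByDescending(x => x.Count)
    .ToList();

foreach (var entry in wordCount)
    Console.WriteLine($"{entry.Word}: {entry.Count}");
```

For case-insensitive counts, `GroupBy(w => w, StringComparer.OrdinalIgnoreCase)` can replace the plain `GroupBy`.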

Some Experiments and Results

I experimented a little in .NET Fiddle, and using regular expressions actually decreased performance and increased memory usage; see here for what I am talking about.
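The regex version from that experiment is not shown, so the following is only an assumed example of what one might look like, using the pattern `\w+` to pick out words. Its practical advantage is that punctuation is not glued to words (`world!` and `world.` both count as `world`), whereas a plain split on spaces keeps them distinct; according to the experiment described above, that convenience came at a cost in speed and memory.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed regex variant: \w+ matches runs of letters, digits and underscores,
// so trailing punctuation is not counted as part of a word.
var text = "Hello, world! Hello again, world.";

var counts = Regex.Matches(text, @"\w+")
    .Cast<Match>()
    .GroupBy(m => m.Value)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var pair in counts)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```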

Another alternative

Because you are getting the response from a URL, it might be more efficient to count the words while reading the stream, instead of first converting the whole response to a string and then searching it.
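A sketch of that idea, assuming a word is any run of letters or digits: characters are read one at a time from a `TextReader` and counted into a dictionary, so the whole page is never held in a single string. `CountWordsFromReader` is a hypothetical name, and a `StringReader` stands in for the HTTP response stream in the demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: counts words while reading, without building one big string.
// Assumption: a word is a run of char.IsLetterOrDigit characters.
Dictionary<string, int> CountWordsFromReader(TextReader reader)
{
    var counts = new Dictionary<string, int>();
    var current = new StringBuilder();

    // Records the word collected so far (if any) and resets the buffer.
    void Flush()
    {
        if (current.Length == 0) return;
        var word = current.ToString();
        counts.TryGetValue(word, out var n);
        counts[word] = n + 1;
        current.Clear();
    }

    int c;
    while ((c = reader.Read()) != -1)
    {
        if (char.IsLetterOrDigit((char)c))
            current.Append((char)c);
        else
            Flush();
    }
    Flush(); // the last word may not be followed by a separator
    return counts;
}

// With a real URL you could wrap the response stream, e.g.
//   using var reader = new StreamReader(await httpClient.GetStreamAsync(url));
// Here a StringReader stands in for the network stream.
var result = CountWordsFromReader(new StringReader("one two two\nthree three three"));
foreach (var pair in result)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

Whether this beats reading the whole response and splitting it depends on page size, so it is worth measuring before switching.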

Don't optimize unless you need to

Why do you need a faster way to do this count? Have you actually run into problems, or do you just expect to? A good rule of thumb is not to optimize prematurely; for more information, see this good question on the topic: When is optimisation premature?
