How to fix counting the most frequent words in a website

Question

I have a program that gets a URL from the user and searches the website for its most common words.

    public void url_input_Click(Object sender, EventArgs e)
    {
        string StringFromTheInput = TextBox1.Text;
        var request_ = (HttpWebRequest)WebRequest.Create(StringFromTheInput);
        WebResponse response = request_.GetResponse();
        Stream data = response.GetResponseStream();
        string content = String.Empty;

        using (var client = new WebClient())
        {
            content = client.DownloadString(StringFromTheInput);
        }

        WordCount(content);
    }

    public static Dictionary<string, int> WordCount(string content, int numWords = int.MaxValue)
    {
        var delimiterChars = new char[] { ' ', ',', ':', '\t', '\"', '\r', '{', '}', '[', ']', '=', '/' };
        return content
            .Split(delimiterChars)
            .Where(x => x.Length > 0)
            .Select(x => x.ToLower())
            .GroupBy(x => x)
            .Select(x => new { Word = x.Key, Count = x.Count() })
            .OrderByDescending(x => x.Count)
            .Take(numWords)
            .ToDictionary(x => x.Word, x => x.Count);
    }

The issue is what I end up with in "content" at the end of the function. Why does the debugger skip the LINQ line?

score 0 · Accepted Answer · answered Dec 22 '18 at 11:06

I tested your code and it works. I tried it with the http://google.com URL.

The debugger is not skipping the LINQ line. It executes it and then returns, which moves the cursor to the end of the method.

I would suggest assigning the return value to a variable and setting a breakpoint on the closing brace of the handler.

        var result = WordCount(content); 
    } // put a break point here
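
A minimal sketch of the suggestion above, assuming the question's `WordCount` method (repeated here unchanged so the block compiles on its own). The names `WordCountDemo` and `FormatTopWords` are hypothetical and only serve the illustration. Storing the dictionary in a local makes it visible in the debugger, and formatting it gives something to print. Since `Dictionary<string, int>` does not guarantee enumeration order, the helper sorts again before formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountDemo
{
    // Same method as in the question, repeated so this sketch is self-contained.
    public static Dictionary<string, int> WordCount(string content, int numWords = int.MaxValue)
    {
        var delimiterChars = new char[] { ' ', ',', ':', '\t', '\"', '\r', '{', '}', '[', ']', '=', '/' };
        return content
            .Split(delimiterChars)
            .Where(x => x.Length > 0)
            .Select(x => x.ToLower())
            .GroupBy(x => x)
            .Select(x => new { Word = x.Key, Count = x.Count() })
            .OrderByDescending(x => x.Count)
            .Take(numWords)
            .ToDictionary(x => x.Word, x => x.Count);
    }

    // Hypothetical helper: renders the counts as "word: count" lines.
    // The dictionary is re-sorted because its enumeration order is not guaranteed.
    public static string FormatTopWords(Dictionary<string, int> counts)
    {
        return string.Join(
            Environment.NewLine,
            counts.OrderByDescending(kv => kv.Value)
                  .Select(kv => kv.Key + ": " + kv.Value));
    }
}
```

In the click handler this could be used as `var result = WordCountDemo.WordCount(content, 10);` followed by `System.Diagnostics.Debug.WriteLine(WordCountDemo.FormatTopWords(result));`, with the breakpoint placed after the assignment so that `result` can be inspected.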

answered Dec 22 '18 at 11:06

Alex Leo


Thanks. The problem now is that it gives me the HTML of the website, not the content itself – Kate Beck Dec 22 '18 at 19:46
OK, no problem. I believe the HTML is the content of the website. If my answer has helped, please accept it. – Alex Leo Dec 22 '18 at 20:43
I will. I meant that if the URL is for an article, I want the most common words in the article, not the HTML of the page. Do you know how to change it? Thanks – Kate Beck Dec 22 '18 at 20:45
That is how you get the content. Check this post, which explains how to extract the content. It uses the same approach you did: https://stackoverflow.com/questions/4510212/how-i-can-get-web-pages-content-and-save-it-into-the-string-variable – Alex Leo Dec 22 '18 at 20:46
What you can do, once you retrieve the content, is extract the part that you are interested in and then call the WordCount() method – Alex Leo Dec 22 '18 at 20:47
That's a problem, because how can I know that the content is in the same place on every page? – Kate Beck Dec 23 '18 at 00:03
I have provided you with an answer to your other question: https://stackoverflow.com/questions/53900719/copying-text-of-website-using-webbrowser-failed/53902762#53902762 . Hope you appreciate the effort. – Alex Leo Dec 23 '18 at 10:18
really appreciate!! Thank you very much!! – Kate Beck Dec 23 '18 at 14:18
Please remember to accept the answers if they have helped. It shows appreciation for someone else's effort. Thanks – Alex Leo Dec 23 '18 at 15:11
Sure I will! I'm working on the solution. Can I print the most common words in `PrintNode` with my LINQ expression? Thanks – Kate Beck Dec 23 '18 at 15:21
If the comment is about the other question, please ask it there. – Alex Leo Dec 23 '18 at 18:00
Please remember to accept the answers if they have helped, so we can concentrate on the other question. Thank you – Alex Leo Dec 23 '18 at 20:24
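
For the follow-up raised in the comments (counting words in the article text rather than in the HTML markup), one rough approach is to strip the markup before calling `WordCount`. The sketch below is only an approximation based on regular expressions, not a real HTML parser, and it does not try to separate the article body from navigation or footer text. The names `HtmlTextSketch` and `StripHtml` are hypothetical; a dedicated HTML parsing library is likely to be more robust on real pages.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper: reduces an HTML document to its visible text, approximately.
    public static string StripHtml(string html)
    {
        // Drop <script> and <style> elements together with their contents.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace (including line breaks) into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

The stripped text can then be passed on, for example `WordCount(HtmlTextSketch.StripHtml(content), 10)`. Collapsing line breaks into spaces also helps here, because the delimiter list in the question does not include `'\n'`.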
