Goal:
Locate the sentence "From today's featured article" on the page "http://en.wikipedia.org/wiki/Main_Page" by web scraping with C#.

Problem:
The page's source code is downloaded into a string. I believe I could locate the sentence "From today's featured article" by looping over the string with Substring, but I suspect that is an inefficient approach.

Is there a better solution to locate the sentence "From today's featured article" from the string input?

Info:
*I'm using C# with Visual Studio Community 2013.
*The source code does not work properly. Only the first three lines work.

WebClient w = new WebClient();

string s = w.DownloadString("http://en.wikipedia.org/wiki/Main_Page");

string svar = RegexUtil.MatchKey(s);




using System.Text.RegularExpressions;

static class RegexUtil
{
    // The phrase contains no regex metacharacters, so it is matched literally (ignoring case).
    static readonly Regex _regex = new Regex(@"From today's featured article", RegexOptions.IgnoreCase);

    /// <summary>
    /// Returns the text matched within the input, or null if there is no match.
    /// </summary>
    public static string MatchKey(string input)
    {
        Match match = _regex.Match(input);

        // The pattern has no capturing groups, so return the whole match
        // (Groups[1] would always be empty here).
        return match.Success ? match.Value : null;
    }
}
leppie
HelloWorld1
  • Don't use regular expressions to parse HTML; use [HtmlAgilityPack](http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack) instead. – Bobby Tables Jul 01 '15 at 19:08
  • It is not homework. The regex is used to match the text "From today's featured article" against input that contains a lot of data. – HelloWorld1 Jul 01 '15 at 19:09
  • The first three lines are: WebClient w = new WebClient(); string s = w.DownloadString("http://en.wikipedia.org/wiki/Main_Page"); string svar = RegexUtil.MatchKey(s); – HelloWorld1 Jul 01 '15 at 19:09

1 Answer

If you want to find the occurrence of that string, all you need to do is this:

int pos = html.IndexOf("From today's featured article");

However, note that this can also find the string inside attribute values or other markup, not only in the visible text.
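
As a minimal sketch of the lookup above: IndexOf returns -1 when the text is absent, so the result should be checked before it is used. The class and method names here (PhraseFinder, Find) are hypothetical, and the case-insensitive comparison is an assumption carried over from the RegexOptions.IgnoreCase in the question's commented-out line.

```csharp
using System;

static class PhraseFinder
{
    // Hypothetical helper: returns the zero-based index of the first
    // occurrence of phrase in html, or -1 if the phrase is not present.
    public static int Find(string html, string phrase)
    {
        // OrdinalIgnoreCase compares character by character, ignoring case,
        // and does not depend on the current culture.
        return html.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
    }
}
```

With the downloaded page in s, a call such as PhraseFinder.Find(s, "From today's featured article") yields a position that is only meaningful when it is zero or greater.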

In order to search only the visible text, you'd need to parse the HTML to remove all tags, and then search the text between.
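
The tag-stripping idea can be sketched roughly as follows. This is a deliberately naive illustration using only the .NET base library, not a real HTML parser: it assumes reasonably well-formed markup, and the names VisibleText, Extract and Contains are hypothetical. A proper parser, such as the HtmlAgilityPack mentioned in the comments, would be more robust on real pages.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class VisibleText
{
    // Naive assumption: script and style blocks are well formed and not nested.
    static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Naive assumption: no '>' characters appear inside attribute values.
    static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Removes script/style blocks and all remaining tags, then decodes
    // entities such as &#39; so the text matches what a reader sees.
    public static string Extract(string html)
    {
        string withoutScripts = ScriptOrStyle.Replace(html, " ");
        string withoutTags = Tag.Replace(withoutScripts, " ");
        return WebUtility.HtmlDecode(withoutTags);
    }

    // True if phrase occurs in the text between the tags (case-insensitive).
    public static bool Contains(string html, string phrase)
    {
        return Extract(html).IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Because each tag is replaced by a space, a phrase that is split across inline tags would pick up extra spaces and not match; collapsing runs of whitespace before searching would be one way to handle that.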

Jonathan Wood