I'm working on a C# WinForms program, and I need it to use the Regex.Match method to display a specific value written on a web page.

HTML of the website

<pre id="code" class="brush: text; plain-text">1</pre>

What I've tried

if (WebBrowserReadyState.Complete == webBrowser1.ReadyState)
{
    if (webBrowser1.DocumentText.Contains("brush: text; plain-text"))
    {
        Match match1 = Regex.Match("class=\"brush: text; plain-text\">(.*?)<", webBrowser1.DocumentText.Replace("\r", "").Replace("\n", ""));
        if (match1.Success)
        {
            String pointsStr = match1.Result("$1").ToString();
            label7.Text = pointsStr;
        }
    }
}

Link to the HTML page: https://www.dropbox.com/s/6te2udjz14tutpt/Verison.txt?dl=0

Basically, I need it to display 1 in label7.Text once the web page has finished loading.

Programerszz
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags Read this ! – mybirthname Dec 03 '14 at 01:05
  • With all due respect, I've used a method like this before and it worked; I don't think that guy is completely sane. – Programerszz Dec 03 '14 at 01:09
  • **Don't do this**. Instead, use the HTML Agility Pack. – SLaks Dec 03 '14 at 01:10
  • Explain please @SLaks – Programerszz Dec 03 '14 at 01:14
  • What exactly is failing with the code you have? I.e., what output do you actually get, or what exception? – Nathan Tuggy Dec 03 '14 at 01:18
  • I think what SLaks means is that HTML is not a well-defined, rigorous language. Writing a regex that can actually be considered reliable in parsing it is extremely difficult or impossible, depending on what you're trying to do. If you know for sure that the HTML will always look exactly as you expect, then you can get away with regex. Otherwise, you should be using a library that is written specifically to parse HTML, and the most commonly used one around is HTML Agility Pack. – Peter Duniho Dec 03 '14 at 01:20
  • It doesn't do anything. Either it isn't finding the "brush: ..." text, or the page isn't loading all the way. I don't know which. – Programerszz Dec 03 '14 at 01:20
  • Also, the HTML always says the same thing, so I am one of those lucky people... – Programerszz Dec 03 '14 at 01:38
  • Could someone show me how I would use HTML Agility Pack to find that? – Programerszz Dec 03 '14 at 01:53

3 Answers

You can give Regex groups proper names, and then refer to them by name. For example, I named the group that captures the element content "desired". Then use Match.Groups[groupName].Value to get the matched value, as in:

// Regex.Match(input, pattern): the text to search comes first, the pattern second.
Match match1 = Regex.Match(webBrowser1.DocumentText.Replace("\r", "").Replace("\n", ""), "class=\"brush: text; plain-text\">(?<desired>.*?)<");
if (match1.Success)
{
    String pointsStr = match1.Groups["desired"].Value;
    label7.Text = pointsStr;
}

Also it's a good idea to escape angle brackets, and put your pattern inside an @ quoted string, although it seems that the above works fine:

@"class=\""brush: text; plain-text\""\>(?<desired>.*?)\<"
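For illustration, here is a minimal console sketch of the named-group approach, using the `<pre>` element from the question as input. The class and method names (`PreTextExtractor`, `ExtractDesired`) are hypothetical, and `RegexOptions.Singleline` is my substitution for the `Replace("\r", "")` / `Replace("\n", "")` calls, since it lets `.` match line breaks:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreTextExtractor
{
    // Hypothetical helper: returns the text captured by the "desired" group,
    // or null when the pattern does not match.
    public static string ExtractDesired(string html)
    {
        // In a verbatim (@) string, "" stands for one literal double quote.
        const string pattern = @"class=""brush: text; plain-text"">(?<desired>.*?)<";

        // Regex.Match takes the input text first and the pattern second.
        Match match = Regex.Match(html, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups["desired"].Value : null;
    }

    public static void Main()
    {
        string html = "<pre id=\"code\" class=\"brush: text; plain-text\">1</pre>";
        Console.WriteLine(ExtractDesired(html)); // prints 1
    }
}
```

In the WinForms program, the same helper could presumably be called with `webBrowser1.DocumentText` and its result assigned to `label7.Text`.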

And yes, as you've seen in the comments, use Regex only for regular languages. HTML is not a regular language, so you'd be better off using a proper tool such as the HTML Agility Pack for this purpose.
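Since the comments ask how that would look, here is a rough sketch using the HTML Agility Pack. It assumes the third-party `HtmlAgilityPack` NuGet package is installed; the class and method names (`VersionReader`, `ReadCodeElement`) are hypothetical, and the XPath targets the `id="code"` attribute from the HTML in the question:

```csharp
using System;

public static class VersionReader
{
    // Hypothetical helper: returns the text of <pre id="code">, or null if it is absent.
    public static string ReadCodeElement(string html)
    {
        // Fully qualified, because System.Windows.Forms also defines an HtmlDocument type.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        HtmlAgilityPack.HtmlNode node = doc.DocumentNode.SelectSingleNode("//pre[@id='code']");
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        string html = "<html><body><pre id=\"code\" class=\"brush: text; plain-text\">1</pre></body></html>";
        Console.WriteLine(ReadCodeElement(html)); // prints 1
    }
}
```

Unlike the regex, this does not depend on the exact attribute order or quoting in the page source.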

Sina Iravanian

One way to get the text inside the dropbox file is to change the 'www.dropbox.com' to 'dl.dropboxusercontent.com' and download that. So what I did was this:

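// Requires "using System.Net;" and an enclosing method marked async.
var wc = new WebClient { Proxy = null };
// Rewrite the share link so it points at the raw file instead of the preview page.
var url = "https://www.dropbox.com/s/6te2udjz14tutpt/Verison.txt?dl=0"
    .Replace("www.dropbox.com", "dl.dropboxusercontent.com");
label7.Text = await wc.DownloadStringTaskAsync(url);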
Dom Stepek
  • I got an error saying: "Error 3 The 'await' operator can only be used within an async method. Consider marking this method with the 'async' modifier and changing its return type to 'Task'." – Programerszz Dec 03 '14 at 04:03
  • In your method just add the 'async' modifier. So if your current method looks like this `public void GetVersion()` change it to this `public async void GetVersion()` – Dom Stepek Dec 03 '14 at 04:21
  • @Programerszz Also, since the download is so small, you can remove the await keyword and change 'DownloadStringTaskAsync' to just 'DownloadString'. – Dom Stepek Dec 03 '14 at 14:48

A simpler way to accomplish this is almost certainly to replace the regex work with direct element access like this (untested):

if (WebBrowserReadyState.Complete == webBrowser1.ReadyState) {
  var elemCode = webBrowser1.Document.GetElementById("code");
  if (null != elemCode) {
    label7.Text = elemCode.InnerText;
  }
}

That's probably faster and also substantially more robust.

Nathan Tuggy
  • It ran, but label7.Text ended up empty: the text was changed, but to nothing. – Programerszz Dec 03 '14 at 04:05
  • Check to see when you're running this code; initially my test tried using the Navigated event for some reason, but DocumentCompleted works a lot better. (You probably don't need the ReadyState test anymore either.) – Nathan Tuggy Dec 03 '14 at 09:13
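
As a rough sketch of the DocumentCompleted suggestion (untested, like the snippet in the answer): the form class name `VersionForm`, the handler name, and the placeholder URL are assumptions for illustration, while `webBrowser1`, `label7`, and the `code` element id come from the question.

```csharp
using System;
using System.Windows.Forms;

public class VersionForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly Label label7 = new Label { Dock = DockStyle.Top };

    public VersionForm()
    {
        Controls.Add(webBrowser1);
        Controls.Add(label7);

        // Subscribe before navigating so the handler runs once the page has loaded.
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        webBrowser1.Navigate("http://example.com/version.html"); // placeholder URL
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Look the element up by id instead of running a regex over DocumentText.
        HtmlElement elemCode = webBrowser1.Document.GetElementById("code");
        if (elemCode != null)
        {
            label7.Text = elemCode.InnerText;
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new VersionForm());
    }
}
```

In a designer-generated form, only the event subscription and the handler body would need to be added.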