
I'm using the WebBrowser control in a small application, not to fill in field values but the opposite: to extract them. What I am trying to do is grab the complete input string, for example:

<input type="text" name="username" class="form-control" size="40" required="required"/>

I know that this loop gives me the value of each input's name attribute (username in the example):

        foreach (HtmlElement element in webBrowser.Document.GetElementsByTagName("input"))
        {
            Helpers.ReturnMessage(element.GetAttribute("name"));
        }

But is there a way to get the entire string, which in this case would be:

<input type="text" name="username" class="form-control" size="40" required="required"/>

Ideally, what I am looking to do is grab this part from each input: name="username". In other cases it could be id="value", so I can't hard-code it. Or would I need to use a regex of some kind? Thank you for any help.

Eugene Podskal
tess
  • I guess you can use `.ToString()` on `webBrowser.Document.` to get the raw HTML and then parse it with a regular expression. BTW, I think it is better to use Selenium, because its API provides more options – nonForgivingJesus Aug 03 '19 at 18:14

1 Answer


It seems that HtmlElement doesn't provide any way to enumerate attributes (at least not in a generic enough way), so the simplest solution is to take its OuterHtml property and parse it with HtmlAgilityPack (https://html-agility-pack.net/):

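// Needs: using System; using System.Linq; using System.Windows.Forms;
// plus the HtmlAgilityPack NuGet package.
var inputHtml = _webBrowser
    .Document
    .GetElementsByTagName("input")
    .Cast<HtmlElement>()
    .Single() // throws unless the page has exactly one <input>; loop over the elements to handle several
    .OuterHtml;
// Parse the element's markup on its own so that its attributes can be enumerated.
var elementHtmlDoc = new HtmlAgilityPack.HtmlDocument();
elementHtmlDoc.LoadHtml(inputHtml);
var attributesDictionary = elementHtmlDoc
    .DocumentNode
    .ChildNodes
    .Single()
    .Attributes
    .ToDictionary(
        attr => attr.Name,
        attr => attr.Value);
// Each dictionary entry is printed on its own line as [name, value].
MessageBox.Show(
    String.Join(Environment.NewLine, attributesDictionary),
    "Attributes");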

If you really need the attributes as an HTML string for that element, it can be done (not ideal, but still mostly reliable in this case) with a regular expression over the element's OuterHtml:

var attributesString = Regex
    .Match(inputHtml, @"^<\s*\S+\s+(?<attributes>[^\>]*)>") // WebBrowser removes closing slash, so we do not need to handle it.
    .Groups["attributes"]
    .ToString();
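
If the individual name="value" pairs are needed rather than the whole attributes string, that string can be split further. The following is only a minimal sketch: AttributeSketch and ParseAttributes are hypothetical names, not part of WebBrowser or HtmlAgilityPack, and the pattern assumes reasonably simple markup (quoted values, unquoted values, or bare attributes). For anything more irregular, the HtmlAgilityPack approach above is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class AttributeSketch
{
    // Matches name="value", name='value', name=value, or a bare name.
    // .NET allows the same group name ("value") to be used in several alternatives.
    private static readonly Regex AttributePattern = new Regex(
        @"(?<name>[^\s=""'<>/]+)(?:\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^\s""'>]+)))?");

    // Splits an attributes string into name/value pairs, in source order.
    public static List<KeyValuePair<string, string>> ParseAttributes(string attributesString)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributePattern.Matches(attributesString))
        {
            result.Add(new KeyValuePair<string, string>(
                match.Groups["name"].Value,
                match.Groups["value"].Value)); // empty string for bare attributes
        }
        return result;
    }
}
```

Calling AttributeSketch.ParseAttributes(attributesString) on the result of the regex above gives one pair per attribute, so neither name nor id has to be hard-coded.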

However, this won't be the actual HTML used, as WebBrowser seems to rearrange the original attributes and provide slightly modified HTML. So if you want the actual HTML, you will have to obtain the original .html file separately (which obviously won't work with SPAs and Ajax-heavy sites) and parse it with HtmlAgilityPack.

Eugene Podskal