
I'm trying to extract variables from an HTML comment. Any thoughts on how this can be done?

Example of the comment...

<!-- variable1: "wer2345235" variable2: "sdfgh333" variable3: "sdfsdfdfsdf"  -->

I tried splitting on spaces, but the variable values might themselves contain spaces.

Thanks for any help!

[edit] The variables inside the HTML tag are returned as a result of an API call - so it's outside of my control. [/edit]

[edit 2] Could this be done using regex? I've been reading up and I can match the comment itself, but not much else! [/edit]

Matt Facer
  • Can you use hidden fields instead? That would probably make life easier. Or is this a situation where someone else came up with the comment idea and you are stuck trying to make it work? – peroija May 11 '12 at 13:42
  • It is actually generated in response to an HTTP POST from my web app. It's an API response which simply confirms submission of data. Something I cannot change, unfortunately. – Matt Facer May 11 '12 at 13:43
  • We do this to parse the origin server name and installed application version for an internal deployment. We use regex to parse the html returned via an ajax call. Match the comment. Then pull variables out using groups. – Ken Brittain May 14 '12 at 12:29

3 Answers


You can use an HTML parser, e.g. HtmlAgilityPack, to get the comments.

For details, see the related question "Grabbing meta-tags and comments using HTML Agility Pack".

[Edit] Assuming that you get the comments and the format is known, you can strip out the <!-- and --> delimiters and split what is left on the variable names.

I did this and it extracted the variable values correctly:

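        // Requires: using System; using System.Text.RegularExpressions;
        var str = "variable1: \"wer2345235\" variable2: \"sdfgh333\" variable3: \"sdfsdfdfsdf\"";
        // Split on each "variableN: " label; what remains are the quoted values.
        var r = new Regex(@"variable\d+: ");
        var result = r.Split(str);
        foreach (var value in result)
        {
            // The first element is empty because the string starts with a label.
            if (value.Length == 0) continue;
            // Drop surrounding whitespace and the enclosing double quotes.
            Console.WriteLine(value.Trim().Trim('"'));
        }

        Console.ReadLine();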
Jason Jong

I'm guessing you want to access via server-side code since you applied the C# tag. Is there a reason why a comment was chosen for these variables?

You could use the <asp:HiddenField /> and use the Value property. It would be trivial to access these values and parse appropriately.

If you absolutely need to have these in a comment, is the comment contained in some other block with an ID attribute? If so, you could grab the InnerHTML of that object and use basic String functions to grab and parse the fields. This assumes, of course, that the block does not contain multiple comments with no distinct way of telling this particular one apart.
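As a rough illustration of the "basic String functions" idea, here is a minimal sketch that pulls the text out of the first comment in a string. The class and method names (`CommentExtractor`, `ExtractFirstComment`) are made up for this example, and it assumes you already have the block's markup as a string and that the first comment in it is the one you want.

```csharp
using System;

public static class CommentExtractor
{
    // Returns the text between the first "<!--" and the following "-->",
    // trimmed of surrounding whitespace, or null if no complete comment exists.
    // Assumption: the first comment in the markup is the one holding the variables.
    public static string ExtractFirstComment(string html)
    {
        const string open = "<!--";
        const string close = "-->";

        int start = html.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return null;
        start += open.Length;

        int end = html.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return html.Substring(start, end - start).Trim();
    }
}
```

The returned string still contains the raw `name: "value"` pairs, so it would need a further parsing step; splitting on spaces alone will not work if values can contain spaces.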

Matt

Simple regular expressions should be fine for this.

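    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    private Dictionary<string,string> ParseCommentVariables(string contents)
    {
        Dictionary<string,string> variables = new Dictionary<string,string>();

        // Lazily match each <!-- ... --> comment; Singleline lets '.' span line breaks.
        Regex commentParser = new Regex(@"<!--.+?-->", RegexOptions.Compiled | RegexOptions.Singleline);
        // Match name: "value" pairs; a value may contain spaces but not a double quote.
        Regex variableParser = new Regex(@"\b(?<name>[^:]+):\s*""(?<value>[^""]+)""", RegexOptions.Compiled);
        var comments = commentParser.Matches(contents);
        foreach (Match comment in comments)
            foreach (Match variable in variableParser.Matches(comment.Value))
                // Keep the first value seen when a name repeats.
                if (!variables.ContainsKey(variable.Groups["name"].Value))
                    variables.Add(variable.Groups["name"].Value, variable.Groups["value"].Value);
        return variables;
    }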

This first extracts all the comments from the 'contents' string, then extracts every variable it finds in them. It stores the variables in a dictionary, keeping the first value if a name appears more than once, and returns the dictionary to the caller.

For example:

    string contents = "some other HTML, lalalala <!-- variable1: \"wer2345235\" variable2: \"sdfgh333\" variable3: \"sdfsdfdfsdf\"  --> foobarfoobarfoobar";
    var variables = ParseCommentVariables(contents);
    string variable1 = variables["variable1"];
    string variable2 = variables["variable2"];
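Since the question mentions that values may contain spaces, here is a small self-contained sketch of a slightly stricter variant of the variable pattern. It is an illustration rather than a drop-in replacement: the names `SpaceCheck`, `Pair` and `Parse` are invented for this example, it assumes variable names consist only of word characters, and it lets the last value win when a name repeats.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpaceCheck
{
    // Names are word characters only; values may be empty or contain spaces,
    // but may not contain a double quote.
    private static readonly Regex Pair =
        new Regex(@"(?<name>\w+):\s*""(?<value>[^""]*)""");

    public static Dictionary<string, string> Parse(string comment)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(comment))
        {
            // The indexer overwrites, so the last value wins if a name repeats.
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

Because the value group stops only at the closing quote, a value such as "two words" comes back intact, which is the case that splitting on spaces cannot handle.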
Jason Larke