
I am looking for information on how to convert XHTML into a very specific XML format. For example, I have the following XHTML sample:

<body>
<div id="divParent" class="header" style="width: 250px; height: 200px;">
  <fieldset id="fldScope" style="left: 5px; width: 240px; top: 5px; height: 60px;">
    <label style="left: 5px; top: 5px;">Reason:</label>
    <select id="selReason">
      <option value="">SELECT ONE:</option>
      <option value="TRAINING">TRAINING</option>
      <option value="OTHER">OTHER</option>
    </select>
  </fieldset>
  <fieldset class="bottomSection">
    <button id="btnClose" accessKey="o" class="webbutton" type="button">
      <u>O</u>K</button>
  </fieldset>
</div>
</body>

which I need to transform into something like this:

<control controlId="topLevelDiv" controlType="HtmlDiv" controlSearchProperties="id=divParent;class=header">
    <childControls>
        <control controlId="topLevelFieldset" controlType="HtmlFieldSet" controlSearchProperties="id=fldScope">
            <childControls>
                <control controlId="topLevelLabel" controlType="HtmlLabel" controlSearchProperties="InnerText=Reason:">
                    <childControls/>
                </control>
                <control controlId="topLevelComboBox" controlType="HtmlComboBox" controlSearchProperties="Id=selReason">
                    <childControls>
                        <control controlId="defaultOption" controlType="HtmlListItem" controlSearchProperties="InnerText=SELECT ONE">
                            <childControls/>
                        </control>
                        <control controlId="option1" controlType="HtmlListItem" controlSearchProperties="InnerText=TRAINING">
                            <childControls/>
                        </control>
                        <control controlId="option2" controlType="HtmlListItem" controlSearchProperties="InnerText=Other">
                            <childControls/>
                        </control>
                    </childControls>
                </control>
                <control controlId="bottomFieldset" controlType="HtmlFieldSet" controlSearchProperties="class=bottomSection">
                    <childControls>
                        <control controlId="okButton" controlType="HtmlButton" controlSearchProperties="Id=btnClose; accessKey=o; type=button">
                            <childControls/>
                        </control>
                    </childControls>
                </control>
            </childControls>
        </control>
    </childControls>
</control>

I already have the mapping from the various controls to their control types. But when I try to load the XHTML into an XDocument (in order to extract attributes and elements), I get a parsing error.
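As a quick check, here is a sketch using Python's standard `xml.etree.ElementTree` (chosen only so the example is self-contained; it is not the asker's .NET code). The sample exactly as posted parses without error, which suggests the parsing error probably comes from something in the full page that isn't shown here. Once parsing succeeds, extracting elements and attributes is a plain tree walk. One caveat: if the real document declares `xmlns="http://www.w3.org/1999/xhtml"`, tag names come back namespace-qualified (e.g. `{http://www.w3.org/1999/xhtml}div`), and XDocument likewise needs an `XNamespace` to match them.

```python
import xml.etree.ElementTree as ET

# The XHTML sample from the question. It is well-formed, so a strict XML
# parser accepts it as-is.
XHTML = """<body>
<div id="divParent" class="header" style="width: 250px; height: 200px;">
  <fieldset id="fldScope" style="left: 5px; width: 240px; top: 5px; height: 60px;">
    <label style="left: 5px; top: 5px;">Reason:</label>
    <select id="selReason">
      <option value="">SELECT ONE:</option>
      <option value="TRAINING">TRAINING</option>
      <option value="OTHER">OTHER</option>
    </select>
  </fieldset>
  <fieldset class="bottomSection">
    <button id="btnClose" accessKey="o" class="webbutton" type="button">
      <u>O</u>K</button>
  </fieldset>
</div>
</body>"""


def list_elements(xhtml_text):
    """Return (tag, attributes, direct text) for every element, in document order."""
    root = ET.fromstring(xhtml_text)
    return [
        (elem.tag, dict(elem.attrib), (elem.text or "").strip())
        for elem in root.iter()  # iter() yields the root first, then its descendants
    ]


for tag, attrs, text in list_elements(XHTML):
    print(tag, attrs, repr(text))
```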

I considered regular expressions and basic string manipulation, but that could quickly become hard to maintain, especially when trying to cover all the edge cases.

I am not sure what the best way to approach this would be. Please help!

Thanks in advance.

PatomaS
K S

1 Answer


Technically, XHTML already IS XML, so you can't really convert XHTML to XML. What you can do is use XSLT to transform the XML from one vocabulary (element and attribute structure) into another. You can think of it as converting from one definition or DTD to another, although that's not quite the same thing.
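Transforming one XML vocabulary into another does not strictly require XSLT. Since the tag-to-control mapping already exists in code, a recursive tree walk may be easier to maintain. The sketch below uses Python's `ElementTree` for a runnable illustration; `TAG_MAP`, the `controlN` id scheme, and the "id/class, else InnerText" rule are hypothetical stand-ins, not the asker's actual mapping. Unmapped markup such as `<u>` is looked through rather than emitted.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tag -> control type mapping that the
# question says already exists in the .NET project.
TAG_MAP = {
    "div": "HtmlDiv",
    "fieldset": "HtmlFieldSet",
    "label": "HtmlLabel",
    "select": "HtmlComboBox",
    "option": "HtmlListItem",
    "button": "HtmlButton",
}

XHTML = """<body>
<div id="divParent" class="header" style="width: 250px; height: 200px;">
  <fieldset id="fldScope" style="left: 5px; width: 240px; top: 5px; height: 60px;">
    <label style="left: 5px; top: 5px;">Reason:</label>
    <select id="selReason">
      <option value="">SELECT ONE:</option>
      <option value="TRAINING">TRAINING</option>
      <option value="OTHER">OTHER</option>
    </select>
  </fieldset>
  <fieldset class="bottomSection">
    <button id="btnClose" accessKey="o" class="webbutton" type="button">
      <u>O</u>K</button>
  </fieldset>
</div>
</body>"""


def mapped_children(elem):
    """Yield the nearest mapped descendants, looking through unmapped tags such as <u>."""
    for child in elem:
        if child.tag in TAG_MAP:
            yield child
        else:
            yield from mapped_children(child)


def search_properties(elem):
    """Hypothetical rule: use id/class if present, otherwise the element's inner text."""
    parts = [f"{name}={elem.get(name)}" for name in ("id", "class") if elem.get(name)]
    if not parts:
        parts.append("InnerText=" + "".join(elem.itertext()).strip())
    return ";".join(parts)


def to_control(elem, ids):
    """Build a <control> for elem and, recursively, for its mapped descendants."""
    control = ET.Element("control", {
        "controlId": f"control{next(ids)}",  # hypothetical id scheme, assigned in pre-order
        "controlType": TAG_MAP[elem.tag],
        "controlSearchProperties": search_properties(elem),
    })
    child_controls = ET.SubElement(control, "childControls")
    for child in mapped_children(elem):
        child_controls.append(to_control(child, ids))
    return control


def transform(xhtml_text):
    """Transform the sample; assumes the top-level control is the first <div> in <body>."""
    body = ET.fromstring(xhtml_text)
    return to_control(body.find("div"), itertools.count(1))


result = transform(XHTML)
ET.indent(result)  # pretty-printing only; ET.indent requires Python 3.9+
print(ET.tostring(result, encoding="unicode"))
```

Because the walk follows the XHTML nesting, the `bottomSection` fieldset comes out as a sibling of `fldScope`, whereas the hand-written target in the question nests it inside `fldScope`; `transform` would need adjusting if that nesting is intended.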

Here you can see how to apply an XSLT: How to apply an XSLT Stylesheet in C#

And here is how to write one: beginner XSLT tutorial

If you get a parsing error, try loading the text into a browser. Some browsers (Firefox, for example) will tell you exactly where the document stops being well-formed XML. Or post the error here.
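A parser can also pinpoint the problem programmatically, much as Firefox does. This Python sketch (standard library only; `find_parse_error` is a hypothetical helper name) returns the line, column, and message of the first well-formedness error. The sample inputs are illustrative HTML habits that browsers tolerate but strict XML parsers, including the one behind XDocument, reject: an HTML entity such as `&nbsp;` that XML does not predefine, an unclosed void element such as `<br>`, and an unquoted attribute value.

```python
import xml.etree.ElementTree as ET


def find_parse_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # ParseError carries a (line, column) tuple
        return line, column, str(err)
    return None


# Illustrative inputs: one well-formed, three with common HTML-isms.
samples = {
    "well-formed": "<body><label>Reason:</label></body>",
    "undefined entity": "<body><label>Reason:&nbsp;</label></body>",
    "unclosed void element": "<body><br><label>Reason:</label></body>",
    "unquoted attribute": "<body><label for=sel>Reason:</label></body>",
}

for name, text in samples.items():
    print(f"{name}: {find_parse_error(text)}")
```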

Or see what the W3C XHTML Validator tells you.

pid
  • Thank you for your response. I am not sure XSLT is the best approach for this particular use case. If you look at the XML output, the elements and their attributes are very different from the original XHTML. There needs to be some kind of mapping, which I have already defined in my .NET project. – K S Feb 03 '14 at 21:09
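If that mapping also decides which attributes identify each control, it can be kept as plain data next to the control types. This hypothetical rule table (Python, with `ControlRule`, `RULES`, and `describe` invented for illustration) is modelled on the desired output in the question: containers are identified by `id`/`class`, labels and options by their inner text, and the button by `id`, `accessKey`, and `type`.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Tuple


class ControlRule(NamedTuple):
    control_type: str
    search_attributes: Tuple[str, ...]  # XHTML attributes copied into controlSearchProperties
    use_inner_text: bool                # True: identify the control by its text content


# Hypothetical rules modelled on the question's target XML; the real mapping
# lives in the asker's .NET project.
RULES = {
    "div": ControlRule("HtmlDiv", ("id", "class"), False),
    "fieldset": ControlRule("HtmlFieldSet", ("id", "class"), False),
    "label": ControlRule("HtmlLabel", (), True),
    "select": ControlRule("HtmlComboBox", ("id",), False),
    "option": ControlRule("HtmlListItem", (), True),
    "button": ControlRule("HtmlButton", ("id", "accessKey", "type"), False),
}


def describe(elem):
    """Return (controlType, controlSearchProperties) for one XHTML element."""
    rule = RULES[elem.tag]
    # Only attributes actually present on the element are included.
    parts = [f"{name}={elem.get(name)}" for name in rule.search_attributes
             if elem.get(name) is not None]
    if rule.use_inner_text:
        parts.append("InnerText=" + "".join(elem.itertext()).strip())
    return rule.control_type, ";".join(parts)


snippets = [
    '<fieldset id="fldScope" style="left: 5px;"/>',
    '<fieldset class="bottomSection"/>',
    '<label style="left: 5px; top: 5px;">Reason:</label>',
    '<option value="TRAINING">TRAINING</option>',
    '<button id="btnClose" accessKey="o" class="webbutton" type="button"><u>O</u>K</button>',
]
for snippet in snippets:
    print(describe(ET.fromstring(snippet)))
```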