I have free text that might contain an HTML-like table definition, for example:

This is free text..... More free text... table start *row start*

cell 1 content# #cell 2 content

cell 3 content

row end*table end* More free text which might contain more tables definitions.

I'm looking for the best way to parse tables out of such text in C#. I've read that regular expressions are not a good fit for this kind of nested markup. Can anyone help with this?

Thanks in advance.

mayap
  • http://stackoverflow.com/questions/6063203/parsing-html-with-c-net This could help you. – Preben Huybrechts Jul 11 '12 at 06:57
  • Thanks, but my text is not HTML. It is free text that might contain a table definition with an HTML-like structure but different tags, so I cannot use HTMLAgilityPack. – mayap Jul 11 '12 at 07:17

2 Answers

You can try it like this:

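        using System;
        using System.Xml.Linq;

        // Sample input: free text with a nested table inside it.
        string input = @"free text ...
        <table><tr><td>
            <table><tr><td>test</td></tr></table>
        </td></tr></table>
        more free text";

        // Wrap the text in a single root element so it parses as one XML fragment.
        // This only works if the table markup is well-formed XML.
        string inputWithRoot = String.Format("<root>{0}</root>", input);

        XElement el = XElement.Parse(inputWithRoot);

        // Descendants returns every <table> element, including nested ones.
        var tables = el.Descendants("table");

        foreach (XElement table in tables)
        {
            Console.WriteLine(table.ToString());
            Console.WriteLine();
        }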
Ivan Golović
  • The tags that define the table can be something like: some value. The start and end tags are not necessarily the same. So I don't think it will work, though the solution is quite nice. – mayap Jul 11 '12 at 08:18
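
When the opening and closing markers are different strings and the text is not well-formed XML, one option is a small hand-written scanner that tracks nesting depth. The sketch below is only an illustration under assumptions: the class name `TableScanner`, the method `ExtractBlocks`, and the marker strings used to call it are hypothetical, since the real markers are not shown in the thread. It also assumes the two markers are fixed, non-empty strings that do not overlap each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: names and marker strings are assumptions, not from the thread.
public static class TableScanner
{
    // Returns every outermost block that begins with startMarker and ends with
    // its matching endMarker. Nested blocks remain inside their parent's text.
    public static List<string> ExtractBlocks(string text, string startMarker, string endMarker)
    {
        if (string.IsNullOrEmpty(startMarker) || string.IsNullOrEmpty(endMarker))
            throw new ArgumentException("Markers must be non-empty.");

        var blocks = new List<string>();
        int depth = 0;        // number of blocks currently open
        int blockStart = -1;  // index where the current outermost block begins
        int pos = 0;

        while (pos < text.Length)
        {
            int nextStart = text.IndexOf(startMarker, pos, StringComparison.Ordinal);
            int nextEnd = text.IndexOf(endMarker, pos, StringComparison.Ordinal);

            // No closing marker left: any block still open is unterminated.
            if (nextEnd < 0) break;

            if (nextStart >= 0 && nextStart < nextEnd)
            {
                // An opening marker comes first: go one level deeper.
                if (depth == 0) blockStart = nextStart;
                depth++;
                pos = nextStart + startMarker.Length;
            }
            else
            {
                // A closing marker comes first: go one level up.
                pos = nextEnd + endMarker.Length;
                if (depth > 0)
                {
                    depth--;
                    if (depth == 0)
                        blocks.Add(text.Substring(blockStart, pos - blockStart));
                }
                // depth == 0 here means a stray closing marker, which is skipped.
            }
        }

        return blocks;
    }
}
```

Calling `TableScanner.ExtractBlocks(text, "#table-begin#", "#table-end#")` with those made-up markers would return each top-level table as one string. The same method could then be applied to the inside of a block with row and cell markers. A depth counter is used instead of a regular expression because plain patterns have trouble matching nested pairs.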

Once you have extracted the table into a string, use Server.HtmlEncode to encode any text that contains HTML.
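
As a rough illustration of the encoding step: `Server.HtmlEncode` is only available inside ASP.NET, so this sketch swaps in `System.Net.WebUtility.HtmlEncode`, which performs comparable HTML encoding without a web context. The class and method names (`TableEncoding`, `EncodeForDisplay`) are hypothetical.

```csharp
using System;
using System.Net;

// Hypothetical wrapper: the names are assumptions made for this example.
public static class TableEncoding
{
    // Encodes markup characters such as <, > and & so that the extracted
    // table text is shown literally instead of being interpreted as HTML.
    public static string EncodeForDisplay(string extractedTable)
    {
        return WebUtility.HtmlEncode(extractedTable);
    }
}
```

For example, `TableEncoding.EncodeForDisplay("<td>a & b</td>")` yields `&lt;td&gt;a &amp; b&lt;/td&gt;`.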

HatSoft