0

Here my xml is setup as follows: This is coming via a webservice.

<doc>
<str name="data_id">XXXXXXX</str>
<str name="data">YYYY</str>
<str name="data2">zzzz</str>
...
..
<doc>

<doc>
<str name="data_id">X1X1X1X1X1X1X1</str>
<str name="data">Y1Y1Y1Y1</str>
<str name="data2">z1z1z1z1</str>
...
..
<doc>

I am converting the whole xml into a datatable.

How do I get the column header as column attributes. The resulting datatable have to be in the format as

data_id   data
xxxxx     yyyy

After loading xmldocument I have the following code which I have written to convert the whole thing into a csv string.

Stream k = r.GetResponse().GetResponseStream();
string csvOut = string.Empty;
var doc = XDocument.Load(k);
StringBuilder sb = new StringBuilder(100000);
DataTable table1 = new DataTable();

foreach (XElement node in doc.Descendants("doc"))
{
    foreach (XElement innerNode in node.Elements())
    {

        sb.AppendFormat("{0},", innerNode.Value);
    }
    sb.Remove(sb.Length - 1, 1);
    sb.AppendLine();
}
csvOut = sb.ToString();

I am trying to get the headers as data_id , data etc

i3arnon
  • 113,022
  • 33
  • 324
  • 344
fuller
  • 41
  • 1
  • 9

2 Answers2

3
var xdoc = XDocument.Load(path_to_xml);
DataTable dt = new DataTable();

// Get all column names from document
var columnNames = xdoc.Root.Descendants("str")
                      .Select(s => (string)s.Attribute("name"))
                      .Distinct();

// create column for each unique str
foreach(var columnName in columnNames)
   dt.Columns.Add(new DataColumn(columnName));

foreach(var doc in xdoc.Root.Elements("doc"))
{
   var row = dt.NewRow();

   // fill row values
   foreach(var str in doc.Elements("str"))
       row[(string)str.Attribute("name")] = (string)str;

   dt.Rows.Add(row); // add row
}

If all docs contain exactly same set of str elements, then you can replace column names getting with

var columName = xdoc.Root.Element("doc").Elements("str")
                    .Select(s => (string)s.Attribute("name"));

That will read only first <doc> node instead of traversing all elements in xml document.

Sergey Berezovskiy
  • 232,247
  • 41
  • 429
  • 459
  • Thanks. I got the idea. Can you advise how can I have a grasp on linq. I am currently working with a lot of these xml parsing and having trouble. Thanks – fuller Dec 19 '13 at 19:06
  • @fuller if you have lot of time, then I suggest you to dig into Jon's series [Reimplementing LINQ to Objects](http://edulinq.googlecode.com/hg/posts/index.html) also there is excellent tutorial on pluralsight [LINQ Fundamentals](http://pluralsight.com/training/Courses/TableOfContents/linq-fundamentals) – Sergey Berezovskiy Dec 19 '13 at 19:43
  • I have a follow up question relating to the question i asked. What if I have some elements in array in the xml instead of str in the beginning like homosapiens. Do I have to look for those elements in a separate loop after i look for only str elements. Or is there a standard implementation you would do. In short if the doc tag has more than one types of elements. – fuller Dec 20 '13 at 02:15
  • @fuller you need to handle that manually. You will need to read `arr` elements instead of `str` elements for getting column names and finding row values. But you don't need separate loop, because there is single `str` element inside `arr` element. Just get value as `row[(string)arr.Attribute("name")] = (string)arr.Element("str");` – Sergey Berezovskiy Dec 20 '13 at 06:33
0

Here is just a hint how you should do it with XDocument and LINQ.

    var query = from c in xmlFile.Descendants("doc") select c;
string data_id;
string data;
Datatable dt = new Datatable();
DataRow row;

foreach (XElement dom in query)
            {
            data_id = dom.Element("data_id").Value;
            data = dom.Element("data").Value;


  row = dt.NewRow();
  row[0] = data_id;
  row[1] = data;
  dt.Rows.Add(row);
            }
tilda
  • 676
  • 5
  • 8
  • Actually want to get the string "data_id" from the document. I already have the value for the data_id field in my csv. – fuller Dec 19 '13 at 17:44