0

There are a lot of related questions in SO. But the problem I'm trying to solve looks a bit different.

The requirements:

  1. A large xml input string with unknown data structure
  2. with a unknown number of children
  3. the elements have no attributes
  4. flatten to a csv file with the element names as column names

An example input looks like this:

<ARandomRoot>
  <ARandomLOne>
    <Id>12</Id>
    <OtherId>34</OtherId>    
  </ARandomLOne>
  <AnotherRandomLOne>
    <ARandomLTwo>
      <ARandomLTree>
        <NumberOfElements>2</NumberOfElements>
        <ARandomLFour>
          <RandomDataOne>R1</RandomDataOne>
          <RandomDataTwo>10.12</RandomDataTwo>          
        </ARandomLFour>
        <ARandomLFour>
          <RandomDataOne>R2</RandomDataOne>
          <RandomDataTwo>9.8</RandomDataTwo>          
        </ARandomLFour>
      </ARandomLTree>
    </ARandomLTwo>
  </AnotherRandomLOne>
</ARandomRoot>

And the output should be:

ARandomRoot-ARandomLOne-Id,ARandomRoot-ARandomLOne-OtherId,ARandomRoot-AnotherRandomLOne-ARandomLTwo-ARandomLTree-NumberOfElements,ARandomRoot-AnotherRandomLOne-ARandomLTwo-ARandomLTree-ARandomLFour-RandomDataOne,ARandomRoot-AnotherRandomLOne-ARandomLTwo-ARandomLTree-ARandomLFour-RandomDataTwo
12,34,2,R1,10.12
12,34,2,R2,9.8

Code I've got so far from another SO question with minor change

        var xml = XDocument.Parse(input);

        Func<string, string> csvFormat = t => String.Format("\"{0}\"", t.Replace("\"", "\"\""));

        Func<XDocument, IEnumerable<string>> getFields =
            xd =>
                xd
                    .Descendants()
                    .SelectMany(d => d.Elements())
                    .Select(e => e.Name.ToString());

        Func<XDocument, IEnumerable<string>> getHeaders =
            xd =>
                xd
                    .Descendants()
                    .SelectMany(d => d.Elements())
                    .Select(e => e.Name.ToString())
                    .Distinct();

        var headers =
            String.Join(",",
                getHeaders(xml)
                    .Select(f => csvFormat(f)));

        var query =
            from elements in xml.Descendants()
            select string.Join(",",
                getFields(xml)
                    .Select(f => elements.Elements(f).Any()
                        ? elements.Element(f).Value
                        : "")
                    .Select(x => csvFormat(x)));

        var csv =
            String.Join(Environment.NewLine,
                new[] { headers }.Concat(query));

While this produces the desired headers, but the data is not flattened.

Can someone please point me to the correct direction?

Allen Zhang
  • 2,432
  • 2
  • 20
  • 31
  • It looks like you need a kind of combination of https://stackoverflow.com/questions/8795479/using-linq-to-xml-method-to-get-path-to-all-leaves and https://stackoverflow.com/questions/451950/get-the-xpath-to-an-xelement – Renat May 28 '19 at 00:07
  • @Renat, thanks for the comment. A dictionary produces the output all in one line. But I need a pivot table to be the output, i.e. with duplicated values – Allen Zhang May 28 '19 at 01:19
  • Please [edit] your post to include any additional information you have to your question. Avoid adding this in the comments, as they are harder to read and can be deleted easier. The edit button for your post is just below the post's tags. – Jim G. Mar 30 '23 at 16:45

0 Answers0