3

I have been searching all over the web, i'm assuming some one has to have needed this before me and done it better, for an xml to csv converter. I have a very standard xml below:

<ArrayOfDealer xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <Dealer>
        <Cmf>76066699</Cmf>
        <DealerNumber/>
        <DealershipName>BROWARD MOTORSPORTS - WPB</DealershipName>
    </Dealer>
    <Dealer>
        <Cmf>76071027</Cmf>
        <DealerNumber/>
        <DealershipName>BROWARD MOTORSPORTS OF FT LAUDERDALE LLC</DealershipName>
    </Dealer>
    <Dealer>
        <Cmf>76014750</Cmf>
        <DealerNumber/>
        <DealershipName>Jet Ski of Miami</DealershipName>
    </Dealer>
    <Dealer>
        <Cmf>76066987</Cmf>
        <DealerNumber/>
        <DealershipName>BROWARD MOTORSPORTS - Davie</DealershipName>
    </Dealer>
</ArrayOfDealer>

And I would like to parse this to something like

cmf      dealernumber    dealershipname
76066699                 BROWARD MOTORSPORTS - WPB
76014750                 Jet Ski of Miami
76066987                 BROWARD MOTORSPORTS - Davie

XML is coming in from an api which I have stored as a string.

Thoughts?

Edit: To clarify, I know the structure will be like the above, with no further nesting of nodes, but the actual tag names can very.

Dmitriy Khaykin
  • 5,238
  • 1
  • 20
  • 32
Zach M.
  • 1,188
  • 7
  • 22
  • 46
  • I see in a comment you won't know the XML element names; can you at least be sure that the structure is the same, i.e., `` -- meaning you know that there is one repeating node under root that then contains identical amount of child nodes which are to be the csv values? – Dmitriy Khaykin Jan 24 '14 at 22:38
  • Yes that is what I meant, yes the structure will be the same. – Zach M. Jan 24 '14 at 22:39
  • And the result, has to be proper CSV (comma delimited) or a tab delimited text file? – Dmitriy Khaykin Jan 24 '14 at 22:54

4 Answers4

5

Would something like this get you what you need:

Func<string, string> csvFormat =
    t => String.Format("\"{0}\"", t.Replace("\"", "\"\""));

var xml = XDocument.Parse(/* xml text here */);

Func<XDocument, IEnumerable<string>> getFields =
    xd =>
        xd
            .Descendants("Dealer")
            .SelectMany(d => d.Elements())
            .Select(e => e.Name.ToString())
            .Distinct();

var headers =
    String.Join(",",
        getFields(xml)
            .Select(f => csvFormat(f)));

var query =
    from dealer in xml.Descendants("Dealer")
    select string.Join(",",
        getFields(xml)
            .Select(f => dealer.Elements(f).Any()
                ? dealer.Element(f).Value
                : "")
            .Select(x => csvFormat(x)));

var csv =
    String.Join(Environment.NewLine,
        new [] { headers }.Concat(query));

This still assumes that the <ArrayOfDealer><Dealer> structure says the same, but the fields below might change.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • This relies on the names of the elements, which OP stated in a comment he won't know and will not be constant. – Dmitriy Khaykin Jan 24 '14 at 23:13
  • @DavidKhaykin - I've updated my solution accordingly. It would have been nice for the OP to have edited his question to clarify this point. The final question and answer should match without the need to read comments. – Enigmativity Jan 25 '14 at 01:21
  • I agree about that. As more experienced members we can always edit their comment into the question and leave a comment stating the etiquette for the future. I was going to do this, but then I started drinking instead. :) just kidding... – Dmitriy Khaykin Jan 25 '14 at 01:29
  • 1
    @DavidKhaykin - Sometimes drinking is the easiest option. ;-) – Enigmativity Jan 25 '14 at 01:37
3

This is rough but it will create a CSV file or a tab delimited text file as long as structure stays with

Root -> Main Element (for each row) -> Child Elements (any number)

I included 2 different test sets of XML so you can see the result generates correctly.

Working sample:

namespace XmlToCsv
{
    class Program
    {
        const int TabSpaces = 8;

        static void GenerateCsvFromXml(string xmlString, string resultFileName, bool isTabDelimited)
        {
            XDocument xDoc = XDocument.Parse(xmlString);

            var tabsNeededList = new List<int>(); // only used for TabDelimited file

            string delimiter = isTabDelimited
                ? "\t"
                : ",";

            // Get title row 
            var titlesList = xDoc.Root
                .Elements()
                .First()
                .Elements()
                .Select(s => s.Name.LocalName)
                .ToList();

            // Get the values
            var masterValuesList = xDoc.Root
                .Elements()
                .Select(e => e
                    .Elements()
                    .Select(c => c.Value)
                    .ToList())
                .ToList();

            // Add titles as first row in master values list
            masterValuesList.Insert(0, titlesList);

            // For tab delimited, we need to figure out the number of tabs
            // needed to keep the file uniform, for each column
            if (isTabDelimited)
            {
                for (var i = 0; i < titlesList.Count; i++)
                {
                    int maxLength =
                        masterValuesList
                            .Select(vl => vl[i].Length)
                            .Max();

                    // assume tab is 4 characters
                    int rem;
                    int tabsNeeded = Math.DivRem(maxLength, TabSpaces, out rem);
                    tabsNeededList.Add(tabsNeeded);
                }
            }

            // Write the file
            using (var fs = new FileStream(resultFileName, FileMode.Create))
            using (var sw = new StreamWriter(fs))
            {
                foreach (var values in masterValuesList)
                {
                    string line = string.Empty;

                    foreach (var value in values)
                    {
                        line += value;
                        if (titlesList.IndexOf(value) < titlesList.Count - 1)
                        {
                            if (isTabDelimited)
                            {
                                int rem;
                                int tabsUsed = Math.DivRem(value.Length, TabSpaces, out rem);
                                int tabsLeft = tabsNeededList[values.IndexOf(value)] - tabsUsed + 1; // one tab is always needed!

                                for (var i = 0; i < tabsLeft; i++)
                                {
                                    line += delimiter;
                                }
                            }
                            else // comma delimited
                            {
                                line += delimiter;
                            }
                        }
                    }

                    sw.WriteLine(line);
                }
            }
        }

        static void Main(string[] args)
        {
            String xmlString = @"<ArrayOfDealer xmlns:i=""http://www.w3.org/2001/XMLSchema-instance"">
                <Dealer>
                    <Cmf>76066699</Cmf>
                    <DealerNumber/>
                    <DealershipName>BROWARD MOTORSPORTS - WPB</DealershipName>
                </Dealer>
                <Dealer>
                    <Cmf>76071027</Cmf>
                    <DealerNumber/>
                    <DealershipName>BROWARD MOTORSPORTS OF FT LAUDERDALE LLC</DealershipName>
                </Dealer>
                <Dealer>
                    <Cmf>76014750</Cmf>
                    <DealerNumber/>
                    <DealershipName>Jet Ski of Miami</DealershipName>
                </Dealer>
                <Dealer>
                    <Cmf>76066987</Cmf>
                    <DealerNumber/>
                    <DealershipName>BROWARD MOTORSPORTS - Davie</DealershipName>
                </Dealer>
            </ArrayOfDealer>";

            String xmlString2 = @"<ArrayOfUnicorn xmlns:i=""http://www.w3.org/2001/XMLSchema-instance"">
                <Unicorn>
                    <UnicornColor>Red</UnicornColor>
                    <Cmf>76066699</Cmf>
                    <UnicornNumber/>
                    <UnicornshipName>BROWARD MOTORSPORTS - WPB</UnicornshipName>
                </Unicorn>
                <Unicorn>
                    <UnicornColor>Red</UnicornColor>
                    <Cmf>76071027</Cmf>
                    <UnicornNumber/>
                    <UnicornshipName>BROWARD MOTORSPORTS OF FT LAUDERDALE LLC</UnicornshipName>
                </Unicorn>
                <Unicorn>
                    <UnicornColor>Red</UnicornColor>
                    <Cmf>76014750</Cmf>
                    <UnicornNumber/>
                    <UnicornshipName>Jet Ski of Miami</UnicornshipName>
                </Unicorn>
                <Unicorn>
                    <UnicornColor>Red</UnicornColor>
                    <Cmf>76066987</Cmf>
                    <UnicornNumber/>
                    <UnicornshipName>BROWARD MOTORSPORTS - Davie</UnicornshipName>
                </Unicorn>
            </ArrayOfUnicorn>";

            // Comma delimited
            GenerateCsvFromXml(xmlString, "Dealer.csv", false);
            GenerateCsvFromXml(xmlString2, "Unicorn.csv", false);

            // Tab delimited
            GenerateCsvFromXml(xmlString, "Dealer.txt", true);
            GenerateCsvFromXml(xmlString2, "Unicorn.txt", true);

        }
    }
}

Results:

Dealer.csv:

Cmf,DealerNumber,DealershipName
76066699,,BROWARD MOTORSPORTS - WPB,
76071027,,BROWARD MOTORSPORTS OF FT LAUDERDALE LLC,
76014750,,Jet Ski of Miami,
76066987,,BROWARD MOTORSPORTS - Davie,

Unicorn.csv:

UnicornColor,Cmf,UnicornNumber,UnicornshipName
Red,76066699,,BROWARD MOTORSPORTS - WPB,
Red,76071027,,BROWARD MOTORSPORTS OF FT LAUDERDALE LLC,
Red,76014750,,Jet Ski of Miami,
Red,76066987,,BROWARD MOTORSPORTS - Davie,

Unicorn.txt (tab delimited):

UnicornColor    Cmf         UnicornNumber   UnicornshipName
Red             76066699                    BROWARD MOTORSPORTS - WPB           
Red             76071027                    BROWARD MOTORSPORTS OF FT LAUDERDALE LLC    
Red             76014750                    Jet Ski of Miami                
Red             76066987                    BROWARD MOTORSPORTS - Davie         
Dmitriy Khaykin
  • 5,238
  • 1
  • 20
  • 32
0

Deserialize your xml to an object using XmlSerializer. Then use the approach in this question to serialize your object to a csv file.

Community
  • 1
  • 1
Ryan Bennett
  • 3,404
  • 19
  • 32
  • Does having no knowledge of the naming convention going to be an issue with this approach? What I am writing has to be "dumb" to the names of the tags. – Zach M. Jan 24 '14 at 22:25
  • So you will never know the structure of your xml? You wouldn't be able to generate a schema file? – Ryan Bennett Jan 24 '14 at 22:29
  • I know the structure will be like the above, with no further nesting of nodes, but the actual tag names can very. – Zach M. Jan 24 '14 at 22:38
  • @ZachM. I've edited your comment from this answer into your question. It's always a good idea to include this kind of pertinent information in your question to receive the most applicable answers possible. – Dmitriy Khaykin Jan 25 '14 at 01:39
0

The code below transforms a simple xml content (as provided by the question) to a sequence of string (header + rows) in a csv format in a generic way, it is just necessary to implement a more reliable function to escape values (identity transformation in the example below).

string csvSeparator = ",";
Func<string, string> escapeValue = val => val;

string xml = "xml content";
XDocument doc = XDocument.Parse(xml);

var headers = doc.Root
                .Elements()
                .First()
                .Elements()
                .Select(el => el.Name.LocalName);

var headerRow = string.Join(csvSeparator, headers);

var rows = from el in doc.Root.Elements()
            let values = from prop in el.Elements()
                        select escapeValue(prop.Value)
            let row = string.Join(csvSeparator, values)
            select row;


IEnumerable<string> csvLines = new[] { headerRow }.Concat(rows);

If you need the entire content of the csv file, you can do:

string csvContent = string.Join(Environment.NewLine, csvLines);
polkduran
  • 2,533
  • 24
  • 34